PulseOx Monitor - Project State & Technical Documentation
Current Version: v3.57 (True Raw Frame Saving)
Last Updated: November 17, 2025
Status: Production - threshold after preprocessing, timestamp stays colored
⚠️ IMPORTANT: Update VERSION constant in pulseox-monitor.go and this document with every build!
For AI Assistant: You MUST increment the version number in both pulseox-monitor.go and this document whenever you make ANY code changes. This is not optional - it's a critical part of the workflow.
🔧 For New Chat Sessions: Data Access
You have full filesystem access to /Users/johanjongsma/pulse-monitor/
Use these tools to work with the project:
- Filesystem:read_file - Read source code, logs, config files
- Filesystem:write_file - Create new files
- Filesystem:edit_file - Modify existing code (preferred for edits)
- Filesystem:list_directory - See what files exist
- Filesystem:move_file - Move/rename files (e.g., approving training digits)
- Filesystem:search_files - Find files by pattern
Key files you'll need:
- Source: pulseox-monitor.go, ocr.go, layout_detection.go, etc.
- Config: config.yaml
- Training: training_digits/*.png, training_digits/invalid/*.png
- Output: review/*.png, review/review.html
- Logs: pulseox-monitor_*.log
DO NOT waste time asking how to access files - just use the filesystem tools directly!
Project Overview
This system monitors a Masimo Rad-G pulse oximeter by reading its display via RTSP camera feed. It uses computer vision (OpenCV/gocv) to perform OCR on the SpO2 and heart rate values, then posts readings to Home Assistant for tracking and alerting.
Why This Exists
The Masimo pulse oximeter doesn't have a data output port, so we use a camera pointed at its display. The display has significant overexposure and light leakage issues, making traditional OCR (Tesseract) unreliable. We developed a custom template-matching approach that handles these challenging conditions.
Architecture
Main Components
- pulseox-monitor.go - Main application
  - Unified frame processing loop
  - Interruptible sleep for graceful shutdown
  - Smart escalation strategy
  - Signal handling (Ctrl+C)
- processor.go - Frame processor
  - OCR execution
  - Result validation
  - Home Assistant posting
- validators.go - Validation handlers
  - Corruption detection
  - Confidence checking
  - Stability validation (hindsight)
- ocr.go - Recognition engine
  - Template matching for digits 0-9
  - Invalid pattern detection (corruption)
  - hasOneAt() function for detecting narrow '1' digits
  - Dynamic width calculation for 2-digit vs 3-digit displays
- layout_detection.go - Display locator
  - Finds SpO2 and HR display areas in frame
  - Position-based detection (rightmost edge method)
  - Handles digit fragmentation
- normalize.go - Frame normalization
  - Screen width detection
  - Scaling to 860px width
  - Layout detection with normalization
- frame_source.go - Frame sources
  - RTSP streaming source
  - Single-file test source
  - Interruptible via IsActive() method
- types.go - Shared data structures
  - ProcessingState with ConsecutiveFailures counter
  - Reading, ProcessingResult types
  - Status and Action enums
- homeassistant.go - Data posting
  - REST API integration
  - Posts only when values change
  - Separate sensors: sensor.pulse_ox_spo2, sensor.pulse_ox_hr
- html_report.go - Review interface
  - Dark-themed HTML generated on shutdown
  - Shows all processed frames with confidence scores
  - Visual indicators for unstable readings
- timestamp_ocr.go - Timestamp validation
  - Extracts camera timestamp from frame
  - Uses Tesseract OCR
  - Validates against server time
  - Warns if lag >10 seconds
Data Flow
RTSP Stream / File → Frame Capture (every 4th frame) →
Timestamp validation (every 10th frame) on **COLORED** frame →
**Clone frame for error saving** (grayscale → threshold at 240 → save as rawThresholded) →
Preprocessing (crop 68px top, rotate 90° CW) on **COLORED** frame →
**THRESHOLD ONCE at 240** (grayscale → binary) AFTER preprocessing →
Layout detection (find contours on binary) →
Display area extraction (280x200px binary regions) →
Template matching OCR on binary →
Digit validation (8 & 0) →
Exception handling (saves rawThresholded on errors) →
Home Assistant posting
Recognition System
Template Matching Approach
We use a library of digit images extracted from the actual pulse oximeter display. Each digit (0-9) can have multiple template variants to handle slight variations.
Current templates:
- Digits 0-9: Multiple variants per digit
- Invalid patterns: 2 templates in training_digits/invalid/: white_blob_1.png, invalid_2.png
Template location: /Users/johanjongsma/pulse-monitor/training_digits/
Digit Width Detection
The system handles both 2-digit (XX) and 3-digit (1XX) displays dynamically:
- Narrow '1' detection: hasOneAt(x) checks if column X is 80%+ white and column X+10 is 80%+ black
- Check positions:
  - Digit 3 (rightmost): w-8 pixels from the right edge
  - Digit 2: Depends on digit 3 width (72px for '1', 100px otherwise)
  - Digit 1: Depends on both digit 2 and 3 widths
Why the -8 offset for digit 3? Originally -5, but testing showed '1' digits needed detection slightly more to the left to catch the right edge reliably.
Display Positioning
- SpO2 and HR displays: 234 pixels apart vertically
- Layout detection: Run once at startup, locked unless re-detection triggered
- Frame processing: Every 4th frame (~3.75 fps from 15fps stream)
- Timestamp check: Every 10th processed frame (~2.5 seconds)
Exception Handling System
The system has three independent exception handlers that run in sequence:
1. Confirmed Corruption (Priority: Highest)
Trigger: Any digit recognized as -1 (matched invalid template with >70% confidence)
Behavior:
- Log to file (silent to console)
- Skip corrupted frame
- Continue to next frame through normal processing
- No special retry, no layout re-detection
Why this approach: Corruption is a recognition failure, not a camera/lighting issue. The next frame will naturally be clean.
Invalid templates:
Stored in training_digits/invalid/ - patterns like white blobs, UI elements, or corrupted digits that should never be interpreted as valid numbers.
2. Low OCR Confidence
Trigger: Average confidence < 85% for SpO2 or HR, OR negative values (unrecognized digits)
Behavior:
- First occurrence:
  - Log "⚠️ Low confidence, retrying with next frame..." or "[UNRECOGNIZED] SpO2=X, HR=Y"
  - Read next frame immediately
  - Re-detect layout (handles camera movement/vibration)
  - Process retry frame
  - If successful (high confidence + values changed): perform stability check and post
- Second consecutive low confidence:
  - Log "⚠️ Low confidence after retry, pausing 2 seconds"
  - Give up, wait 2 seconds before continuing
Why re-detect layout: Low confidence or unrecognized digits often indicate camera movement, lighting changes, focus issues, or temporary blur. Re-detecting layout helps recover from these transient problems.
3. Large Delta (Hindsight Validation)
Trigger: Values changed >3 points from last stable reading (e.g., SpO2: 95→92 or HR: 70→75)
Behavior:
- Hold new reading as "pending" (don't post yet)
- Store baseline (value before the spike)
- Wait for next reading
- If next reading confirms trend (moves in same direction):
- Post the held reading
- Post the confirming reading
- If next reading contradicts (moves opposite direction or returns to baseline):
- Discard held reading as glitch
- Post the new reading
Why hindsight validation: Physiological changes are gradual. Sudden spikes >3 points are usually recognition glitches or transient display artifacts. This prevents false alarms while allowing real trends.
Example scenario:
Baseline: SpO2=95
Reading 1: SpO2=91 (Δ4) → HOLD, don't post
Reading 2: SpO2=89 (Δ2 from held) → Trend confirmed, post 91 then 89
vs.
Baseline: SpO2=95
Reading 1: SpO2=91 (Δ4) → HOLD, don't post
Reading 2: SpO2=95 (back to baseline) → Discard 91 as glitch, post 95
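The hold-and-confirm behavior above can be sketched as a small state machine. This is a simplified interpretation of the described logic, not the validators.go code; the type and method names are illustrative, and the exact trend-confirmation rule (next value at or beyond the held value in the same direction) is an assumption.

```go
package main

import "fmt"

// hindsight holds one pending reading when it jumps more than 3 points
// from the last stable value.
type hindsight struct {
	baseline int  // last stable reading
	held     int  // pending reading, meaningful only while holding
	holding  bool
}

// next returns the readings that should be posted for a new value v.
func (h *hindsight) next(v int) []int {
	if h.holding {
		h.holding = false
		down := h.held < h.baseline
		confirms := (down && v <= h.held) || (!down && v >= h.held)
		h.baseline = v
		if confirms {
			return []int{h.held, v} // post the held reading, then the confirmer
		}
		return []int{v} // held reading was a glitch: discard it
	}
	if abs(v-h.baseline) > 3 {
		h.held, h.holding = v, true
		return nil // hold the spike, don't post yet
	}
	h.baseline = v
	return []int{v}
}

func abs(n int) int {
	if n < 0 {
		return -n
	}
	return n
}

func main() {
	h := &hindsight{baseline: 95}
	fmt.Println(h.next(91)) // [] — Δ4, held
	fmt.Println(h.next(89)) // [91 89] — trend confirmed
}
```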
Key Design Decisions
1. Change-Based Posting Only
Decision: Only post when SpO2 or HR values actually change.
Rationale:
- Reduces Home Assistant database bloat
- Avoids unnecessary network traffic
- Still captures all meaningful changes
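A minimal sketch of change-based posting, assuming a per-sensor cache of the last posted value (the type and method names here are illustrative, not the homeassistant.go API):

```go
package main

import "fmt"

// changePoster remembers the last value posted per sensor and suppresses
// duplicate POSTs when nothing changed.
type changePoster struct {
	last map[string]int
}

// maybePost reports whether a POST should happen for this sensor/value pair.
func (p *changePoster) maybePost(sensor string, v int) bool {
	if p.last == nil {
		p.last = map[string]int{}
	}
	if old, ok := p.last[sensor]; ok && old == v {
		return false // unchanged: skip the POST
	}
	p.last[sensor] = v
	return true // changed: would POST to Home Assistant here
}

func main() {
	var p changePoster
	fmt.Println(p.maybePost("sensor.pulse_ox_spo2", 97)) // true: first value
	fmt.Println(p.maybePost("sensor.pulse_ox_spo2", 97)) // false: unchanged
	fmt.Println(p.maybePost("sensor.pulse_ox_spo2", 96)) // true: changed
}
```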
2. Template Matching Over Tesseract OCR
Decision: Use custom template matching instead of Tesseract.
Rationale:
- Pulse oximeter display has severe overexposure
- Tesseract unreliable with bright backgrounds
- Template matching: 90%+ accuracy vs. Tesseract: ~60%
- We have stable digit shapes - perfect for template matching
3. Fixed Layout After Startup
Decision: Detect layout once, lock it, only re-detect on low confidence retry.
Rationale:
- Camera is stationary (mounted on tripod)
- Display position doesn't change
- Layout detection is expensive (~10-20ms)
- Re-detection only needed if camera moves/vibrates
4. No Immediate Corruption Retry
Decision: Don't immediately retry after detecting corruption; just skip to next frame.
Rationale:
- Corruption is a recognition issue, not camera issue
- Next frame (133ms later at 15fps) will naturally be different
- Avoids complexity of retry-within-retry scenarios
- Simpler code, same reliability
5. Processing Every 4th Frame
Decision: Process every 4th frame (~3.75 fps from 15fps stream).
Rationale:
- Balance between responsiveness and CPU usage
- SpO2/HR don't change faster than ~1 reading/second physiologically
- Allows time for processing, posting, and stability checks
- Adjusted for 15fps camera (was 6th frame for 30fps)
- Can be adjusted: 3rd frame = ~5fps, 5th frame = ~3fps
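The frame-skip arithmetic quoted above is just the stream rate divided by the skip count:

```go
package main

import "fmt"

// effectiveFPS: processing every Nth frame of a fixed-rate stream divides
// the effective rate by N.
func effectiveFPS(streamFPS float64, skipCount int) float64 {
	return streamFPS / float64(skipCount)
}

func main() {
	for _, n := range []int{3, 4, 5} {
		fmt.Printf("skipCount=%d: %.2f fps\n", n, effectiveFPS(15, n))
	}
	// skipCount=3: 5.00 fps
	// skipCount=4: 3.75 fps
	// skipCount=5: 3.00 fps
}
```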
6. Dark Theme HTML Review
Decision: Dark background with light text in review HTML.
Rationale:
- Easier on eyes during long review sessions
- Better contrast for confidence scores
- Color-coded indicators more visible
7. No FPS Display
Decision: Don't display stream FPS statistics.
Rationale:
- Stream FPS is constant (15fps from camera)
- FPS logging adds noise to console output
- Processing rate (every 4th frame) is more relevant than raw stream FPS
- Keeps console focused on actual readings and warnings
8. Digit Validation for '8' and '0'
Decision: Validate recognized '8' and '0' digits by checking for characteristic holes.
Rationale:
- Template matching can falsely recognize corrupted/partial digits as '8' or '0'
- '8' validation: checks for TWO holes (at 30% and 70% Y, scanning 40-50% X)
- '0' validation: checks for ONE hole (at 50% Y, scanning 30-70% X)
- Wider scan range for '0' (30-70%) than '8' (40-50%) because '0' hole is larger
- If validation fails, digit marked as -1 (invalid) → triggers retry/re-detection
- Prevents false alarms from misrecognized digits
- Simple pixel-based validation is fast and reliable
9. Alarm Mode Decolorization
Decision: Pre-process colored alarm backgrounds before OCR to normalize them to black/white.
Rationale:
- Pulse oximeter displays colored backgrounds when in alarm mode (low SpO2 or abnormal HR)
- SpO2 alarm: Yellow/orange background, HR alarm: Cyan/blue background
- Normal thresholding fails with colored backgrounds
- Decolorization converts: white digits (RGB all > 240) → white (255), everything else → black (0)
- Applied ONLY to display regions AFTER extraction (in the recognizeDisplayArea() function)
- Layout detection works on original colored frames - grayscale conversion handles colors fine for contour detection
- Decolorization timing:
- Layout detection: Original colored frame → grayscale → threshold → find contours (works fine)
- Extract display regions: Still colored (280x200px = 56K pixels)
- Decolorize extracted regions: Only ~100K pixels total (both displays)
- Continue with OCR: grayscale → threshold → template matching
- Each display region is ~280x200px = 56K pixels
- Decolorizing two display regions (~100K pixels) takes <10ms
- Threshold of 240 ensures only pure white pixels (digits/borders) are preserved, not grays
- No performance impact - simple pixel-wise operation on small regions
- Enables reliable digit recognition regardless of alarm state
- Critical for 24/7 monitoring where alarms are expected
Environmental Considerations
Light Leakage Issues
Problem: Ambient light creates false contours and affects brightness detection.
Solutions implemented:
- Increased brightness threshold to 170 (from lower values)
- More selective contour filtering
- Height filters to exclude oversized boxes (<200px)
Manual mitigation: Position towel/cloth to block light from reflecting off display surface.
Camera Positioning
Critical: Camera must be positioned to minimize glare and ensure consistent framing.
Current setup:
- Tripod-mounted camera
- RTSP stream from network camera
- Fixed zoom/focus (no autofocus)
Files and Directories
Source Files
- pulseox-monitor.go - Main application
- processor.go - Frame processor
- validators.go - Validation handlers
- ocr.go - Recognition engine
- layout_detection.go - Display finder
- normalize.go - Frame normalization
- frame_source.go - Frame sources (RTSP/file)
- types.go - Shared data structures
- helpers.go - Utility functions (logf)
- homeassistant.go - API integration
- html_report.go - Review page generator
- timestamp_ocr.go - Timestamp validation
- config.go - Configuration loading
Configuration
- config.yaml - Camera URL and Home Assistant settings
Training Data
- training_digits/*.png - Valid digit templates (0-9)
- training_digits/invalid/*.png - Corruption patterns
Output
- review/ - Frame captures and digit images (cleaned each run)
  - f{N}_boxes.jpg - Visualization of detected areas
  - f{N}_{display}_checks.png - Digit detection analysis
  - f{N}_{display}_digit[1-3].png - Individual digit crops
  - f{N}_{display}_full.png - Full display with recognized value
  - review.html - Live-updated HTML (append-only)
- raw_frames/ - Raw frames saved on failed recognitions (cleaned each run)
  - raw_YYYYMMDD-NNNNN.png - TRUE raw frames (unprocessed, portrait orientation, with timestamp)
  - Only saved when recognition fails (corruption, low confidence, unrecognized)
  - Used for debugging and training data collection
  - Can be tested directly: ./pulseox-monitor raw_frames/raw_*.png (goes through the full preprocessing pipeline)
- test_output/ - Debug visualizations (cleaned each run)
  - Layout detection debug files
  - Corruption debug snapshots with timestamps
Logs
- pulse-monitor_YYYYMMDD_HHMMSS.log - Detailed execution log
  - Console shows only: value changes, warnings, errors
  - File logs: all frames, timing, confidence scores, decisions
Development Workflow
Version Management
CRITICAL: Update version with every build!
For AI Assistant (Claude):
- ALWAYS increment version number when making ANY code changes
- Update BOTH pulseox-monitor.go and PROJECT_STATE.md
- Add entry to Version History section documenting what changed
- This is MANDATORY, not optional
Version increment process:
- Open pulseox-monitor.go
- Update const VERSION = "v3.XX" (increment the version number)
- Document what changed in the version history section below
- After significant changes:
  - Update this PROJECT_STATE.md document
  - Keep architecture, design decisions, and troubleshooting current
  - Update "Last Verified Working" date at bottom
- Commit discipline:
  - Version bump = code change
  - Every feature/fix gets a version increment
  - Keep PROJECT_STATE.md in sync with code
Adding New Training Digits
When you find a good digit image in the review files:
# Example: Approve f768_hr_digit3.png as digit "1"
# Creates training_digits/1_2.png (next available variant)
mv review/f768_hr_digit3.png training_digits/1_2.png
Current naming: {digit}_{variant}.png (e.g., 7_1.png, 7_2.png)
Adding Invalid Patterns
When you find corrupted/glitchy images:
# Move to invalid folder with descriptive name
# Creates training_digits/invalid/invalid_X.png
Testing Changes
- Edit code on Mac
- Compile: ./build.sh
- Run: ./pulseox-monitor
- Monitor console for value changes and warnings
- Check log file for detailed diagnostics
- Review HTML on shutdown (Ctrl+C) for visual verification
Performance Tuning
Frame rate:
- Change the skipCount parameter in the NewRTSPSource call (currently 4)
- Located in pulseox-monitor.go
- Values: 3=~5fps, 4=~3.75fps, 5=~3fps (for 15fps stream)
Confidence threshold:
- Currently 85% for high confidence
- Located in multiple places in main loop
- Lower = more false positives, Higher = more retries
Delta threshold:
- Currently 3 points for stability check
- Change in large delta handler
- Lower = stricter (more held readings), Higher = more lenient
Known Limitations
- Camera-dependent: Requires good camera positioning and lighting
- Display-specific: Templates are specific to Masimo Rad-G display fonts
- No real-time HTML: Review HTML only generated on shutdown
- Three-digit limitation: Assumes HR displays as 1XX when >99, may need adjustment for other scenarios
- Brightness sensitivity: Very bright ambient light can still cause issues
Future Enhancements (Planned)
Docker Deployment
Target: Ubuntu 22.04 server with Docker Compose
Requirements:
- Dockerfile with Go + OpenCV
- Mount volumes: config.yaml, training_digits/, review/, logs/
- Network access to RTSP stream and Home Assistant
- docker-compose.yml integration with existing HA setup
Considerations:
- Image size: ~500MB-1GB with OpenCV
- Multi-stage build possible for smaller runtime image
- Review file access via scp or lightweight HTTP server
SQLite + Live HTML
Goal: Real-time progress viewing without stopping app
Approach:
- Store frame data in SQLite as processed
- HTTP endpoint generates HTML on-demand from DB
- Access at http://server:8080/review.html anytime
Benefits:
- No shutdown required to review progress
- Historical data queryable
- API potential for other tools
Adaptive Thresholds
Idea: Auto-adjust confidence and delta thresholds based on recent history
Example:
- If 90% of last 100 frames >95% confidence → raise threshold to 90%
- If frequent false positives → increase delta threshold temporarily
Troubleshooting Guide
"No templates found for digit X"
Problem: Missing training data for a digit.
Solution: Run system, review output, approve good images for that digit.
Frequent low confidence warnings
Causes:
- Camera moved/vibrated
- Lighting changed
- Focus drifted
- Display brightness changed
Solutions:
- Check camera mounting
- Adjust lighting (block reflections)
- Re-focus camera if needed
- Restart to re-detect layout
False corruption detection
Problem: Valid digits matched as invalid.
Solution: Review invalid templates, remove overly broad patterns.
Large delta causing unnecessary holds
Problem: Real physiological changes >3 points being held.
Solution:
- Increase delta threshold (e.g., to 5)
- Or adjust hindsight validation logic
- Consider different thresholds for SpO2 vs HR
Values not posting to Home Assistant
Checks:
- Home Assistant URL correct in config.yaml?
- Network connectivity?
- Sensors created in HA configuration?
- Check log file for POST errors
- Are values actually changing? (change-based posting only)
Performance Characteristics
Typical timing per frame (v3.48):
- Frame acquisition: 150-250ms (camera waiting time)
- Preprocessing: 8-12ms
- Frame scaling: 3-5ms
- SpO2 recognition: 9-12ms
- HR recognition: 9-12ms
- Validation: 0-1ms
- File I/O: 0ms (normal), 40-50ms (failures only)
- Home Assistant POST: 5ms (when values change)
- Total: 25-35ms per frame (processing only, excluding camera wait)
- Timestamp check (every 10th frame): ~22ms (optimized with reused OCR client)
Debug mode (with /debug flag):
- Additional file I/O: +40ms (saves all digit images)
- Total: 65-75ms per frame
With every 4th frame processing at 15fps:
- ~3.75 readings per second maximum
- Actual posting rate: varies (only when values change)
- CPU usage: Low (single-threaded, mostly idle)
- Timestamp validation: Every ~2.5 seconds (~22ms each)
Version History
v3.57 (Current - True Raw Frame Saving)
- CRITICAL FIX: Raw frames are now truly raw (before preprocessing)
- Problem in v3.56: "Raw" frames were saved AFTER preprocessing (cropped, rotated, thresholded)
- User confusion: Testing with "raw" frames didn't work as expected - frames were already processed
- Fix: Clone and threshold raw frame IMMEDIATELY after acquisition, before ANY preprocessing
- New flow:
- Acquire colored frame from camera (portrait, with timestamp)
- Clone frame → grayscale → threshold at 240 → save as rawThresholded
- Continue with original colored frame → preprocess (crop + rotate) → threshold → OCR
- On error: Save rawThresholded to raw_frames/
- Result: Saved "raw" frames are truly raw (portrait orientation, timestamp visible, not cropped/rotated)
- File size: ~100KB PNG (thresholded) vs 2MB colored JPG
- Benefits:
- Can test raw frames through full preprocessing pipeline
- See exactly what came from camera before any processing
- Small file size (thresholded) vs huge colored images
- Impact: When testing with raw_frames/raw_*.png, they'll go through crop+rotate+threshold like normal frames
v3.56 (Threshold AFTER Preprocessing)
- CRITICAL FIX: Moved threshold to AFTER preprocessing: Timestamp overlay was being thresholded!
- Problem in v3.54-v3.55: Threshold happened BEFORE preprocessing
- Colored frame → threshold → binary
- Timestamp extracted from binary frame (appeared as B&W)
- Preprocessed binary frame (cropped B&W timestamp)
- Correct flow now:
- Acquire colored frame
- Extract timestamp from colored frame (stays colored!)
- Preprocess colored frame (crop timestamp + rotate)
- Threshold preprocessed frame at 240 → binary
- Layout detection on binary
- OCR on binary
- Why this is correct:
- Timestamp area is removed BEFORE thresholding (stays colored)
- Only the actual display area (after crop/rotate) gets thresholded
- Single threshold at 240, happens once, at the right time
- Impact: Saved raw frames now show colored timestamp overlay (as expected)
v3.55 (Fixed Double Threshold Bug)
- CRITICAL FIX: Removed duplicate threshold in saveThresholdedFrame(): Frame was being thresholded TWICE
- Problem: saveThresholdedFrame() was doing grayscale conversion + threshold at 128
- But: rawFrame parameter is ALREADY thresholded binary (from line 313 in pulseox-monitor.go)
- Result: Frames were being thresholded twice - once at 240, then again at 128
- Fix: saveThresholdedFrame() now just saves the frame directly (no conversion, no threshold)
- Impact: Overlay/timestamp area should look consistent now
- This was the "extra" threshold causing B&W overlay to look different
- Confirmed ONLY one threshold operation now:
- pulseox-monitor.go line 292: Threshold at 240 (the ONLY threshold)
- layout_detection.go: No threshold (works on binary)
- ocr.go: No threshold in recognizeDisplayArea() (works on binary)
- processor.go: saveThresholdedFrame() just saves (no threshold)
- Note: ocr.go still has OLD deprecated recognizeDisplay() function with thresholds, but it's never called
v3.54 (Single Threshold Architecture - Had Double Threshold Bug)
- Attempted to threshold once at 240 after frame acquisition
- Removed internal thresholds from layout_detection.go and ocr.go
- BUT missed duplicate threshold in saveThresholdedFrame() → fixed in v3.55
v3.53 (CPU Optimization - Thread Limiting)
- Removed duplicate ProcessedCount increment bug
- Limited OpenCV threads to 1 for minimal overhead
v3.52 (CPU Optimization Complete)
- Cleaned up failed GPU acceleration attempts: Removed AVFoundation backend code
- Investigation findings: GPU hardware acceleration for RTSP streams is not feasible with current OpenCV setup
- AVFoundation doesn't support network streams (RTSP), only local camera devices
- OpenCV's ffmpeg backend has VideoToolbox compiled in but doesn't auto-activate for RTSP
- Would require either: (1) patching OpenCV's videoio plugin in C++, (2) complete rewrite with different library, or (3) local transcoding proxy
- Final optimization achieved: CPU reduced from 50% to ~30% (40% reduction)
- Single-threaded OpenCV (gocv.SetNumThreads(1)) eliminated thread overhead
- Homebrew's OpenCV has AVFoundation/VideoToolbox support but only for local devices
- ~15% CPU for RTSP H.264 decoding (software) + ~15% CPU for OCR processing
- Conclusion: 30% CPU is acceptable for 24/7 monitoring, fan noise significantly reduced
v3.51 (Attempted GPU Acceleration - Reverted)
- Attempted AVFoundation backend: Failed for RTSP streams (network streams not supported)
v3.50 (Reduced CPU Overhead)
- Added OpenCV thread limiting: Dramatically reduced CPU usage and thread count
- Problem: OpenCV was creating 40 threads (one per CPU core) for small operations
- Impact: Excessive context switching, 65%+ CPU usage for simple 280x200px image operations
- Fix: Limited OpenCV to 1 thread with gocv.SetNumThreads(1) at startup
- Rationale:
- Our images are tiny (280x200px per display)
- We process sequentially, not in parallel
- Processing time <20ms - faster than thread spawning overhead
- Single-threaded = zero thread management overhead
- Expected result:
- Thread count drops from ~40 to ~5-8 (only Go runtime threads)
- CPU usage drops from 65% to <20%
- No performance loss (processing still <20ms per frame)
- Startup message: "🔧 OpenCV threads limited to 1 (single-threaded for minimal overhead)"
- Testing: v3.50 with 2 threads showed 23 threads/48% CPU, so moved to 1 thread for further optimization
v3.49 (Fixed Frame Counter Bug)
- CRITICAL FIX: Removed duplicate ProcessedCount increment: Fixed skipped frame numbers
- Problem: state.ProcessedCount++ was being called twice:
  - In main loop (pulseox-monitor.go) - for every frame acquired
  - In processor.go - when values changed and HASS posted
- Result: Every time values changed, the counter was incremented twice
- Symptom: Frame numbers would skip (e.g., #61, #62, skip #63, #64...)
- Pattern: Always skipped the frame number 2 positions after a HASS post
- Fix: Removed the duplicate increment in processor.go
- Now: Frame numbers increment consistently: #61, #62, #63, #64...
v3.48 (Major Performance Improvements)
- CRITICAL: Moved digit image saves to DEBUG_MODE only: Massive performance improvement
- Problem: OCR was saving 8 PNG files per frame (4 per display × 2 displays)
  - checks.png visualization
  - digit1.png, digit2.png, digit3.png (individual digits)
  - full.png (labeled composite)
- Impact: ~5ms per PNG write × 8 files = 40ms of unnecessary file I/O per frame
- Fix: Wrapped all digit image saves in if DEBUG_MODE checks
- Result: OCR now takes ~9-12ms total (just template matching, no file I/O)
- When needed: Run with the /debug flag to generate all digit images
- Only save boxes.jpg on failures: Reduced file I/O for successful frames
- Previously: Saved layout visualization (boxes.jpg) every time values changed (~51ms)
- Now: Only saves boxes.jpg when OCR fails (corruption, unrecognized, low confidence)
- Successful frames: No extra file I/O beyond what's needed
- Performance comparison:
- Before: Total ~80ms (Prep 10ms + Scale 4ms + OCR 20ms + FileIO 40ms + Valid 1ms + HASS 5ms)
- After: Total ~30ms (Prep 10ms + Scale 4ms + OCR 10ms + FileIO 0ms + Valid 1ms + HASS 5ms)
- ~60% faster processing for normal operation
- Debug workflow unchanged: /debug flag still generates all images as before
v3.47 (Timing Instrumentation)
- Added comprehensive timing measurements with /timing flag: Enables detailed performance analysis
- New /timing command line flag: Activates timing mode (./pulseox-monitor /timing)
- Horizontal timing table: Shows timing breakdown for each frame
- Frame | Acquire | Prep | Scale | OCR_SpO2 | OCR_HR | Valid | FileIO | HASS | Total
- Header printed every 20 frames: Prevents scroll-back issues
- Timing measurements:
- Acquire: Frame acquisition from RTSP source - WAITING TIME (not processing)
- Prep: Preprocessing (crop timestamp + rotate 90°)
- Scale: Frame scaling to normalized width (if needed)
- OCR_SpO2: SpO2 display recognition (template matching)
- OCR_HR: HR display recognition (template matching)
- Valid: Validation checks (corruption, confidence, stability) - accumulated across all checks
- FileIO: Saving review images and debug files - accumulated across all file operations
- HASS: Home Assistant POST request (only when values change)
- Total: Wall-clock processing time (Prep + Scale + OCR + Valid + FileIO + logging overhead)
- Note: Acquire is separate (camera waiting time). Total may not equal sum due to logging/overhead.
- Purpose: Identify performance bottlenecks (e.g., excessive file I/O)
- Implementation:
- Added TimingData struct in types.go
- Added TIMING_MODE global flag
- Added printTimingTable() function in helpers.go
- Added timing measurements throughout processFrame() and processFrames()
- Timing data passed through entire processing pipeline
- Example output:
  Frame | Acquire | Prep | Scale | OCR_SpO2 | OCR_HR | Valid | FileIO | HASS | Total
  ------|---------|------|-------|----------|--------|-------|--------|------|------
  #123  | 2ms     | 6ms  | 0ms   | 4ms      | 4ms    | 2ms   | 38ms   | 5ms  | 61ms
  #124  | 2ms     | 6ms  | 0ms   | 4ms      | 4ms    | 1ms   | 39ms   | 5ms  | 61ms
- Benefits:
- Quickly identify slow operations
- Track performance trends over time
- Validate optimization efforts
- Debug unexpected delays
v3.46 (Fixed "1" Detection + Raw Frame Improvements)
- CRITICAL FIX: Lowered "1" digit detection threshold from 90% to 85%
  - Problem: "1" digits with 87-89% confidence were being rejected, causing misrecognition
  - Example: "71" was being cut incorrectly because the "1" wasn't detected (87.1% < 90% threshold)
  - Result: Wrong width used (100px instead of 72px), throwing off all cut positions
  - Fix: Threshold lowered to 85% in the hasOneAt() function (ocr.go:196)
  - Impact: More reliable detection of "1" digits, especially with slight variations in brightness/focus
- Auto-enable DEBUG_MODE for file processing
  - When testing with a file (e.g., ./pulseox-monitor raw_frames/thresh_*.png), DEBUG_MODE is automatically enabled
  - Eliminates the need to add the /debug flag manually for single-frame testing
  - Implementation: pulseox-monitor.go:65-66 in runSingleFrameMode()
- Added comprehensive debug output for digit detection
- Shows display width at start of detection
- For each digit (3→2→1):
- Check position being tested
- hasOneAt() extraction region and confidence score
- Whether detected as "1" or not
- Width being used (72px vs 100px)
- Running total width
- Final CUT position calculated
- Region extraction coordinates and widths for all three digits
- Makes debugging cut position issues trivial - can see exact logic flow
- Fixed raw frame saving to be truly raw
- Previous: Frames were saved AFTER preprocessing (rotated, cropped)
- Now: Frames saved BEFORE any processing (portrait, with timestamp, untouched)
- Processing applied: Only thresholded (grayscale → binary) to save space as PNG
- Benefit: Can test raw frames with full preprocessing pipeline
- Implementation: Clone frame immediately after reading, pass to processFrame()
- Fixed duplicate logging in test mode
  - Problem: Messages with "Both" target were printed twice in test mode
  - Cause: globalLogger was set to os.Stdout (same as console output)
  - Fix: Set globalLogger = nil in test mode (pulseox-monitor.go:71)
  - Result: Clean, non-duplicated output when testing with files
v3.45 (Added Processing Time Measurements)
- Added comprehensive timing measurements: Now shows how long each frame takes to process
- Frame processing time: Displayed in console output:
SpO2=97%, HR=76 bpm [45ms] - Frame acquisition interval: Logged to file showing time since last frame acquired
- Purpose: Understand actual performance vs camera lag vs processing bottlenecks
- What's measured:
- Frame acquisition interval (target: 250ms for 4 fps)
- Total processing time per frame (OCR + validation + file I/O)
- Logged for both changed values (console) and unchanged values (file only)
- Output examples:
    - Console: `SpO2=97%, HR=76 bpm [45ms]` (when value changes)
    - Log file: `Frame #123: SpO2=97%, HR=76 bpm (no change) - processed in 42ms`
    - Log file: `[TIMING] Frame acquired: +251ms since last (target: 250ms)`
- Benefits:
- Identify if processing time is exceeding 250ms target
- See actual frame rate vs target 4 fps
- Distinguish between camera timestamp lag and processing delays
- Debug performance issues with concrete data
- This addresses confusion about camera timestamp lag vs actual processing performance
v3.44 (Save Thresholded PNGs, Not Colored JPGs)
- Changed error frame format: Now saves thresholded black/white PNG instead of colored JPG
- Rationale: Thresholded frames show exactly what the OCR sees, making debugging much more effective
- Previous: Saved raw colored camera frames that required manual processing to debug
- Now: Saves thresholded (binary) PNG showing the exact image fed to template matching
- Processing: Grayscale conversion → threshold at 128 → save as PNG
  - Filename format: `raw_frames/thresh_YYYYMMDD-NNNNN.png`
  - Saved on:
- Corruption detection (invalid template match)
- Unrecognized digits (negative values)
- Low confidence (first attempt)
- Low confidence after retry
- Benefits:
- Can replay frames directly through OCR pipeline
- See exactly what template matching is working with
- Identify missing templates or threshold issues immediately
- No need to manually preprocess frames for debugging
- PNG format (lossless) vs JPG (lossy with compression artifacts)
- Storage: ~50-100KB per frame vs 200-400KB for colored JPG
v3.43 (Fixed Timing + Removed Raw Frames)
- Fixed frame acquisition timing: 250ms timer now starts from frame ACQUISITION, not processing completion
- Previous behavior: Waited 250ms after processing finished (variable frame rate)
- New behavior: Waits 250ms from when frame was acquired (consistent 4 fps)
  - Implementation: Set `lastProcessedTime = now` BEFORE returning frame, not after processing
  - Example: If processing takes 50ms, next frame acquired at exactly 250ms from previous acquisition
- Result: Guaranteed exactly 4 fps regardless of processing time variations
- Removed raw frame saving: System no longer saves unprocessed camera frames
- Rationale: Thresholded digit images (step4) are already saved for all failures
- What's saved on errors:
- review/f{N}_spo2_digit1.png, digit2, digit3 (thresholded crops)
- review/f{N}_hr_digit1.png, digit2, digit3 (thresholded crops)
- review/f{N}_spo2_full.png, hr_full.png (labeled displays)
- review/f{N}_spo2_checks.png, hr_checks.png (visualization)
- Removed:
- raw_frames/ directory no longer created
- rawFrame parameter removed from processFrame()
- rawFrame cloning removed from processFrames() loop
- Benefits: Reduced memory usage, simpler code, less disk I/O
- Thresholded images are more useful for debugging than raw camera frames
v3.42 (Time-Based Frame Processing)
- Changed from skip-based to time-based frame processing: More reliable and precise timing
- Previous approach: Process every 4th frame (variable timing depending on stream fps)
- New approach: Process frames at exactly 4 fps (250ms minimum interval)
- Implementation:
    - Track `lastProcessedTime` timestamp
    - Only process frame if 250ms has elapsed since last processed frame
    - Guarantees consistent 4 fps processing rate regardless of stream fps
- Benefits:
- Predictable, consistent processing rate
- Works correctly even if stream fps varies
- Simpler to understand and configure
  - Updated NewRTSPSource(): Removed `skipCount` parameter, now uses hardcoded 250ms interval
  - Console message: "📊 Processing frames at 4 fps (250ms minimum interval)"
- Confirmed: All step debug images only saved in DEBUG_MODE
- step1_original.png (only with /debug flag)
- step2_decolorized.png (only with /debug flag)
- step3_grayscale.png (only with /debug flag)
- step4_thresholded.png (only with /debug flag)
v3.41 (Critical Performance Fix)
- CRITICAL: Fixed excessive debug image writes causing high resource usage
- Problem: Saving 8 debug images PER FRAME (4 per display x 2 displays)
- Impact: At 3.75 fps = ~30 images/second = massive disk I/O and CPU usage
- Images being saved on every frame:
- step1_original.png (colored extraction)
- step2_decolorized.png (after alarm mode decolorization)
- step3_grayscale.png (after grayscale conversion)
- step4_thresholded.png (after binary threshold)
- Fix: Only save these debug images when DEBUG_MODE is enabled
- Result: Normal operation now only saves essential files (digit crops, checks, full)
  - To enable debug images: Run with `./pulseox-monitor /debug` flag
  - Performance improvement: ~70% reduction in disk I/O, significant CPU savings
- This was a leftover from debugging alarm mode decolorization that should never have been in production
v3.40 (Actionable Layout Detection Errors)
- Enhanced layout detection error diagnostics: Now saves debug visualization when "Only 1 digit display found" error occurs
- Problem: Error message was not actionable - couldn't see what boxes were detected or why they failed
- Fix: Save timestamped debug image showing all center boxes with labels
- Debug visualization shows:
- All boxes crossing 50% line (center region)
- Green boxes = qualified as digit displays (height >= 110px)
- Cyan boxes = rejected (too short)
- Box labels showing dimensions: "#0: H=XXpx OK" or "#1: H=XXpx TOO SHORT"
- Red line showing 50% Y position
- Summary at top: "Found: X center boxes, Y digit displays (need 2)"
- Enhanced console error output:
- Shows count of center boxes found
- Lists each box with dimensions and qualification status
- "✓ QUALIFIED" for boxes that passed height requirement
- "✗ TOO SHORT (< 110px)" for boxes that were rejected
- Path to debug image: test_output/layout_error_YYYYMMDD_HHMMSS.jpg
- Makes debugging layout failures much faster - can immediately see what system detected
v3.39 (Fixed Frame Scaling Order)
- CRITICAL FIX: Frame scaling now happens in correct order
- Problem: Boxes were calculated on original frame, then frame was scaled, causing coordinate mismatch
- Root cause: Layout detection calculated boxes on unscaled frame, but processFrame() scaled the frame later
- Fix: Layout detection now scales frame IMMEDIATELY after calculating scale factor (Step 4)
- Process flow:
- detectScreenLayoutAreas() calculates scale from bounding box width
- Scales frame to 860px width using gocv.Resize()
- Scales ALL coordinate arrays (bounding box, allBoxes, significantBoxes)
- ALL subsequent processing (contour detection, box calculation) uses scaled frame AND scaled coordinates
- Returns boxes in scaled coordinates + scale factor
- processFrame() scales every frame using same scale factor
- Boxes and frames now match perfectly
- Code changes:
- layout_detection.go: Added frame scaling at Step 4 (after scale calculation)
- layout_detection.go: Scale all coordinate arrays (scaledBoundingBox, scaledAllBoxes, scaledSignificantBoxes)
- All debug visualizations now use scaled frame (step 5-15 images)
- Removed tryAcquireLayout() wrapper function (unnecessary abstraction)
- pulseox-monitor.go: Calls detectScreenLayoutAreas() directly, stores layout+scale in state
- processor.go: Keeps scaling logic in processFrame() (needed for every frame)
- Benefits:
- Boxes calculated on 860px frame match frames that are scaled to 860px
- No more coordinate mismatches between layout detection and OCR processing
- Simpler code flow - no useless wrapper functions
- Alarm box trimming (Steps 8-10) preserved and working correctly
- Verified: Bounding box width log will show ~860px after scaling (was variable before)
- BUG FIX: Review entries now use correct frame numbers
  - Problem: Successful readings used `state.FrameCount` (always 0) instead of actual `frameNum`
  - Result: All images referenced f0_* instead of actual frame numbers
  - Fix: Changed 3 instances in processor.go to use `frameNum` parameter
  - Affected: Invalid physiological values, successful readings, unstable readings
  - Failed readings (corruption, unrecognized, low confidence) were already correct
- OPTIMIZATION: Only save images for frames added to review
- Problem: Every processed frame saved images, even if values didn't change
- Result: review/ folder filled with unused images from StatusNoChange frames
- Fix: Delete digit images when values haven't changed (StatusNoChange)
- Saves disk space and makes review folder only contain relevant frames
- Deleted files: f*_spo2_checks.png, f*_spo2_digit[1-3].png, f*_spo2_full.png, f*hr*.png
v3.38 (Dynamic Frame Normalization)
- Implemented dynamic frame scaling based on bounding box width: Handles unstable device position
- Problem: Pulse oximeter lays loose on bed, causing frame width to vary
- Solution: Measure bounding box width in Step 4 of layout detection
- Process:
    - Calculate scale factor: `scale = 860 / boundingBoxWidth`
    - Log bounding box width and scale factor to console
    - Store scale factor in ProcessingState.LockedScale
    - Apply scaling to every frame before OCR processing
  - Changes:
    - `detectScreenLayoutAreas()` now returns `(*ScreenLayout, float64, error)` - includes scale factor
    - `tryAcquireLayout()` captures and stores scale in state
    - `processFrame()` applies scaling if `state.LockedScale != 1.0`
    - Logs: "📊 Bounding box width: Xpx, Scale factor: Y (target: 860px)"
- Benefits:
- Consistent 860px normalized width regardless of camera position/zoom
- Layout coordinates remain stable across frames
- OCR accuracy maintained even when device moves
v3.37 (Fixed Layout Detection Height Threshold)
- Fixed digit display height threshold: Changed from > 120px to >= 110px
- Problem: Layout detection was failing because digit displays were exactly 110-120px tall
- Previous code: Required height > 120px (too restrictive)
- Fixed code: Requires height >= 110px (matches MIN_BOX_HEIGHT constant)
- Added debug logging: Shows dimensions of each center box and whether it's accepted/rejected
- Helps diagnose layout detection failures
- Consistent with original design intent (MIN_BOX_HEIGHT = 110)
v3.36 (Unified Logging)
- Consolidated all logging to use logMessage() function: Replaced all fmt.Printf(), fmt.Println(), and inconsistent logging patterns
- ocr.go: Converted template loading messages to use logMessage()
- Template warnings → Console + Warning level
- Template loading success → Console + Info level
- frame_source.go: Converted frame processing info message
- "Processing every Nth frame" → Console + Info level
- normalize.go: Converted all debug output messages
- Decolorization progress → LogFile + Debug level
- Debug file saves → LogFile + Debug/Info level
- All files now use consistent logging pattern:
- Startup/shutdown → Console + Info
- Main readings → Both + Info
- Warnings/errors → Both + Warning/Error
- Debug messages → LogFile + Debug (respects DEBUG_MODE flag)
- Progress indicators → Console + Info
- Benefits:
- Single source of truth for logging behavior
- Consistent timestamp formatting across all logs
- Proper level-based filtering
- Clean separation of console vs file logging
- Debug mode properly controls debug output
- ocr.go: Converted template loading messages to use logMessage()
v3.35 (Logging System Implementation)
- Implemented logMessage() function in helpers.go: New unified logging system
- Target: Console, LogFile, Both
- Level: Debug, Info, Warning, Error
- Single-letter prefixes (D, I, W, E) for clean output
- Automatic DEBUG_MODE filtering for debug messages
- Timestamp formatting: [HH:MM:SS.mmm]
- Eliminates need for checking if logger != nil throughout codebase
v3.34 (Correct Order + Debug Images)
- Fixed order of layout detection: Center region FIRST, digit displays SECOND
- Correct process order:
- Find ALL contours with RetrievalList
- Collect all significant boxes (>50px width or height)
- Calculate bounding box from ALL boxes (not just digit displays)
- Find 50% Y line from overall bounding box
- Filter to boxes crossing 50% line (center region)
- From center boxes, identify digit displays (height >= 110px)
- Find alarm icons (center boxes extending beyond digit displays)
- Trim digit displays based on alarm icons
- Split by center X
- Find rightmost X in each half (using box center)
- Create fixed-width boxes (280px)
- Debug images at every step saved to test_output/:
- layout_step1_grayscale.jpg
- layout_step2_threshold.jpg
- layout_step3_eroded.jpg
- layout_step4_all_boxes.jpg (all boxes >50px in green)
- layout_step5_center_boxes.jpg (boxes crossing 50% line in cyan, red line)
- layout_step6_digit_displays.jpg (digit displays in magenta)
- layout_step7_alarm_icons.jpg (alarm icons in yellow, if any found)
- layout_step8_trimmed_displays.jpg (trimmed displays in green)
- layout_step9_final_boxes.jpg (final SpO2/HR boxes with labels)
- Key fix: Bounding box calculated from ALL contours, not just digit displays
- This prevents missing digit displays that are outside early bounding calculations
v3.33 (Contour Debug Logic + Center Region - WRONG ORDER)
- Implemented contour_debug.go logic combined with center region approach: As instructed
- Exact implementation of contour_debug.go detection:
- Find ALL contours with RetrievalList
- First pass: Identify digit displays (height >= 110px)
- Second pass: Find alarm icons (boxes extending beyond digit displays on right)
- Trim digit displays: Find leftmost alarm icon edge that overlaps, set as right boundary
  - Combined with center region logic:
    5. Calculate bounding box from trimmed digit displays
    6. Find 50% Y line
    7. Filter to displays crossing 50% line (center region)
    8. Split by center X
    9. Find rightmost X in each half (using box center to determine which half)
    10. Create fixed-width boxes (CUT_WIDTH = 280px)
- Key differences from v3.32:
- Works with TRIMMED digit displays (after alarm icon trimming)
- Alarm icons properly detected and used to trim display boundaries
- Follows exact contour_debug.go algorithm before applying center logic
- Diagnostic output shows trimming actions when alarm icons detected
v3.32 (Fixed Layout Detection - WRONG APPROACH)
- Reverted to proven layout detection logic: Fixed incorrect box detection from v3.30
- Problem in v3.30/v3.31: Alarm icon detection was finding wrong boxes (yellow alarm square instead of digit displays)
- Fix: Restored working center region approach without alarm icon complications
- Process:
- Find all significant contours (>50px width or height)
- Calculate overall bounding box
- Filter to boxes crossing 50% Y line with height >= 110px (digit displays)
- Create center region from filtered boxes
- Split by center X
- Find rightmost X in each half using box center to determine which half
- Create fixed-width boxes (CUT_WIDTH = 280px)
- Key insight: Use box center (not box edge) to determine left vs right half
- Alarm icon trimming removed for now - adds complexity without clear benefit
- Focuses on reliable detection of actual digit display boxes
- Debug images from v3.31 retained for troubleshooting
v3.31 (Decolorization After Extraction)
- Moved decolorization to after display area extraction: Layout detection now works on original colored frames
- Problem: Layout detection was trying to threshold colored alarm backgrounds, failing to find displays
- Fix: Layout detection now works on original colored frame (grayscale + threshold)
  - Decolorization moved: Now happens in `recognizeDisplayArea()` AFTER extracting the small display regions
  - Process flow:
- Preprocess (crop + rotate) - original colored frame
- Detect layout on colored frame (grayscale → threshold → find contours)
- Extract display areas (280x200px) - still colored
- Decolorize extracted regions (only ~56K pixels per display)
- Continue with OCR (grayscale → threshold → template matching)
  - Debug images added: Each processing step now saves debug images:
    - `f{N}_{display}_step1_original.png` - Original extracted colored region
    - `f{N}_{display}_step2_decolorized.png` - After decolorization
    - `f{N}_{display}_step3_grayscale.png` - After grayscale conversion
    - `f{N}_{display}_step4_thresholded.png` - After thresholding (ready for OCR)
- Allows visual verification of each processing step
- Decolorization only affects OCR, not layout detection
v3.30 (Simplified Layout Detection)
- Simplified layout detection with alarm icon trimming: Combined best of both approaches
- Uses RetrievalList to find ALL contours (including nested ones like alarm icons)
- Single contour pass: No more complicated center region extraction and re-detection
- Alarm icon detection: Finds boxes that extend beyond digit displays (clock icons, alarm indicators)
- Smart trimming: Trims BOTH SpO2 and HR displays if alarm icons found, otherwise uses regular boxes
- Process:
- Find all contours with RetrievalList
- Identify digit displays (height >= 110px)
- Find alarm icons (boxes extending beyond digit displays on the right)
- Trim digit displays based on leftmost alarm icon edge (if any)
- Split displays into left (SpO2) and right (HR) halves using center X
- Find rightmost X in each half for display boundaries
- Create fixed-width boxes (CUT_WIDTH = 280px)
- Much simpler and more reliable than previous double-contour approach
- Combines center region logic with direct contour approach from contour_debug.go
v3.29 (Correct Decolorization Scope)
- Fixed decolorization scope: Removed wasteful full-frame decolorization
- Problem in v3.28: Decolorizing entire 1390x2736 frame (3.8M pixels) took ~2000ms
- Reality: Decolorization only needed for small display regions during OCR
- Correct flow:
- Preprocess (crop + rotate) - original colored frame
- Detect layout on original frame (works fine with colored backgrounds)
- Extract display areas (280x200px each)
    - Decolorize ONLY display regions in `recognizeDisplayArea()` (~100K pixels)
    - Run OCR on decolorized regions
  - Decolorization already exists in `ocr.go` at the right place (line ~281)
  - No need for full-frame decolorization - layout detection works on colored frames
- Performance: ~40x faster (decolorizing 100K pixels vs 3.8M pixels)
v3.28 (Decolorization Timer - REVERTED)
- Added decolorization timing: Timer measures how long decolorizeFullFrame() takes
- Timer added after preprocessing in main processing loop
- Displays: "[TIMER] Decolorization took: Xms"
- Restored decolorization to pipeline: Was defined but not being called!
- Decolorization now runs between preprocessing and layout detection
- Enables alarm mode support (colored backgrounds → black/white)
- Allows measurement and optimization of decolorization performance
- Frame lifecycle: raw → preprocess → decolorize → detect/process
v3.27 (Save ACTUAL Raw Frames)
- CRITICAL FIX: Raw frames are now truly raw (unprocessed)
  - Problem: `raw_frames/raw_*.png` files were being saved AFTER preprocessing (rotation + crop)
  - Result: When testing with "raw" frames, they were rotated AGAIN, making displays vertical instead of horizontal
- Fix: Clone raw frame immediately after reading from source (line 233)
- Flow now:
- Read frame from source (portrait orientation)
- Clone raw frame → keep for potential error saving
- Preprocess: crop timestamp + rotate 90° CW
- Decolorize
- Process
- IF error (corruption, low confidence, unrecognized) → save RAW clone
- Close raw frame clone
  - Raw frames saved in `processor.go` at error points using the `rawFrame` parameter
- Fixes issue where testing with raw frames would double-rotate them
v3.26 (Fixed Redundant Decolorization)
- CRITICAL FIX: Eliminated 4x redundant decolorization calls: Now decolorizes ONCE at top level only
- Problem: Frame was being decolorized 4 times for every layout detection:
    - In `detectScreenWidth()` (2668x1458 - full frame)
    - In `saveNormalizationDebug()` (2668x1458 - same frame again!)
    - In `detectScreenLayoutAreas()` (2130x1164 - center region)
    - In `detectScreenLayoutAreas()` center region (859x518 - smaller region)
- Each call took 2-3 seconds - total 8-12 seconds wasted per frame!
- Fix: Removed ALL decolorization calls from normalize.go and layout_detection.go
  - Now: Single decolorization in `pulseox-monitor.go` at line 259-267
  - Result: Frame processing time reduced from ~12 seconds to ~3 seconds
  - Decolorized frame passed to all layout functions - they expect pre-decolorized input
  - Simplified `decolorizeFullFrame()` output - removed progress indicators (10%, 20%, etc.)
  - Added comments documenting that functions expect already-decolorized frames
v3.25 (Aggressive Debug Output)
- Added extensive debug image saves: Every processing step now saves an image to test_output/
- step1_original.jpg - Frame after preprocessing (rotated)
- step2_decolorized.jpg - After decolorizeFullFrame() (should be black/white)
- step3_grayscale.jpg - After grayscale conversion
- step4_thresholded.jpg - After binary threshold
- Added decolorization progress tracking:
- Shows progress every 10% of rows processed
- Shows final statistics: white pixel count/percentage, black pixel count/percentage
- Helps diagnose if decolorization is working correctly
- Purpose: Debug why alarm mode decolorization appears to not be working
v3.24 (Fixed Layout Decolorization)
- Fixed missing decolorization in layout detection: Layout detection was still using colored frames
  - Added `decolorizeFullFrame()` call in `detectScreenLayoutAreas()` function
  - Previously: Only `detectScreenWidth()` was decolorizing, but `detectScreenLayoutAreas()` was not
  - Problem: Layout detection was converting colored frame directly to grayscale
- Result: Alarm mode colored backgrounds prevented contour detection
- Fix: Decolorize BEFORE grayscale conversion in both contour detection passes:
- Initial contour detection (full frame)
- Center region contour detection
- Now: Layout detection works correctly in both normal and alarm modes
- This was the missing piece preventing alarm mode from working end-to-end
v3.23 (Higher White Threshold)
- Increased white pixel detection threshold: Changed from 200 to 240 for more selective white detection
  - Updated in both `decolorizeFullFrame()` (normalize.go) and `decolorizeAlarmMode()` (ocr.go)
  - Previous threshold (200): Too permissive, caught near-white/gray pixels
- New threshold (240): More selective, only truly white pixels (near 255) are preserved
- Logic: If RGB all > 240 → keep as white (255), else → convert to black (0)
- Rationale: Display digits are pure white (255), not gray, so higher threshold is more accurate
- Better separation between white digits/borders and colored alarm backgrounds
v3.22 (Decolorize Before Layout)
- Moved decolorization before layout detection: Alarm mode now works end-to-end
  - New `decolorizeFullFrame()` function in normalize.go
  - Applied BEFORE layout detection (in `detectScreenWidth()`)
  - Previously: Decolorization only happened during OCR (too late)
- Now: Decolorization happens first, then layout detection sees clean white-on-black displays
- Fixes issue where colored alarm backgrounds prevented layout detection
- Layout detection flow:
- Decolorize full frame (yellow/cyan → black, white → white)
- Convert to grayscale
- Threshold at 170
- Find contours
  - Same decolorization also applied in `saveNormalizationDebug()` for accurate debug visualization
  - Result: System can now detect and recognize displays in both normal and alarm modes
v3.21 (Preprocessing Debug)
- Added debug output for preprocessing: Prints frame dimensions before and after preprocessing
- Shows: "Before preprocess: WxH" and "After preprocess: WxH"
- Helps verify that rotation is actually happening
- Expected: Portrait (e.g., 1080x1920) → Landscape (e.g., 1920x1080)
- Temporary diagnostic output to verify preprocessFrame() is working
v3.20 (Unified Preprocessing)
- Verified unified preprocessing path: Both RTSP streaming and file test modes use identical preprocessing
  - All frames go through `preprocessFrame()` function:
    - Crop timestamp (top 68 pixels)
    - Rotate 90° clockwise (portrait → landscape)
  - Applied in unified `processFrames()` loop (line 250)
  - Works for both `RTSPSource` and `FileSource`
  - Important: Raw frames saved to `raw_frames/` are saved AFTER preprocessing
  - Therefore: Raw frames are already rotated and ready for testing
- If testing with unrotated frames, they'll go through preprocessing automatically
- Ensures consistent behavior regardless of frame source
- Layout detection expects landscape orientation (SpO2 and HR side-by-side horizontally)
v3.19 (Alarm Mode Support)
- Added alarm mode decolorization: System now recognizes digits even when pulse oximeter is in alarm state
  - New `decolorizeAlarmMode()` function converts colored alarm backgrounds to black
  - Alarm mode displays:
- SpO2: Yellow/orange background with white digits
- HR: Cyan/blue background with white digits
- Decolorization logic:
- Pixels where ALL RGB values > 200 → kept as white (digits)
- All other pixels → converted to black (colored backgrounds, borders)
- Applied before grayscale conversion in OCR pipeline
- Enables normal thresholding and template matching to work correctly
- Previously: Colored backgrounds would confuse thresholding, causing OCR failures
- Now: Alarm mode digits recognized just as reliably as normal mode
- Example: "90" in yellow box → converted to "90" in black/white → OCR succeeds
v3.18 (Zero Validation)
- Added '0' digit validation: Prevents false '0' recognition by checking for characteristic center hole
  - New `validateZero()` function checks for a hole at 50% Y (center)
  - Scans horizontally from 30%-70% X looking for black pixels (hole)
- Wider horizontal range than '8' validation (30-70% vs 40-50%) since '0' hole is larger
- If validation fails, marks digit as invalid (-1) triggering retry/re-detection
- Applied to all three digit positions (digit1, digit2, digit3)
- Complements existing '8' validation (two holes at 30% and 70% Y)
- Helps prevent misrecognition of corrupted/partial digits or digits with missing centers as '0'
  - Example log output: `validateZero: center hole @ y=50 (50%) x=15-35 (true)`
v3.17 (Enhanced Cleanup)
- Enhanced directory cleanup on startup: Now cleans all output directories in streaming mode
  - Previously: Only `review/` was cleaned on startup
  - Now: Cleans `review/`, `raw_frames/`, and `test_output/` in streaming mode
  - Single-frame test mode: No cleanup (preserves test output)
  - Progress display: Shows each directory being cleaned with checkmarks
  - Example output:
    - 🗑️ Cleaning output directories...
    - review/... ✓
    - raw_frames/... ✓
    - test_output/... ✓
  - Rationale: `raw_frames/` accumulates failed recognition frames over time; clean slate each run
  - Rationale: `test_output/` contains debug visualizations that should be fresh per session
v3.16 (Append-Only HTML Review)
- Implemented append-only HTML generation: Review HTML is now written incrementally instead of all at once on shutdown
  - HTML header written once at startup via `initReviewHTML()`
  - Each frame entry appended immediately via `appendReviewEntry()` as it's processed
  - Footer written on shutdown via `closeReviewHTML()`
  - Browser handles missing closing tags gracefully, allowing live refresh
- Benefits:
- ✅ Live updates: refresh browser anytime to see latest frames
- ✅ No regeneration overhead: just one file write per frame
- ✅ Can run for hours without memory/performance issues
- ✅ No data loss if process crashes (all processed frames already written)
  - All review entry points now consistently call `appendReviewEntry()`:
    - Corruption detection
- Unrecognized digits
- Low confidence (both first attempt and retry)
- Invalid physiological values
- Successful readings
- Unstable readings (held for validation)
  - Fixed duplicate `appendReviewEntry()` calls in corruption handler (was calling 3 times)
v3.15 (Fixed Eight Validation)
- Fixed validateEight() probe positions: Corrected hole detection based on actual measurements
- Issue: Was probing at 33% and 67% which hit the white digit and middle bar
- Fix: Now checks horizontal line segments at 30% and 70% height
- Scans X from 40% to 50% of width (10% range centered around middle)
- Looks for ANY black pixel (< 128) in that range = hole detected
- Much more robust than single pixel check - tolerates slight variations
- Holes are BLACK (empty space), digit is WHITE in thresholded image
- Prevents false rejection of valid "8" digits
v3.14 (Corruption Debug Output)
- Added debug output for corruption cycles: Helps diagnose repeated corruption detection
  - When corruption is detected, saves debug files to `test_output/` with timestamp
  - Saves normalized frame: `corruption_YYYYMMDD_HHMMSS_frameN_normalized.png`
  - Saves layout visualization: `corruption_YYYYMMDD_HHMMSS_frameN_layout.jpg`
  - Also includes all digit crops from review/ directory (already generated by OCR)
- Allows offline analysis when system gets stuck in corruption → re-detect → corruption cycles
- Log message: "💾 Debug saved: YYYYMMDD_HHMMSS"
v3.13 (RTSP Reconnect Fix + Eight Validation)
- Fixed segmentation fault on RTSP reconnect failure: Prevents crash when stream reconnection fails
- Added null check before reading from stream
- Sets stream to nil and marks source as closed after failed reconnect
- Returns shouldContinue=false to stop processing instead of attempting to read from null stream
- Prevents SIGSEGV when VideoCapture is null
- Added '8' digit validation: Prevents false '8' recognition by checking for two characteristic holes
  - New `validateEight()` function probes two specific regions (at 1/3 and 2/3 height)
- If validation fails, marks digit as invalid (-1) triggering retry/re-detection
- Applied to all three digit positions (digit1, digit2, digit3)
- Helps prevent misrecognition of corrupted/partial digits as '8'
v3.12 (Physiological Value Validation)
- Added minimum value threshold for Home Assistant posting: Prevents posting physiologically impossible values
- Validation rule: Both SpO2 and HR must be >= 40 to post to Home Assistant
- Values below 40 are logged to file and added to review, but NOT posted to HASS
- Console displays: "⚠️ Values below 40 - not posting to HASS"
- Rationale: SpO2 or HR below 40 are not physiologically possible in living patients
- Prevents false alarms and invalid data in Home Assistant database
- Common scenarios triggering this:
- Pulse oximeter disconnected (shows dashes, may be misread as low numbers)
- Recognition errors resulting in single-digit values (0-9)
- Display corruption being partially recognized
v3.11 (Reduced Raw Frame Storage)
- Raw frames now only saved on recognition failures: Significantly reduces disk space usage
  - Previously: Every processed frame saved to `raw_frames/` (all successes and failures)
  - Now: Only saves frames when recognition fails (corruption, unrecognized, low confidence)
- Successful readings and unstable readings (held for validation) no longer saved
- Failure cases where frames are saved:
- ❌ Corrupted frames (invalid pattern matched)
- ❌ Unrecognized digits (negative values)
- ❌ Low confidence readings (both first attempt and retry)
- Typical result: ~90% reduction in raw frame storage for stable monitoring
- Failed frames still available for debugging and training data collection
v3.10 (Corruption Frame Number)
- Added frame number to corruption log message: Easier debugging of false corruption detections
  - Message now shows: `[CORRUPTION] Frame #123 - Digit = -1 detected: SpO2(9,5) HR(7,-1) - skipping frame`
  - Helps identify which specific frame triggered invalid pattern match
- Useful for reviewing frames in HTML output and approving/removing invalid templates
v3.9 (Simplified Digit Detection Arrays)
- Simplified digit detection to use only arrays: Removed all intermediate variables
- `D[1,2,3]` array stores cut positions for digit extraction
- `digitIsOne[1,2,3]` array stores detection results (is it a "1"?)
- Eliminated intermediate variables like digit1IsOne, digit2IsOne, digit3IsOne, middleStart, rightStart
- Code now directly uses array notation throughout: `digitIsOne[2]`, `D[3]`, etc.
- Cleaner, more maintainable code with single source of truth
- Fixed "13" recognition bug: D1 position now correctly calculated from accumulated widths
- Loop accumulates widths: digit3 → digit2 → digit1
- D1 check position = w - digit3Width - digit2Width - 5
- Previously was checking at wrong position when only 2 digits present
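The accumulated-width arithmetic above can be sketched as follows. The widths here are hypothetical; the real values come from template matching in ocr.go, and `cutPositions` is an illustrative name.

```go
package main

import "fmt"

// cutPositions sketches the v3.9 fix: cut positions are derived
// right-to-left (digit3 → digit2 → digit1), and the digit1
// "is it a 1?" check sits 5px left of the accumulated widths.
func cutPositions(w, digit3Width, digit2Width int) [4]int {
	var D [4]int // index 0 unused; D[1..3] are cut positions
	D[3] = w - digit3Width
	D[2] = D[3] - digit2Width
	D[1] = w - digit3Width - digit2Width - 5 // D1 check position
	return D
}

func main() {
	D := cutPositions(120, 30, 30)
	fmt.Println(D[1], D[2], D[3]) // 55 60 90
}
```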
v3.8 (Code Cleanup & Digit Detection Loop)
- Added helpers.go: New file with `logf()` helper function
- Eliminates repetitive `if logger != nil` checks throughout codebase
- Makes logging calls cleaner and more maintainable
- Applied across processor.go, validators.go, and ocr.go
- Refactored digit detection to use a loop: Replaced repetitive digit checking code with a single loop
- Uses arrays for digit names, detection results, and widths
- Accumulates width progressively (digit3 → digit2 → digit1)
- More maintainable and easier to understand
- Eliminates code duplication
v3.7 (3-Digit Value Calculation Fix)
- Fixed 3-digit value calculation bug: Removed score requirement for digit1 detection
- Issue: When `digit1IsOne` detected a "1" digit, code also required template match score > 50%
- Problem: digit1 region includes empty padding, causing low template match scores
- Result: 106 was calculated as 6, 103 as 3, 104 as 4
- Fix: Trust `hasOneAt()` detection alone - if digit1IsOne is true, use 100 + num2*10 + num3
- The `hasOneAt()` function already confirms there's a "1" pattern at the correct position
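The fixed calculation can be sketched as a small function. `assembleValue` is an illustrative name; the real logic lives in ocr.go.

```go
package main

import "fmt"

// assembleValue sketches the v3.7 fix: when hasOneAt() reports a
// leading "1" (digit1IsOne), trust it alone and add 100, instead of
// also requiring a template-match score on the padded digit1 region.
func assembleValue(digit1IsOne bool, num2, num3 int) int {
	if digit1IsOne {
		return 100 + num2*10 + num3
	}
	return num2*10 + num3
}

func main() {
	fmt.Println(assembleValue(true, 0, 6))  // 106, not 6
	fmt.Println(assembleValue(false, 9, 5)) // 95
}
```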
v3.6 (Millisecond Timestamps)
- Added millisecond precision to timestamps: Changed timestamp format from `15:04:05` to `15:04:05.000`
- Provides more precise timing for frame processing and reading logs
- Helpful for debugging timing issues and frame processing delays
- Format: `[03:03:17.245] SpO2=93%, HR=85 bpm`
v3.5 (Frame Rate + Timestamp Validation)
- Frame rate adjustment: Camera changed from 30fps to 15fps
- Adjusted processing from every 6th frame to every 4th frame
- Processing rate: ~3.75fps (was ~5fps)
- Camera was struggling at 30fps, so the stream rate was reduced to 15fps
- Timestamp validation: Added OCR check of camera timestamp
- Validates timestamp every 10 processed frames (~2.5 seconds)
- Warns if camera time differs by >3 seconds from server
- Uses optimized gosseract library (~22ms per check)
- Optimizations: Downscale 50%, grayscale, Otsu threshold, invert, PNG encode
- Reused OCR client for major speed boost (was 120ms, now 22ms warm)
- Silent when drift is within ±3 seconds
- Helps detect frozen/lagging streams
- New file: `timestamp_ocr.go` - Timestamp extraction and validation
v3.4 (Low Confidence Counter)
- Added low confidence counter: Tracks and displays frequency of low confidence frames
- New `LowConfidenceCount` field in ProcessingState
- Console displays count when it increments: "Low confidence (#X)"
- Helps identify camera/lighting issues needing attention
- Counter appears for both first detection and retry failures
v3.3 (Corruption Detection Fix)
- Fixed corruption detection: Direct check prevents invalid frames from reaching stability
- Changed from `validateCorruption()` wrapper to direct `IsCorrupted()` call
- ANY digit = -1 (invalid template match) now immediately returns StatusCorrupted
- Prevents invalid/corrupted frames from being marked as "unstable"
- Ensures clean separation: corruption → silent skip, low confidence → retry
v3.2 (Review & Signal Handling)
- Fixed Ctrl+C handling: Simplified signal handling to prevent segfaults
- Removed stream.Close() from signal handler (was causing SIGSEGV)
- Stream cleanup now happens naturally on program exit
- Sets closed flag only, checks happen between frame reads
- Enhanced review.html: Failed frames now show all digit images
- Corruption, low confidence, and unrecognized frames display digit crops
- Allows reviewing and approving digits from failed recognitions
- Red background distinguishes failed frames
- Silenced corruption detection: Invalid patterns work quietly
- Corrupted frames no longer appear in console or HTML
- Only logged to file for debugging purposes
- Keeps review focused on actionable items
v3.1 (Optimizations)
- Removed FPS display: Cleaned up console output
- No more "Stream FPS: X (avg over Y frames)" messages
- Simpler, focused console showing only readings and warnings
- Negative value handling: Unrecognized digits now trigger immediate retry
- SpO2 or HR < 0 (OCR returned -1) treated as low confidence
- Triggers immediate next frame read + layout re-detection
- Prevents false "unstable" warnings for blurry/unrecognized frames
- Frame rate adjustment: Changed from every 4th to every 6th frame
- ~5 fps processing rate from 30fps stream
- Better balance for 30fps camera (was tuned for 15fps)
v3.0 (Refactored)
- Modular architecture: Split monolithic pulse-monitor.go into focused modules
- processor.go: Frame processing orchestration
- validators.go: Validation logic
- frame_source.go: Abstracted sources (RTSP/file)
- types.go: Shared data structures
- normalize.go: Frame normalization
- Smart escalation strategy: Progressive failure handling
- 1st failure: Try next frame (0s wait)
- 2nd failure: Re-detect layout (0s wait)
- 3rd failure: Wait 10s
- 4th failure: Wait 30s
- 5th+ failures: Wait 60s
- Success resets counter
- Interruptible sleep: Ctrl+C works immediately during waits
- Checks `source.IsActive()` every second
- No more forced 60s waits before shutdown
- ConsecutiveFailures counter: Tracks problems across processing
- Incremented on layout failures, corruption, low confidence
- Reset on successful processing
- Enables intelligent escalation
- See V3_CHANGES.md for complete migration guide
v2.35
- Robust layout detection under varying light conditions
- Changed Step 3 logic: instead of picking "2 largest boxes", find rightmost X in each half
- Splits detection area at X-center, finds max(rightX) for left half (SpO2) and right half (HR)
- Handles digit fragmentation: even if "95" breaks into "9" and "5" contours, still finds correct right edge
- Tracks Y range across all contours in each half for proper bounding
- Raw frame archiving
- Saves every processed frame to `raw_frames/raw_YYYYMMDD-NNNNN.png` (persistent directory)
- Enables offline testing and replay of detection algorithms
- Files are the rotated frames (after timestamp crop), exactly as fed to detection pipeline
- Not cleaned on restart - accumulates for historical analysis
v2.34
- Every 4th frame processing (improved responsiveness)
- Simplified corruption handling (no immediate retry)
- Silent corruption detection (log only)
- Digit 3 detection offset: -8 pixels
- Invalid template: 2 patterns
Previous versions
- v2.33: Added hindsight validation for large deltas
- v2.32: Implemented low confidence retry with layout re-detection
- v2.31: Added invalid digit detection
- Earlier: Various contour detection and template matching improvements
Contact & Context
Development Environment: macOS (local development and testing)
Target Deployment: Ubuntu 22.04 with Docker
Home Assistant Version: (check your setup)
Camera: (specify your RTSP camera model)
Project Location: /Users/johanjongsma/pulse-monitor/
Quick Reference
Start monitoring
cd /Users/johanjongsma/pulse-monitor
./pulseox-monitor
Stop and generate review
Ctrl+C
# Opens review/review.html automatically
Check logs
tail -f pulseox-monitor_*.log
Approve training digit
# Move good digit image to training_digits/
# Naming: {digit}_{variant}.png
Mark as invalid/corruption
# Move bad pattern to training_digits/invalid/
# Naming: descriptive name like "white_blob_2.png"
Document Purpose: This document serves as a comprehensive reference for understanding the project state, architecture, and reasoning. It should enable a new chat session (or developer) to quickly understand what's been built, why decisions were made, and how to continue development.
Last Verified Working: November 17, 2025 (v3.57 - True raw frame saving)