PulseOx Monitor - Project State & Technical Documentation

Current Version: v3.57 (True Raw Frame Saving)
Last Updated: November 17, 2025
Status: Production - threshold after preprocessing, timestamp stays colored

⚠️ IMPORTANT: Update VERSION constant in pulseox-monitor.go and this document with every build!

For AI Assistant: You MUST increment the version number in both pulseox-monitor.go and this document whenever you make ANY code changes. This is not optional - it's a critical part of the workflow.


🔧 For New Chat Sessions: Data Access

You have full filesystem access to /Users/johanjongsma/pulse-monitor/

Use these tools to work with the project:

  • Filesystem:read_file - Read source code, logs, config files
  • Filesystem:write_file - Create new files
  • Filesystem:edit_file - Modify existing code (preferred for edits)
  • Filesystem:list_directory - See what files exist
  • Filesystem:move_file - Move/rename files (e.g., approving training digits)
  • Filesystem:search_files - Find files by pattern

Key files you'll need:

  • Source: pulseox-monitor.go, ocr.go, layout_detection.go, etc.
  • Config: config.yaml
  • Training: training_digits/*.png, training_digits/invalid/*.png
  • Output: review/*.png, review/review.html
  • Logs: pulseox-monitor_*.log

DO NOT waste time asking how to access files - just use the filesystem tools directly!


Project Overview

This system monitors a Masimo Rad-G pulse oximeter by reading its display via RTSP camera feed. It uses computer vision (OpenCV/gocv) to perform OCR on the SpO2 and heart rate values, then posts readings to Home Assistant for tracking and alerting.

Why This Exists

The Masimo pulse oximeter doesn't have a data output port, so we use a camera pointed at its display. The display has significant overexposure and light leakage issues, making traditional OCR (Tesseract) unreliable. We developed a custom template-matching approach that handles these challenging conditions.


Architecture

Main Components

  1. pulseox-monitor.go - Main application

    • Unified frame processing loop
    • Interruptible sleep for graceful shutdown
    • Smart escalation strategy
    • Signal handling (Ctrl+C)
  2. processor.go - Frame processor

    • OCR execution
    • Result validation
    • Home Assistant posting
  3. validators.go - Validation handlers

    • Corruption detection
    • Confidence checking
    • Stability validation (hindsight)
  4. ocr.go - Recognition engine

    • Template matching for digits 0-9
    • Invalid pattern detection (corruption)
    • hasOneAt() function for detecting narrow '1' digits
    • Dynamic width calculation for 2-digit vs 3-digit displays
  5. layout_detection.go - Display locator

    • Finds SpO2 and HR display areas in frame
    • Position-based detection (rightmost edge method)
    • Handles digit fragmentation
  6. normalize.go - Frame normalization

    • Screen width detection
    • Scaling to 860px width
    • Layout detection with normalization
  7. frame_source.go - Frame sources

    • RTSP streaming source
    • Single-file test source
    • Interruptible via IsActive() method
  8. types.go - Shared data structures

    • ProcessingState with ConsecutiveFailures counter
    • Reading, ProcessingResult types
    • Status and Action enums
  9. homeassistant.go - Data posting

    • REST API integration
    • Posts only when values change
    • Separate sensors: sensor.pulse_ox_spo2, sensor.pulse_ox_hr
  10. html_report.go - Review interface

    • Dark-themed HTML generated on shutdown
    • Shows all processed frames with confidence scores
    • Visual indicators for unstable readings
  11. timestamp_ocr.go - Timestamp validation

    • Extracts camera timestamp from frame
    • Uses Tesseract OCR
    • Validates against server time
    • Warns if lag >10 seconds

Data Flow

RTSP Stream / File → Frame Capture (every 4th frame) → 
Timestamp validation (every 10th frame) on **COLORED** frame →
**Clone frame for error saving** (grayscale → threshold at 240 → save as rawThresholded) →
Preprocessing (crop 68px top, rotate 90° CW) on **COLORED** frame → 
**THRESHOLD ONCE at 240** (grayscale → binary) AFTER preprocessing →
Layout detection (find contours on binary) → 
Display area extraction (280x200px binary regions) →
Template matching OCR on binary → 
Digit validation (8 & 0) →
Exception handling (saves rawThresholded on errors) → 
Home Assistant posting
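
The "THRESHOLD ONCE at 240" step above can be sketched as a plain pixel-wise pass. This is a stdlib-only illustration on a byte grid; the production code performs the same operation on gocv Mats:

```go
package main

import "fmt"

// thresholdBinary converts a grayscale image (0-255 values) to a binary
// image: pixels >= cutoff become 255 (white), everything else 0 (black).
// The pipeline applies this exactly once, at cutoff 240, after preprocessing.
func thresholdBinary(gray [][]uint8, cutoff uint8) [][]uint8 {
	out := make([][]uint8, len(gray))
	for y, row := range gray {
		out[y] = make([]uint8, len(row))
		for x, v := range row {
			if v >= cutoff {
				out[y][x] = 255
			}
		}
	}
	return out
}

func main() {
	gray := [][]uint8{{250, 120}, {245, 239}}
	fmt.Println(thresholdBinary(gray, 240)) // [[255 0] [255 0]]
}
```

Everything downstream (layout detection, display extraction, template matching) operates on this binary image, which is why no later stage thresholds again.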

Recognition System

Template Matching Approach

We use a library of digit images extracted from the actual pulse oximeter display. Each digit (0-9) can have multiple template variants to handle slight variations.

Current templates:

  • Digits 0-9: Multiple variants per digit
  • Invalid patterns: 2 templates in training_digits/invalid/
    • white_blob_1.png
    • invalid_2.png

Template location: /Users/johanjongsma/pulse-monitor/training_digits/

Digit Width Detection

The system handles both 2-digit (XX) and 3-digit (1XX) displays dynamically:

  • Narrow '1' detection: hasOneAt(x) checks if column X is 80%+ white and column X+10 is 80%+ black
  • Check positions:
    • Digit 3 (rightmost): w-8 pixels from right edge
    • Digit 2: Depends on digit 3 width (72px for '1', 100px otherwise)
    • Digit 1: Depends on both digit 2 and 3 widths

Why -8 offset for digit 3? Originally used -5, but testing showed '1' digits needed detection slightly more to the left to catch the right edge reliably.
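
The column test behind hasOneAt() can be sketched on a binary image held as a row-major byte grid (a simplified illustration of the rule described above; the real implementation in ocr.go works on gocv Mats and also consults template confidence):

```go
package main

import "fmt"

// columnWhiteRatio returns the fraction of white (255) pixels in column x
// of a binary image stored row-major.
func columnWhiteRatio(bin [][]uint8, x int) float64 {
	white := 0
	for _, row := range bin {
		if row[x] == 255 {
			white++
		}
	}
	return float64(white) / float64(len(bin))
}

// hasOneAt reports whether a narrow '1' sits at column x: the digit's
// vertical stroke makes column x mostly white (80%+), while column x+10,
// just past the stroke, is mostly black (80%+, i.e. at most 20% white).
func hasOneAt(bin [][]uint8, x int) bool {
	return columnWhiteRatio(bin, x) >= 0.8 && columnWhiteRatio(bin, x+10) <= 0.2
}

func main() {
	// 10 rows, 20 cols: a vertical white stroke in column 3 only.
	bin := make([][]uint8, 10)
	for y := range bin {
		bin[y] = make([]uint8, 20)
		bin[y][3] = 255
	}
	fmt.Println(hasOneAt(bin, 3), hasOneAt(bin, 8)) // true false
}
```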

Display Positioning

  • SpO2 and HR displays: 234 pixels apart vertically
  • Layout detection: Run once at startup, locked unless re-detection triggered
  • Frame processing: Every 4th frame (~3.75 fps from 15fps stream)
  • Timestamp check: Every 10th processed frame (~2.5 seconds)

Exception Handling System

The system has three independent exception handlers that run in sequence:

1. Confirmed Corruption (Priority: Highest)

Trigger: Any digit recognized as -1 (matched invalid template with >70% confidence)

Behavior:

  • Log to file (silent to console)
  • Skip corrupted frame
  • Continue to next frame through normal processing
  • No special retry, no layout re-detection

Why this approach: Corruption is a recognition failure, not a camera/lighting issue. The next frame will naturally be clean.

Invalid templates: Stored in training_digits/invalid/ - patterns like white blobs, UI elements, or corrupted digits that should never be interpreted as valid numbers.

2. Low OCR Confidence

Trigger: Average confidence < 85% for SpO2 or HR, OR negative values (unrecognized digits)

Behavior:

  1. First occurrence:

    • Log "⚠️ Low confidence, retrying with next frame..." or "[UNRECOGNIZED] SpO2=X, HR=Y"
    • Read next frame immediately
    • Re-detect layout (handles camera movement/vibration)
    • Process retry frame
    • If successful (high confidence + values changed): perform stability check and post
  2. Second consecutive low confidence:

    • Log "⚠️ Low confidence after retry, pausing 2 seconds"
    • Give up, wait 2 seconds before continuing

Why re-detect layout: Low confidence or unrecognized digits often indicate camera movement, lighting changes, focus issues, or temporary blur. Re-detecting layout helps recover from these transient problems.

3. Large Delta (Hindsight Validation)

Trigger: Values changed >3 points from last stable reading (e.g., SpO2: 95→91 or HR: 70→75)

Behavior:

  1. Hold new reading as "pending" (don't post yet)
  2. Store baseline (value before the spike)
  3. Wait for next reading
  4. If next reading confirms trend (moves in same direction):
    • Post the held reading
    • Post the confirming reading
  5. If next reading contradicts (moves opposite direction or returns to baseline):
    • Discard held reading as glitch
    • Post the new reading

Why hindsight validation: Physiological changes are gradual. Sudden spikes >3 points are usually recognition glitches or transient display artifacts. This prevents false alarms while allowing real trends.

Example scenario:

Baseline: SpO2=95
Reading 1: SpO2=91 (Δ4) → HOLD, don't post
Reading 2: SpO2=89 (Δ2 from held) → Trend confirmed, post 91 then 89

vs.

Baseline: SpO2=95
Reading 1: SpO2=91 (Δ4) → HOLD, don't post
Reading 2: SpO2=95 (back to baseline) → Discard 91 as glitch, post 95
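
The hold/confirm/discard logic above can be sketched as a small state machine. This is a simplified single-metric illustration, not the validators.go implementation, which carries more state (confidence, separate SpO2/HR tracking):

```go
package main

import "fmt"

// hindsightValidator holds a suspicious jump (more than maxDelta from the
// last stable value) until the next reading either confirms the trend or
// reveals it as a glitch.
type hindsightValidator struct {
	stable   int // last posted (stable) value
	pending  int // held reading, valid only when holding
	holding  bool
	maxDelta int
}

// process returns the values that should be posted for this reading.
func (h *hindsightValidator) process(v int) []int {
	if h.holding {
		held := h.pending
		h.holding = false
		// Confirmed if the new reading keeps moving in the spike's direction.
		if (held > h.stable && v >= held) || (held < h.stable && v <= held) {
			h.stable = v
			return []int{held, v} // post held reading, then the confirmation
		}
		h.stable = v
		return []int{v} // glitch: discard held reading, post the new one
	}
	d := v - h.stable
	if d > h.maxDelta || d < -h.maxDelta {
		h.pending, h.holding = v, true
		return nil // hold, don't post yet
	}
	h.stable = v
	return []int{v}
}

func main() {
	v := &hindsightValidator{stable: 95, maxDelta: 3}
	fmt.Println(v.process(91)) // []       (Δ4 → held)
	fmt.Println(v.process(89)) // [91 89]  (trend confirmed)

	v = &hindsightValidator{stable: 95, maxDelta: 3}
	fmt.Println(v.process(91)) // []    (Δ4 → held)
	fmt.Println(v.process(95)) // [95]  (back to baseline → glitch)
}
```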

Key Design Decisions

1. Change-Based Posting Only

Decision: Only post when SpO2 or HR values actually change.

Rationale:

  • Reduces Home Assistant database bloat
  • Avoids unnecessary network traffic
  • Still captures all meaningful changes
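
Change-based posting reduces to a one-field dedup in front of the POST call. A minimal sketch (the changePoster name is illustrative; homeassistant.go structures this differently, but the skip-if-unchanged rule is the same):

```go
package main

import "fmt"

// changePoster forwards a reading to Home Assistant only when the value
// differs from the last posted one, keeping the HA database lean.
// The post callback is injected so the sketch stays testable without a network.
type changePoster struct {
	last   int
	seeded bool
	post   func(v int)
}

func (p *changePoster) update(v int) {
	if p.seeded && v == p.last {
		return // unchanged: skip the POST entirely
	}
	p.last, p.seeded = v, true
	p.post(v)
}

func main() {
	var posted []int
	p := &changePoster{post: func(v int) { posted = append(posted, v) }}
	for _, v := range []int{97, 97, 96, 96, 97} {
		p.update(v)
	}
	fmt.Println(posted) // [97 96 97]
}
```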

2. Template Matching Over Tesseract OCR

Decision: Use custom template matching instead of Tesseract.

Rationale:

  • Pulse oximeter display has severe overexposure
  • Tesseract unreliable with bright backgrounds
  • Template matching: 90%+ accuracy vs. Tesseract: ~60%
  • We have stable digit shapes - perfect for template matching

3. Fixed Layout After Startup

Decision: Detect layout once, lock it, only re-detect on low confidence retry.

Rationale:

  • Camera is stationary (mounted on tripod)
  • Display position doesn't change
  • Layout detection is expensive (~10-20ms)
  • Re-detection only needed if camera moves/vibrates

4. No Immediate Corruption Retry

Decision: Don't immediately retry after detecting corruption; just skip to next frame.

Rationale:

  • Corruption is a recognition issue, not camera issue
  • Next frame (133ms later at 15fps) will naturally be different
  • Avoids complexity of retry-within-retry scenarios
  • Simpler code, same reliability

5. Processing Every 4th Frame

Decision: Process every 4th frame (~3.75 fps from 15fps stream).

Rationale:

  • Balance between responsiveness and CPU usage
  • SpO2/HR don't change faster than ~1 reading/second physiologically
  • Allows time for processing, posting, and stability checks
  • Adjusted for 15fps camera (was 6th frame for 30fps)
  • Can be adjusted: 3rd frame = ~5fps, 5th frame = ~3fps
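
The skip-count arithmetic above is simply the stream rate divided by the skip count:

```go
package main

import "fmt"

// effectiveFPS gives the processing rate when only every skipCount-th
// frame of a stream is handled.
func effectiveFPS(streamFPS float64, skipCount int) float64 {
	return streamFPS / float64(skipCount)
}

func main() {
	fmt.Println(effectiveFPS(15, 3)) // 5
	fmt.Println(effectiveFPS(15, 4)) // 3.75
	fmt.Println(effectiveFPS(15, 5)) // 3
}
```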

6. Dark Theme HTML Review

Decision: Dark background with light text in review HTML.

Rationale:

  • Easier on eyes during long review sessions
  • Better contrast for confidence scores
  • Color-coded indicators more visible

7. No FPS Display

Decision: Don't display stream FPS statistics.

Rationale:

  • Stream FPS is constant (15fps from camera)
  • FPS logging adds noise to console output
  • Processing rate (every 4th frame) is more relevant than raw stream FPS
  • Keeps console focused on actual readings and warnings

8. Digit Validation for '8' and '0'

Decision: Validate recognized '8' and '0' digits by checking for characteristic holes.

Rationale:

  • Template matching can falsely recognize corrupted/partial digits as '8' or '0'
  • '8' validation: checks for TWO holes (at 30% and 70% Y, scanning 40-50% X)
  • '0' validation: checks for ONE hole (at 50% Y, scanning 30-70% X)
  • Wider scan range for '0' (30-70%) than '8' (40-50%) because '0' hole is larger
  • If validation fails, digit marked as -1 (invalid) → triggers retry/re-detection
  • Prevents false alarms from misrecognized digits
  • Simple pixel-based validation is fast and reliable
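
The hole checks can be sketched as a scan for black pixels inside the digit's strokes (an illustration using the fractions listed above; the helper names here are hypothetical, and the real ocr.go code works on gocv Mats):

```go
package main

import "fmt"

// hasHoleAt reports whether the scan row at yFrac of the digit's height
// contains at least one black pixel between xStart and xEnd (fractions of
// the width). White strokes surround a hole, so a black pixel inside the
// scan range means the hole is present.
func hasHoleAt(digit [][]uint8, yFrac, xStart, xEnd float64) bool {
	h, w := len(digit), len(digit[0])
	y := int(yFrac * float64(h))
	for x := int(xStart * float64(w)); x < int(xEnd*float64(w)); x++ {
		if digit[y][x] == 0 {
			return true
		}
	}
	return false
}

// validateEight requires both holes of an '8' (at 30% and 70% Y, scanning
// 40-50% X); validateZero the single larger hole of a '0' (50% Y, 30-70% X).
// A failed check downgrades the digit to -1 (invalid).
func validateEight(d [][]uint8) bool {
	return hasHoleAt(d, 0.3, 0.4, 0.5) && hasHoleAt(d, 0.7, 0.4, 0.5)
}
func validateZero(d [][]uint8) bool {
	return hasHoleAt(d, 0.5, 0.3, 0.7)
}

func main() {
	// 10x10 all-white blob: a corrupted patch that template matching
	// might score highly, but it has no holes, so both checks fail.
	blob := make([][]uint8, 10)
	for y := range blob {
		blob[y] = make([]uint8, 10)
		for x := range blob[y] {
			blob[y][x] = 255
		}
	}
	fmt.Println(validateEight(blob), validateZero(blob)) // false false
}
```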

9. Alarm Mode Decolorization

Decision: Pre-process colored alarm backgrounds before OCR to normalize them to black/white.

Rationale:

  • Pulse oximeter displays colored backgrounds when in alarm mode (low SpO2 or abnormal HR)
  • SpO2 alarm: Yellow/orange background, HR alarm: Cyan/blue background
  • Normal thresholding fails with colored backgrounds
  • Decolorization converts: white digits (RGB all > 240) → white (255), everything else → black (0)
  • Applied ONLY to display regions AFTER extraction (in recognizeDisplayArea() function)
  • Layout detection works on original colored frames - grayscale conversion handles colors fine for contour detection
  • Decolorization timing:
    1. Layout detection: Original colored frame → grayscale → threshold → find contours (works fine)
    2. Extract display regions: Still colored (280x200px = 56K pixels each)
    3. Decolorize extracted regions: ~100K pixels total for both displays, takes <10ms
    4. Continue with OCR: grayscale → threshold → template matching
  • Threshold of 240 ensures only pure white pixels (digits/borders) are preserved, not grays
  • No performance impact - simple pixel-wise operation on small regions
  • Enables reliable digit recognition regardless of alarm state
  • Critical for 24/7 monitoring where alarms are expected

Environmental Considerations

Light Leakage Issues

Problem: Ambient light creates false contours and affects brightness detection.

Solutions implemented:

  • Increased brightness threshold to 170 (from lower values)
  • More selective contour filtering
  • Height filter excluding oversized boxes (only contours <200px tall are kept)

Manual mitigation: Position towel/cloth to block light from reflecting off display surface.

Camera Positioning

Critical: Camera must be positioned to minimize glare and ensure consistent framing.

Current setup:

  • Tripod-mounted camera
  • RTSP stream from network camera
  • Fixed zoom/focus (no autofocus)

Files and Directories

Source Files

  • pulseox-monitor.go - Main application
  • processor.go - Frame processor
  • validators.go - Validation handlers
  • ocr.go - Recognition engine
  • layout_detection.go - Display finder
  • normalize.go - Frame normalization
  • frame_source.go - Frame sources (RTSP/file)
  • types.go - Shared data structures
  • helpers.go - Utility functions (logf)
  • homeassistant.go - API integration
  • html_report.go - Review page generator
  • timestamp_ocr.go - Timestamp validation
  • config.go - Configuration loading

Configuration

  • config.yaml - Camera URL and Home Assistant settings

Training Data

  • training_digits/*.png - Valid digit templates (0-9)
  • training_digits/invalid/*.png - Corruption patterns

Output

  • review/ - Frame captures and digit images (cleaned each run)
    • f{N}_boxes.jpg - Visualization of detected areas
    • f{N}_{display}_checks.png - Digit detection analysis
    • f{N}_{display}_digit[1-3].png - Individual digit crops
    • f{N}_{display}_full.png - Full display with recognized value
    • review.html - Live-updated HTML (append-only)
  • raw_frames/ - Raw processed frames for failed recognitions (cleaned each run)
    • raw_YYYYMMDD-NNNNN.png - TRUE raw frames (unprocessed, portrait orientation, with timestamp)
    • Only saved when recognition fails (corruption, low confidence, unrecognized)
    • Used for debugging and training data collection
    • Can be tested directly: ./pulseox-monitor raw_frames/raw_*.png (will go through full preprocessing pipeline)
  • test_output/ - Debug visualizations (cleaned each run)
    • Layout detection debug files
    • Corruption debug snapshots with timestamps

Logs

  • pulseox-monitor_YYYYMMDD_HHMMSS.log - Detailed execution log
    • Console shows only: value changes, warnings, errors
    • File logs: all frames, timing, confidence scores, decisions

Development Workflow

Version Management

CRITICAL: Update version with every build!

For AI Assistant (Claude):

  • ALWAYS increment version number when making ANY code changes
  • Update BOTH pulseox-monitor.go and PROJECT_STATE.md
  • Add entry to Version History section documenting what changed
  • This is MANDATORY, not optional

Version increment process:

  1. Bump the version:

    • Open pulseox-monitor.go
    • Update const VERSION = "v3.XX" (increment version number)
    • Document what changed in the Version History section below
  2. After significant changes:

    • Update this PROJECT_STATE.md document
    • Keep architecture, design decisions, and troubleshooting current
    • Update "Last Verified Working" date at bottom
  3. Commit discipline:

    • Version bump = code change
    • Every feature/fix gets a version increment
    • Keep PROJECT_STATE.md in sync with code

Adding New Training Digits

When you find a good digit image in the review files:

# Example: Approve f768_hr_digit3.png as digit "1"
# Creates training_digits/1_2.png (next available variant)

Current naming: {digit}_{variant}.png (e.g., 7_1.png, 7_2.png)

Adding Invalid Patterns

When you find corrupted/glitchy images:

# Move to invalid folder with descriptive name
# Creates training_digits/invalid/invalid_X.png

Testing Changes

  1. Edit code on Mac
  2. Compile: ./build.sh
  3. Run: ./pulseox-monitor
  4. Monitor console for value changes and warnings
  5. Check log file for detailed diagnostics
  6. Review HTML on shutdown (Ctrl+C) for visual verification

Performance Tuning

Frame rate:

  • Change skipCount parameter in NewRTSPSource call (currently 4)
  • Located in pulseox-monitor.go
  • Values: 3=~5fps, 4=~3.75fps, 5=~3fps (for 15fps stream)

Confidence threshold:

  • Currently 85% for high confidence
  • Located in multiple places in main loop
  • Lower = more false positives, Higher = more retries

Delta threshold:

  • Currently 3 points for stability check
  • Change in large delta handler
  • Lower = stricter (more held readings), Higher = more lenient

Known Limitations

  1. Camera-dependent: Requires good camera positioning and lighting
  2. Display-specific: Templates are specific to Masimo Rad-G display fonts
  3. No real-time HTML: Review HTML only generated on shutdown
  4. Three-digit limitation: Assumes HR displays as 1XX when >99, may need adjustment for other scenarios
  5. Brightness sensitivity: Very bright ambient light can still cause issues

Future Enhancements (Planned)

Docker Deployment

Target: Ubuntu 22.04 server with Docker Compose

Requirements:

  • Dockerfile with Go + OpenCV
  • Mount volumes: config.yaml, training_digits/, review/, logs/
  • Network access to RTSP stream and Home Assistant
  • docker-compose.yml integration with existing HA setup

Considerations:

  • Image size: ~500MB-1GB with OpenCV
  • Multi-stage build possible for smaller runtime image
  • Review file access via scp or lightweight HTTP server

SQLite + Live HTML

Goal: Real-time progress viewing without stopping app

Approach:

  • Store frame data in SQLite as processed
  • HTTP endpoint generates HTML on-demand from DB
  • Access at http://server:8080/review.html anytime

Benefits:

  • No shutdown required to review progress
  • Historical data queryable
  • API potential for other tools

Adaptive Thresholds

Idea: Auto-adjust confidence and delta thresholds based on recent history

Example:

  • If 90% of last 100 frames >95% confidence → raise threshold to 90%
  • If frequent false positives → increase delta threshold temporarily
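
A minimal sketch of the first rule, purely as a design exploration for this planned feature (window size and ratios are the illustrative numbers from the idea above, and the function name is hypothetical):

```go
package main

import "fmt"

// adaptiveThreshold returns a possibly raised acceptance threshold: if at
// least 90 of the last 100 confidence scores exceed 0.95, the threshold
// could safely move from 0.85 to 0.90. With insufficient history it
// leaves the current threshold unchanged.
func adaptiveThreshold(history []float64, current float64) float64 {
	if len(history) < 100 {
		return current
	}
	high := 0
	for _, c := range history[len(history)-100:] {
		if c > 0.95 {
			high++
		}
	}
	if high >= 90 {
		return 0.90
	}
	return current
}

func main() {
	hist := make([]float64, 100)
	for i := range hist {
		hist[i] = 0.97 // consistently high confidence
	}
	fmt.Println(adaptiveThreshold(hist, 0.85)) // 0.9
}
```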

Troubleshooting Guide

"No templates found for digit X"

Problem: Missing training data for a digit.

Solution: Run system, review output, approve good images for that digit.

Frequent low confidence warnings

Causes:

  • Camera moved/vibrated
  • Lighting changed
  • Focus drifted
  • Display brightness changed

Solutions:

  • Check camera mounting
  • Adjust lighting (block reflections)
  • Re-focus camera if needed
  • Restart to re-detect layout

False corruption detection

Problem: Valid digits matched as invalid.

Solution: Review invalid templates, remove overly broad patterns.

Large delta causing unnecessary holds

Problem: Real physiological changes >3 points being held.

Solution:

  • Increase delta threshold (e.g., to 5)
  • Or adjust hindsight validation logic
  • Consider different thresholds for SpO2 vs HR

Values not posting to Home Assistant

Checks:

  1. Home Assistant URL correct in config.yaml?
  2. Network connectivity?
  3. Sensors created in HA configuration?
  4. Check log file for POST errors
  5. Are values actually changing? (change-based posting only)
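
When debugging POST errors, it helps to see the request shape the monitor sends. Per the Home Assistant REST API, a state update is a POST to /api/states/<entity_id> with a long-lived access token. A sketch (URL and token are placeholders; the function name is illustrative, not the homeassistant.go API):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newStateRequest builds the Home Assistant state-update request issued
// when a value changes. Sending it (http.Client.Do) is left to the
// caller, which keeps the construction easy to inspect in isolation.
func newStateRequest(baseURL, token, entity, state string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"state": state})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost,
		baseURL+"/api/states/"+entity, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, _ := newStateRequest("http://homeassistant.local:8123",
		"LONG_LIVED_TOKEN", "sensor.pulse_ox_spo2", "97")
	fmt.Println(req.Method, req.URL.Path) // POST /api/states/sensor.pulse_ox_spo2
}
```

Reproducing the same call with curl against the real HA instance is a quick way to separate network/auth problems from application bugs.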

Performance Characteristics

Typical timing per frame (v3.48):

  • Frame acquisition: 150-250ms (camera waiting time)
  • Preprocessing: 8-12ms
  • Frame scaling: 3-5ms
  • SpO2 recognition: 9-12ms
  • HR recognition: 9-12ms
  • Validation: 0-1ms
  • File I/O: 0ms (normal), 40-50ms (failures only)
  • Home Assistant POST: 5ms (when values change)
  • Total: 25-35ms per frame (processing only, excluding camera wait)
  • Timestamp check (every 10th frame): ~22ms (optimized with reused OCR client)

Debug mode (with /debug flag):

  • Additional file I/O: +40ms (saves all digit images)
  • Total: 65-75ms per frame

With every 4th frame processing at 15fps:

  • ~3.75 readings per second maximum
  • Actual posting rate: varies (only when values change)
  • CPU usage: Low (single-threaded, mostly idle)
  • Timestamp validation: Every ~2.5 seconds (~22ms each)

Version History

v3.57 (Current - True Raw Frame Saving)

  • CRITICAL FIX: Raw frames are now truly raw (before preprocessing)
    • Problem in v3.56: "Raw" frames were saved AFTER preprocessing (cropped, rotated, thresholded)
    • User confusion: Testing with "raw" frames didn't work as expected - frames were already processed
    • Fix: Clone and threshold raw frame IMMEDIATELY after acquisition, before ANY preprocessing
    • New flow:
      1. Acquire colored frame from camera (portrait, with timestamp)
      2. Clone frame → grayscale → threshold at 240 → save as rawThresholded
      3. Continue with original colored frame → preprocess (crop + rotate) → threshold → OCR
      4. On error: Save rawThresholded to raw_frames/
    • Result: Saved "raw" frames are truly raw (portrait orientation, timestamp visible, not cropped/rotated)
    • File size: ~100KB PNG (thresholded) vs 2MB colored JPG
    • Benefits:
      • Can test raw frames through full preprocessing pipeline
      • See exactly what came from camera before any processing
      • Small file size (thresholded) vs huge colored images
    • Impact: When testing with raw_frames/raw_*.png, they'll go through crop+rotate+threshold like normal frames

v3.56 (Threshold AFTER Preprocessing)

  • CRITICAL FIX: Moved threshold to AFTER preprocessing: Timestamp overlay was being thresholded!
    • Problem in v3.54-v3.55: Threshold happened BEFORE preprocessing
      • Colored frame → threshold → binary
      • Timestamp extracted from binary frame (appeared as B&W)
      • Preprocessed binary frame (cropped B&W timestamp)
    • Correct flow now:
      1. Acquire colored frame
      2. Extract timestamp from colored frame (stays colored!)
      3. Preprocess colored frame (crop timestamp + rotate)
      4. Threshold preprocessed frame at 240 → binary
      5. Layout detection on binary
      6. OCR on binary
    • Why this is correct:
      • Timestamp area is removed BEFORE thresholding (stays colored)
      • Only the actual display area (after crop/rotate) gets thresholded
      • Single threshold at 240, happens once, at the right time
    • Impact: Saved raw frames now show colored timestamp overlay (as expected)

v3.55 (Fixed Double Threshold Bug)

  • CRITICAL FIX: Removed duplicate threshold in saveThresholdedFrame(): Frame was being thresholded TWICE
    • Problem: saveThresholdedFrame() was doing grayscale conversion + threshold at 128
    • But: rawFrame parameter is ALREADY thresholded binary (from line 313 in pulseox-monitor.go)
    • Result: Frames were being thresholded twice - once at 240, then again at 128
    • Fix: saveThresholdedFrame() now just saves the frame directly (no conversion, no threshold)
    • Impact: Overlay/timestamp area should look consistent now
    • This was the "extra" threshold causing B&W overlay to look different
  • Confirmed ONLY one threshold operation now:
    • pulseox-monitor.go line 292: Threshold at 240 (the ONLY threshold)
    • layout_detection.go: No threshold (works on binary)
    • ocr.go: No threshold in recognizeDisplayArea() (works on binary)
    • processor.go: saveThresholdedFrame() just saves (no threshold)
  • Note: ocr.go still has OLD deprecated recognizeDisplay() function with thresholds, but it's never called

v3.54 (Single Threshold Architecture - Had Double Threshold Bug)

  • Attempted to threshold once at 240 after frame acquisition
  • Removed internal thresholds from layout_detection.go and ocr.go
  • BUT missed duplicate threshold in saveThresholdedFrame() → fixed in v3.55

v3.53 (CPU Optimization - Thread Limiting)

  • Removed duplicate ProcessedCount increment bug
  • Limited OpenCV threads to 1 for minimal overhead

v3.52 (CPU Optimization Complete)

  • Cleaned up failed GPU acceleration attempts: Removed AVFoundation backend code
    • Investigation findings: GPU hardware acceleration for RTSP streams is not feasible with current OpenCV setup
      • AVFoundation doesn't support network streams (RTSP), only local camera devices
      • OpenCV's ffmpeg backend has VideoToolbox compiled in but doesn't auto-activate for RTSP
      • Would require either: (1) patching OpenCV's videoio plugin in C++, (2) complete rewrite with different library, or (3) local transcoding proxy
    • Final optimization achieved: CPU reduced from 50% to ~30% (40% reduction)
      • Single-threaded OpenCV (gocv.SetNumThreads(1)) eliminated thread overhead
      • Homebrew's OpenCV has AVFoundation/VideoToolbox support but only for local devices
      • ~15% CPU for RTSP H.264 decoding (software) + ~15% CPU for OCR processing
    • Conclusion: 30% CPU is acceptable for 24/7 monitoring, fan noise significantly reduced

v3.51 (Attempted GPU Acceleration - Reverted)

  • Attempted AVFoundation backend: Failed for RTSP streams (network streams not supported)

v3.50 (Reduced CPU Overhead)

  • Added OpenCV thread limiting: Dramatically reduced CPU usage and thread count
    • Problem: OpenCV was creating 40 threads (one per CPU core) for small operations
    • Impact: Excessive context switching, 65%+ CPU usage for simple 280x200px image operations
    • Fix: Limited OpenCV to 1 thread with gocv.SetNumThreads(1) at startup
    • Rationale:
      • Our images are tiny (280x200px per display)
      • We process sequentially, not in parallel
      • Processing time <20ms - faster than thread spawning overhead
      • Single-threaded = zero thread management overhead
    • Expected result:
      • Thread count drops from ~40 to ~5-8 (only Go runtime threads)
      • CPU usage drops from 65% to <20%
      • No performance loss (processing still <20ms per frame)
    • Startup message: "🔧 OpenCV threads limited to 1 (single-threaded for minimal overhead)"
    • Testing: v3.50 with 2 threads showed 23 threads/48% CPU, so moved to 1 thread for further optimization

v3.49 (Fixed Frame Counter Bug)

  • CRITICAL FIX: Removed duplicate ProcessedCount increment: Fixed skipped frame numbers
    • Problem: state.ProcessedCount++ was being called twice:
      1. In main loop (pulseox-monitor.go) - for every frame acquired
      2. In processor.go - when values changed and HASS posted
    • Result: Every time values changed, the counter was incremented twice
    • Symptom: Frame numbers would skip (e.g., #61, #62, skip #63, #64...)
    • Pattern: Always skipped the frame number 2 positions after a HASS post
    • Fix: Removed the duplicate increment in processor.go
    • Now: Frame numbers increment consistently: #61, #62, #63, #64...

v3.48 (Major Performance Improvements)

  • CRITICAL: Moved digit image saves to DEBUG_MODE only: Massive performance improvement
    • Problem: OCR was saving 8 PNG files per frame (4 per display × 2 displays)
      • checks.png visualization
      • digit1.png, digit2.png, digit3.png (individual digits)
      • full.png (labeled composite)
    • Impact: ~5ms per PNG write × 8 files = 40ms of unnecessary file I/O per frame
    • Fix: Wrapped all digit image saves in if DEBUG_MODE checks
    • Result: OCR now takes ~9-12ms total (just template matching, no file I/O)
    • When needed: Run with /debug flag to generate all digit images
  • Only save boxes.jpg on failures: Reduced file I/O for successful frames
    • Previously: Saved layout visualization (boxes.jpg) every time values changed (~51ms)
    • Now: Only saves boxes.jpg when OCR fails (corruption, unrecognized, low confidence)
    • Successful frames: No extra file I/O beyond what's needed
  • Performance comparison:
    • Before: Total ~80ms (Prep 10ms + Scale 4ms + OCR 20ms + FileIO 40ms + Valid 1ms + HASS 5ms)
    • After: Total ~30ms (Prep 10ms + Scale 4ms + OCR 10ms + FileIO 0ms + Valid 1ms + HASS 5ms)
    • ~60% faster processing for normal operation
  • Debug workflow unchanged: /debug flag still generates all images as before

v3.47 (Timing Instrumentation)

  • Added comprehensive timing measurements with /timing flag: Enables detailed performance analysis
    • New /timing command line flag: Activates timing mode (./pulseox-monitor /timing)
    • Horizontal timing table: Shows timing breakdown for each frame
      • Frame | Acquire | Prep | Scale | OCR_SpO2 | OCR_HR | Valid | FileIO | HASS | Total
    • Header printed every 20 frames: Prevents scroll-back issues
    • Timing measurements:
      • Acquire: Frame acquisition from RTSP source - WAITING TIME (not processing)
      • Prep: Preprocessing (crop timestamp + rotate 90°)
      • Scale: Frame scaling to normalized width (if needed)
      • OCR_SpO2: SpO2 display recognition (template matching)
      • OCR_HR: HR display recognition (template matching)
      • Valid: Validation checks (corruption, confidence, stability) - accumulated across all checks
      • FileIO: Saving review images and debug files - accumulated across all file operations
      • HASS: Home Assistant POST request (only when values change)
      • Total: Wall-clock processing time (Prep + Scale + OCR + Valid + FileIO + logging overhead)
    • Note: Acquire is separate (camera waiting time). Total may not equal sum due to logging/overhead.
    • Purpose: Identify performance bottlenecks (e.g., excessive file I/O)
    • Implementation:
      • Added TimingData struct in types.go
      • Added TIMING_MODE global flag
      • Added printTimingTable() function in helpers.go
      • Added timing measurements throughout processFrame() and processFrames()
      • Timing data passed through entire processing pipeline
    • Example output:
      Frame | Acquire | Prep | Scale | OCR_SpO2 | OCR_HR | Valid | FileIO | HASS | Total
      ------|---------|------|-------|----------|--------|-------|--------|------|-------
      #123  |   2ms   |  6ms |  0ms  |    4ms   |   4ms  |  2ms  |  38ms  |  5ms |  61ms
      #124  |   2ms   |  6ms |  0ms  |    4ms   |   4ms  |  1ms  |  39ms  |  5ms |  61ms
      
    • Benefits:
      • Quickly identify slow operations
      • Track performance trends over time
      • Validate optimization efforts
      • Debug unexpected delays

v3.46 (Fixed "1" Detection + Raw Frame Improvements)

  • CRITICAL FIX: Lowered "1" digit detection threshold from 90% to 85%
    • Problem: "1" digits with 87-89% confidence were being rejected, causing misrecognition
    • Example: "71" was being cut incorrectly because the "1" wasn't detected (87.1% < 90% threshold)
    • Result: Wrong width used (100px instead of 72px), throwing off all cut positions
    • Fix: Threshold lowered to 85% in hasOneAt() function (ocr.go:196)
    • Impact: More reliable detection of "1" digits, especially with slight variations in brightness/focus
  • Auto-enable DEBUG_MODE for file processing
    • When testing with a file (e.g., ./pulseox-monitor raw_frames/thresh_*.png), DEBUG_MODE automatically enabled
    • Eliminates need to add /debug flag manually for single frame testing
    • Implementation: pulseox-monitor.go:65-66 in runSingleFrameMode()
  • Added comprehensive debug output for digit detection
    • Shows display width at start of detection
    • For each digit (3→2→1):
      • Check position being tested
      • hasOneAt() extraction region and confidence score
      • Whether detected as "1" or not
      • Width being used (72px vs 100px)
      • Running total width
      • Final CUT position calculated
    • Region extraction coordinates and widths for all three digits
    • Makes debugging cut position issues trivial - can see exact logic flow
  • Fixed raw frame saving to be truly raw
    • Previous: Frames were saved AFTER preprocessing (rotated, cropped)
    • Now: Frames saved BEFORE any processing (portrait, with timestamp, untouched)
    • Processing applied: Only thresholded (grayscale → binary) to save space as PNG
    • Benefit: Can test raw frames with full preprocessing pipeline
    • Implementation: Clone frame immediately after reading, pass to processFrame()
  • Fixed duplicate logging in test mode
    • Problem: Messages with Both target were printed twice in test mode
    • Cause: globalLogger was set to os.Stdout (same as console output)
    • Fix: Set globalLogger = nil in test mode (pulseox-monitor.go:71)
    • Result: Clean, non-duplicated output when testing with files
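The threshold and width logic above can be sketched in a few lines. This is an illustrative stand-in, not the actual code in ocr.go; `isOne` and `digitWidth` are hypothetical names for the checks inside hasOneAt() and the cut-position calculation.

```go
package main

import "fmt"

// isOne applies the v3.46 threshold: a template-match confidence of 85% or
// better counts as a '1' (the old 90% cutoff rejected genuine '1's at 87-89%).
func isOne(confidence float64) bool {
	return confidence >= 0.85
}

// digitWidth mirrors the cut-position logic: '1' is narrower (72px) than
// other digits (100px), so a missed '1' throws off all later cut positions.
func digitWidth(isOneDigit bool) int {
	if isOneDigit {
		return 72
	}
	return 100
}

func main() {
	conf := 0.871 // the "71" case that failed at the old 90% cutoff
	fmt.Println(isOne(conf))             // true
	fmt.Println(digitWidth(isOne(conf))) // 72
	fmt.Println(isOne(0.84))             // false: still rejected
}
```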

v3.45 (Added Processing Time Measurements)

  • Added comprehensive timing measurements: Now shows how long each frame takes to process
    • Frame processing time: Displayed in console output: SpO2=97%, HR=76 bpm [45ms]
    • Frame acquisition interval: Logged to file showing time since last frame acquired
    • Purpose: Understand actual performance vs camera lag vs processing bottlenecks
    • What's measured:
      • Frame acquisition interval (target: 250ms for 4 fps)
      • Total processing time per frame (OCR + validation + file I/O)
      • Logged for both changed values (console) and unchanged values (file only)
    • Output examples:
      • Console: SpO2=97%, HR=76 bpm [45ms] (when value changes)
      • Log file: Frame #123: SpO2=97%, HR=76 bpm (no change) - processed in 42ms
      • Log file: [TIMING] Frame acquired: +251ms since last (target: 250ms)
    • Benefits:
      • Identify if processing time is exceeding 250ms target
      • See actual frame rate vs target 4 fps
      • Distinguish between camera timestamp lag and processing delays
      • Debug performance issues with concrete data
    • This addresses confusion about camera timestamp lag vs actual processing performance

v3.44 (Save Thresholded PNGs, Not Colored JPGs)

  • Changed error frame format: Now saves thresholded black/white PNG instead of colored JPG
    • Rationale: Thresholded frames show exactly what the OCR sees, making debugging much more effective
    • Previous: Saved raw colored camera frames that required manual processing to debug
    • Now: Saves thresholded (binary) PNG showing the exact image fed to template matching
    • Processing: Grayscale conversion → threshold at 128 → save as PNG
    • Filename format: raw_frames/thresh_YYYYMMDD-NNNNN.png
    • Saved on:
      • Corruption detection (invalid template match)
      • Unrecognized digits (negative values)
      • Low confidence (first attempt)
      • Low confidence after retry
    • Benefits:
      • Can replay frames directly through OCR pipeline
      • See exactly what template matching is working with
      • Identify missing templates or threshold issues immediately
      • No need to manually preprocess frames for debugging
      • PNG format (lossless) vs JPG (lossy with compression artifacts)
    • Storage: ~50-100KB per frame vs 200-400KB for colored JPG

v3.43 (Fixed Timing + Removed Raw Frames)

  • Fixed frame acquisition timing: 250ms timer now starts from frame ACQUISITION, not processing completion
    • Previous behavior: Waited 250ms after processing finished (variable frame rate)
    • New behavior: Waits 250ms from when frame was acquired (consistent 4 fps)
    • Implementation: Set lastProcessedTime = now BEFORE returning frame, not after processing
    • Example: If processing takes 50ms, next frame acquired at exactly 250ms from previous acquisition
    • Result: Guaranteed exactly 4 fps regardless of processing time variations
  • Removed raw frame saving: System no longer saves unprocessed camera frames
    • Rationale: Thresholded digit images (step4) are already saved for all failures
    • What's saved on errors:
      • review/f{N}_spo2_digit1.png, digit2, digit3 (thresholded crops)
      • review/f{N}_hr_digit1.png, digit2, digit3 (thresholded crops)
      • review/f{N}_spo2_full.png, hr_full.png (labeled displays)
      • review/f{N}_spo2_checks.png, hr_checks.png (visualization)
    • Removed:
      • raw_frames/ directory no longer created
      • rawFrame parameter removed from processFrame()
      • rawFrame cloning removed from processFrames() loop
    • Benefits: Reduced memory usage, simpler code, less disk I/O
    • Thresholded images are more useful for debugging than raw camera frames

v3.42 (Time-Based Frame Processing)

  • Changed from skip-based to time-based frame processing: More reliable and precise timing
    • Previous approach: Process every 4th frame (variable timing depending on stream fps)
    • New approach: Process frames at exactly 4 fps (250ms minimum interval)
    • Implementation:
      • Track lastProcessedTime timestamp
      • Only process frame if 250ms has elapsed since last processed frame
      • Guarantees consistent 4 fps processing rate regardless of stream fps
    • Benefits:
      • Predictable, consistent processing rate
      • Works correctly even if stream fps varies
      • Simpler to understand and configure
    • Updated NewRTSPSource(): Removed skipCount parameter, now uses hardcoded 250ms interval
    • Console message: "📊 Processing frames at 4 fps (250ms minimum interval)"
  • Confirmed: All step debug images only saved in DEBUG_MODE
    • step1_original.png (only with /debug flag)
    • step2_decolorized.png (only with /debug flag)
    • step3_grayscale.png (only with /debug flag)
    • step4_thresholded.png (only with /debug flag)

v3.41 (Critical Performance Fix)

  • CRITICAL: Fixed excessive debug image writes causing high resource usage
    • Problem: Saving 8 debug images PER FRAME (4 per display x 2 displays)
    • Impact: At 3.75 fps = ~30 images/second = massive disk I/O and CPU usage
    • Images being saved on every frame:
      • step1_original.png (colored extraction)
      • step2_decolorized.png (after alarm mode decolorization)
      • step3_grayscale.png (after grayscale conversion)
      • step4_thresholded.png (after binary threshold)
    • Fix: Only save these debug images when DEBUG_MODE is enabled
    • Result: Normal operation now only saves essential files (digit crops, checks, full)
    • To enable debug images: Run with ./pulseox-monitor /debug flag
    • Performance improvement: ~70% reduction in disk I/O, significant CPU savings
    • This was a leftover from debugging alarm mode decolorization that should never have been in production

v3.40 (Actionable Layout Detection Errors)

  • Enhanced layout detection error diagnostics: Now saves debug visualization when "Only 1 digit display found" error occurs
    • Problem: Error message was not actionable - couldn't see what boxes were detected or why they failed
    • Fix: Save timestamped debug image showing all center boxes with labels
    • Debug visualization shows:
      • All boxes crossing 50% line (center region)
      • Green boxes = qualified as digit displays (height >= 110px)
      • Cyan boxes = rejected (too short)
      • Box labels showing dimensions: "#0: H=XXpx OK" or "#1: H=XXpx TOO SHORT"
      • Red line showing 50% Y position
      • Summary at top: "Found: X center boxes, Y digit displays (need 2)"
    • Enhanced console error output:
      • Shows count of center boxes found
      • Lists each box with dimensions and qualification status
      • "✓ QUALIFIED" for boxes that passed height requirement
      • "✗ TOO SHORT (< 110px)" for boxes that were rejected
      • Path to debug image: test_output/layout_error_YYYYMMDD_HHMMSS.jpg
    • Makes debugging layout failures much faster - can immediately see what system detected

v3.39 (Fixed Frame Scaling Order)

  • CRITICAL FIX: Frame scaling now happens in correct order
    • Problem: Boxes were calculated on original frame, then frame was scaled, causing coordinate mismatch
    • Root cause: Layout detection calculated boxes on unscaled frame, but processFrame() scaled the frame later
    • Fix: Layout detection now scales frame IMMEDIATELY after calculating scale factor (Step 4)
    • Process flow:
      1. detectScreenLayoutAreas() calculates scale from bounding box width
      2. Scales frame to 860px width using gocv.Resize()
      3. Scales ALL coordinate arrays (bounding box, allBoxes, significantBoxes)
      4. ALL subsequent processing (contour detection, box calculation) uses scaled frame AND scaled coordinates
      5. Returns boxes in scaled coordinates + scale factor
      6. processFrame() scales every frame using same scale factor
      7. Boxes and frames now match perfectly
    • Code changes:
      • layout_detection.go: Added frame scaling at Step 4 (after scale calculation)
      • layout_detection.go: Scale all coordinate arrays (scaledBoundingBox, scaledAllBoxes, scaledSignificantBoxes)
      • All debug visualizations now use scaled frame (step 5-15 images)
      • Removed tryAcquireLayout() wrapper function (unnecessary abstraction)
      • pulseox-monitor.go: Calls detectScreenLayoutAreas() directly, stores layout+scale in state
      • processor.go: Keeps scaling logic in processFrame() (needed for every frame)
    • Benefits:
      • Boxes calculated on 860px frame match frames that are scaled to 860px
      • No more coordinate mismatches between layout detection and OCR processing
      • Simpler code flow - no useless wrapper functions
      • Alarm box trimming (Steps 8-10) preserved and working correctly
    • Verified: Bounding box width log will show ~860px after scaling (was variable before)
  • BUG FIX: Review entries now use correct frame numbers
    • Problem: Successful readings used state.FrameCount (always 0) instead of actual frameNum
    • Result: All images referenced f0_* instead of actual frame numbers
    • Fix: Changed 3 instances in processor.go to use frameNum parameter
    • Affected: Invalid physiological values, successful readings, unstable readings
    • Failed readings (corruption, unrecognized, low confidence) were already correct
  • OPTIMIZATION: Only save images for frames added to review
    • Problem: Every processed frame saved images, even if values didn't change
    • Result: review/ folder filled with unused images from StatusNoChange frames
    • Fix: Delete digit images when values haven't changed (StatusNoChange)
    • Saves disk space and makes review folder only contain relevant frames
    • Deleted files: f*_spo2_checks.png, f*_spo2_digit[1-3].png, f*_spo2_full.png, f*_hr_*.png

v3.38 (Dynamic Frame Normalization)

  • Implemented dynamic frame scaling based on bounding box width: Handles unstable device position
    • Problem: Pulse oximeter lays loose on bed, causing frame width to vary
    • Solution: Measure bounding box width in Step 4 of layout detection
    • Process:
      1. Calculate scale factor: scale = 860 / boundingBoxWidth
      2. Log bounding box width and scale factor to console
      3. Store scale factor in ProcessingState.LockedScale
      4. Apply scaling to every frame before OCR processing
    • Changes:
      • detectScreenLayoutAreas() now returns (*ScreenLayout, float64, error) - includes scale factor
      • tryAcquireLayout() captures and stores scale in state
      • processFrame() applies scaling if state.LockedScale != 1.0
      • Logs: "📊 Bounding box width: Xpx, Scale factor: Y (target: 860px)"
    • Benefits:
      • Consistent 860px normalized width regardless of camera position/zoom
      • Layout coordinates remain stable across frames
      • OCR accuracy maintained even when device moves
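The scale-factor arithmetic is simple enough to show directly. This is a sketch; in the real code the factor is stored in ProcessingState.LockedScale and applied with gocv.Resize():

```go
package main

import "fmt"

const targetWidth = 860.0

// scaleFor returns the factor that normalizes a measured bounding-box width
// to the 860px target.
func scaleFor(boundingBoxWidth int) float64 {
	return targetWidth / float64(boundingBoxWidth)
}

func main() {
	w := 1075 // example measured width (device moved closer to the camera)
	scale := scaleFor(w)
	fmt.Printf("📊 Bounding box width: %dpx, Scale factor: %.2f (target: 860px)\n", w, scale)
	fmt.Printf("scaled width: %.0fpx\n", float64(w)*scale) // always 860px
}
```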

v3.37 (Fixed Layout Detection Height Threshold)

  • Fixed digit display height threshold: Changed from > 120px to >= 110px
    • Problem: Layout detection was failing because digit displays were exactly 110-120px tall
    • Previous code: Required height > 120px (too restrictive)
    • Fixed code: Requires height >= 110px (matches MIN_BOX_HEIGHT constant)
    • Added debug logging: Shows dimensions of each center box and whether it's accepted/rejected
    • Helps diagnose layout detection failures
    • Consistent with original design intent (MIN_BOX_HEIGHT = 110)

v3.36 (Unified Logging)

  • Consolidated all logging to use logMessage() function: Replaced all fmt.Printf(), fmt.Println(), and inconsistent logging patterns
    • ocr.go: Converted template loading messages to use logMessage()
      • Template warnings → Console + Warning level
      • Template loading success → Console + Info level
    • frame_source.go: Converted frame processing info message
      • "Processing every Nth frame" → Console + Info level
    • normalize.go: Converted all debug output messages
      • Decolorization progress → LogFile + Debug level
      • Debug file saves → LogFile + Debug/Info level
    • All files now use consistent logging pattern:
      • Startup/shutdown → Console + Info
      • Main readings → Both + Info
      • Warnings/errors → Both + Warning/Error
      • Debug messages → LogFile + Debug (respects DEBUG_MODE flag)
      • Progress indicators → Console + Info
    • Benefits:
      • Single source of truth for logging behavior
      • Consistent timestamp formatting across all logs
      • Proper level-based filtering
      • Clean separation of console vs file logging
      • Debug mode properly controls debug output

v3.35 (Logging System Implementation)

  • Implemented logMessage() function in helpers.go: New unified logging system
    • Target: Console, LogFile, Both
    • Level: Debug, Info, Warning, Error
    • Single-letter prefixes (D, I, W, E) for clean output
    • Automatic DEBUG_MODE filtering for debug messages
    • Timestamp formatting: [HH:MM:SS.mmm]
    • Eliminates the need to check logger != nil throughout the codebase

v3.34 (Correct Order + Debug Images)

  • Fixed order of layout detection: Center region FIRST, digit displays SECOND
    • Correct process order:
      1. Find ALL contours with RetrievalList
      2. Collect all significant boxes (>50px width or height)
      3. Calculate bounding box from ALL boxes (not just digit displays)
      4. Find 50% Y line from overall bounding box
      5. Filter to boxes crossing 50% line (center region)
      6. From center boxes, identify digit displays (height >= 110px)
      7. Find alarm icons (center boxes extending beyond digit displays)
      8. Trim digit displays based on alarm icons
      9. Split by center X
      10. Find rightmost X in each half (using box center)
      11. Create fixed-width boxes (280px)
    • Debug images at every step saved to test_output/:
      • layout_step1_grayscale.jpg
      • layout_step2_threshold.jpg
      • layout_step3_eroded.jpg
      • layout_step4_all_boxes.jpg (all boxes >50px in green)
      • layout_step5_center_boxes.jpg (boxes crossing 50% line in cyan, red line)
      • layout_step6_digit_displays.jpg (digit displays in magenta)
      • layout_step7_alarm_icons.jpg (alarm icons in yellow, if any found)
      • layout_step8_trimmed_displays.jpg (trimmed displays in green)
      • layout_step9_final_boxes.jpg (final SpO2/HR boxes with labels)
    • Key fix: Bounding box calculated from ALL contours, not just digit displays
    • This prevents missing digit displays that are outside early bounding calculations

v3.33 (Contour Debug Logic + Center Region - WRONG ORDER)

  • Implemented contour_debug.go logic combined with center region approach: As instructed
    • Exact implementation of contour_debug.go detection:
      1. Find ALL contours with RetrievalList
      2. First pass: Identify digit displays (height >= 110px)
      3. Second pass: Find alarm icons (boxes extending beyond digit displays on right)
      4. Trim digit displays: Find leftmost alarm icon edge that overlaps, set as right boundary
    • Combined with center region logic:
      5. Calculate bounding box from trimmed digit displays
      6. Find 50% Y line
      7. Filter to displays crossing 50% line (center region)
      8. Split by center X
      9. Find rightmost X in each half (using box center to determine which half)
      10. Create fixed-width boxes (CUT_WIDTH = 280px)
    • Key differences from v3.32:
      • Works with TRIMMED digit displays (after alarm icon trimming)
      • Alarm icons properly detected and used to trim display boundaries
      • Follows exact contour_debug.go algorithm before applying center logic
    • Diagnostic output shows trimming actions when alarm icons detected

v3.32 (Fixed Layout Detection - WRONG APPROACH)

  • Reverted to proven layout detection logic: Fixed incorrect box detection from v3.30
    • Problem in v3.30/v3.31: Alarm icon detection was finding wrong boxes (yellow alarm square instead of digit displays)
    • Fix: Restored working center region approach without alarm icon complications
    • Process:
      1. Find all significant contours (>50px width or height)
      2. Calculate overall bounding box
      3. Filter to boxes crossing 50% Y line with height >= 110px (digit displays)
      4. Create center region from filtered boxes
      5. Split by center X
      6. Find rightmost X in each half using box center to determine which half
      7. Create fixed-width boxes (CUT_WIDTH = 280px)
    • Key insight: Use box center (not box edge) to determine left vs right half
    • Alarm icon trimming removed for now - adds complexity without clear benefit
    • Focuses on reliable detection of actual digit display boxes
    • Debug images from v3.31 retained for troubleshooting

v3.31 (Decolorization After Extraction)

  • Moved decolorization to after display area extraction: Layout detection now works on original colored frames
    • Problem: Layout detection was trying to threshold colored alarm backgrounds, failing to find displays
    • Fix: Layout detection now works on original colored frame (grayscale + threshold)
    • Decolorization moved: Now happens in recognizeDisplayArea() AFTER extracting the small display regions
    • Process flow:
      1. Preprocess (crop + rotate) - original colored frame
      2. Detect layout on colored frame (grayscale → threshold → find contours)
      3. Extract display areas (280x200px) - still colored
      4. Decolorize extracted regions (only ~56K pixels per display)
      5. Continue with OCR (grayscale → threshold → template matching)
    • Debug images added: Each processing step now saves debug images:
      • f{N}_{display}_step1_original.png - Original extracted colored region
      • f{N}_{display}_step2_decolorized.png - After decolorization
      • f{N}_{display}_step3_grayscale.png - After grayscale conversion
      • f{N}_{display}_step4_thresholded.png - After thresholding (ready for OCR)
    • Allows visual verification of each processing step
    • Decolorization only affects OCR, not layout detection

v3.30 (Simplified Layout Detection)

  • Simplified layout detection with alarm icon trimming: Combined best of both approaches
    • Uses RetrievalList to find ALL contours (including nested ones like alarm icons)
    • Single contour pass: No more complicated center region extraction and re-detection
    • Alarm icon detection: Finds boxes that extend beyond digit displays (clock icons, alarm indicators)
    • Smart trimming: Trims BOTH SpO2 and HR displays if alarm icons found, otherwise uses regular boxes
    • Process:
      1. Find all contours with RetrievalList
      2. Identify digit displays (height >= 110px)
      3. Find alarm icons (boxes extending beyond digit displays on the right)
      4. Trim digit displays based on leftmost alarm icon edge (if any)
      5. Split displays into left (SpO2) and right (HR) halves using center X
      6. Find rightmost X in each half for display boundaries
      7. Create fixed-width boxes (CUT_WIDTH = 280px)
    • Much simpler and more reliable than previous double-contour approach
    • Combines center region logic with direct contour approach from contour_debug.go

v3.29 (Correct Decolorization Scope)

  • Fixed decolorization scope: Removed wasteful full-frame decolorization
    • Problem in v3.28: Decolorizing entire 1390x2736 frame (3.8M pixels) took ~2000ms
    • Reality: Decolorization only needed for small display regions during OCR
    • Correct flow:
      1. Preprocess (crop + rotate) - original colored frame
      2. Detect layout on original frame (works fine with colored backgrounds)
      3. Extract display areas (280x200px each)
      4. Decolorize ONLY display regions in recognizeDisplayArea() (~100K pixels)
      5. Run OCR on decolorized regions
    • Decolorization already exists in ocr.go at the right place (line ~281)
    • No need for full-frame decolorization - layout detection works on colored frames
    • Performance: ~40x faster (decolorizing 100K pixels vs 3.8M pixels)

v3.28 (Decolorization Timer - REVERTED)

  • Added decolorization timing: Timer measures how long decolorizeFullFrame() takes
    • Timer added after preprocessing in main processing loop
    • Displays: "[TIMER] Decolorization took: Xms"
    • Restored decolorization to pipeline: Was defined but not being called!
    • Decolorization now runs between preprocessing and layout detection
    • Enables alarm mode support (colored backgrounds → black/white)
    • Allows measurement and optimization of decolorization performance
    • Frame lifecycle: raw → preprocess → decolorize → detect/process

v3.27 (Save ACTUAL Raw Frames)

  • CRITICAL FIX: Raw frames are now truly raw (unprocessed)
    • Problem: raw_frames/raw_*.png files were being saved AFTER preprocessing (rotation + crop)
    • Result: When testing with "raw" frames, they were rotated AGAIN, making displays vertical instead of horizontal
    • Fix: Clone raw frame immediately after reading from source (line 233)
    • Flow now:
      1. Read frame from source (portrait orientation)
      2. Clone raw frame → keep for potential error saving
      3. Preprocess: crop timestamp + rotate 90° CW
      4. Decolorize
      5. Process
      6. IF error (corruption, low confidence, unrecognized) → save RAW clone
      7. Close raw frame clone
    • Raw frames saved in processor.go at error points using the rawFrame parameter
    • Raw frames are now truly unprocessed - can be used for testing by running through full pipeline
    • Fixes issue where testing with raw frames would double-rotate them

v3.26 (Fixed Redundant Decolorization)

  • CRITICAL FIX: Eliminated 4x redundant decolorization calls: Now decolorizes ONCE at top level only
    • Problem: Frame was being decolorized 4 times for every layout detection:
      1. In detectScreenWidth() (2668x1458 - full frame)
      2. In saveNormalizationDebug() (2668x1458 - same frame again!)
      3. In detectScreenLayoutAreas() (2130x1164 - center region)
      4. In detectScreenLayoutAreas() center region (859x518 - smaller region)
    • Each call took 2-3 seconds - total 8-12 seconds wasted per frame!
    • Fix: Removed ALL decolorization calls from normalize.go and layout_detection.go
    • Now: Single decolorization in pulseox-monitor.go at line 259-267
    • Result: Frame processing time reduced from ~12 seconds to ~3 seconds
    • Decolorized frame passed to all layout functions - they expect pre-decolorized input
    • Simplified decolorizeFullFrame() output - removed progress indicators (10%, 20%, etc.)
    • Added comments documenting that functions expect already-decolorized frames

v3.25 (Aggressive Debug Output)

  • Added extensive debug image saves: Every processing step now saves an image to test_output/
    • step1_original.jpg - Frame after preprocessing (rotated)
    • step2_decolorized.jpg - After decolorizeFullFrame() (should be black/white)
    • step3_grayscale.jpg - After grayscale conversion
    • step4_thresholded.jpg - After binary threshold
  • Added decolorization progress tracking:
    • Shows progress every 10% of rows processed
    • Shows final statistics: white pixel count/percentage, black pixel count/percentage
    • Helps diagnose if decolorization is working correctly
  • Purpose: Debug why alarm mode decolorization appears to not be working

v3.24 (Fixed Layout Decolorization)

  • Fixed missing decolorization in layout detection: Layout detection was still using colored frames
    • Added decolorizeFullFrame() call in detectScreenLayoutAreas() function
    • Previously: Only detectScreenWidth() was decolorizing, but detectScreenLayoutAreas() was not
    • Problem: Layout detection was converting colored frame directly to grayscale
    • Result: Alarm mode colored backgrounds prevented contour detection
    • Fix: Decolorize BEFORE grayscale conversion in both contour detection passes:
      1. Initial contour detection (full frame)
      2. Center region contour detection
    • Now: Layout detection works correctly in both normal and alarm modes
    • This was the missing piece preventing alarm mode from working end-to-end

v3.23 (Higher White Threshold)

  • Increased white pixel detection threshold: Changed from 200 to 240 for more selective white detection
    • Updated in both decolorizeFullFrame() (normalize.go) and decolorizeAlarmMode() (ocr.go)
    • Previous threshold (200): Too permissive, caught near-white/gray pixels
    • New threshold (240): More selective, only truly white pixels (near 255) are preserved
    • Logic: If RGB all > 240 → keep as white (255), else → convert to black (0)
    • Rationale: Display digits are pure white (255), not gray, so higher threshold is more accurate
    • Better separation between white digits/borders and colored alarm backgrounds
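The per-pixel rule is easy to express directly. A sketch of the decolorization logic (v3.19 introduced it at threshold 200; v3.23 raised it to 240) — `decolorizePixel` is an illustrative name, not the actual helper in normalize.go/ocr.go:

```go
package main

import "fmt"

// decolorizePixel keeps a pixel white only if all three channels exceed the
// threshold; anything else (colored alarm backgrounds, gray bleed) goes black.
func decolorizePixel(r, g, b, threshold uint8) uint8 {
	if r > threshold && g > threshold && b > threshold {
		return 255 // truly white digit pixel
	}
	return 0 // colored/gray background -> black
}

func main() {
	fmt.Println(decolorizePixel(250, 250, 250, 200)) // 255: white digit
	fmt.Println(decolorizePixel(255, 200, 40, 200))  // 0: yellow alarm background
	fmt.Println(decolorizePixel(210, 210, 210, 240)) // 0: near-white gray, rejected at 240
}
```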

v3.22 (Decolorize Before Layout)

  • Moved decolorization before layout detection: Alarm mode now works end-to-end
    • New decolorizeFullFrame() function in normalize.go
    • Applied BEFORE layout detection (in detectScreenWidth())
    • Previously: Decolorization only happened during OCR (too late)
    • Now: Decolorization happens first, then layout detection sees clean white-on-black displays
    • Fixes issue where colored alarm backgrounds prevented layout detection
    • Layout detection flow:
      1. Decolorize full frame (yellow/cyan → black, white → white)
      2. Convert to grayscale
      3. Threshold at 170
      4. Find contours
    • Same decolorization also applied in saveNormalizationDebug() for accurate debug visualization
    • Result: System can now detect and recognize displays in both normal and alarm modes

v3.21 (Preprocessing Debug)

  • Added debug output for preprocessing: Prints frame dimensions before and after preprocessing
    • Shows: "Before preprocess: WxH" and "After preprocess: WxH"
    • Helps verify that rotation is actually happening
    • Expected: Portrait (e.g., 1080x1920) → Landscape (e.g., 1920x1080)
    • Temporary diagnostic output to verify preprocessFrame() is working

v3.20 (Unified Preprocessing)

  • Verified unified preprocessing path: Both RTSP streaming and file test modes use identical preprocessing
    • All frames go through preprocessFrame() function:
      1. Crop timestamp (top 68 pixels)
      2. Rotate 90° clockwise (portrait → landscape)
    • Applied in unified processFrames() loop (line 250)
    • Works for both RTSPSource and FileSource
    • Important: Raw frames saved to raw_frames/ are saved AFTER preprocessing
    • Therefore: Raw frames are already rotated and ready for testing
    • If testing with unrotated frames, they'll go through preprocessing automatically
    • Ensures consistent behavior regardless of frame source
    • Layout detection expects landscape orientation (SpO2 and HR side-by-side horizontally)

v3.19 (Alarm Mode Support)

  • Added alarm mode decolorization: System now recognizes digits even when pulse oximeter is in alarm state
    • New decolorizeAlarmMode() function converts colored alarm backgrounds to black
    • Alarm mode displays:
      • SpO2: Yellow/orange background with white digits
      • HR: Cyan/blue background with white digits
    • Decolorization logic:
      • Pixels where ALL RGB values > 200 → kept as white (digits)
      • All other pixels → converted to black (colored backgrounds, borders)
    • Applied before grayscale conversion in OCR pipeline
    • Enables normal thresholding and template matching to work correctly
    • Previously: Colored backgrounds would confuse thresholding, causing OCR failures
    • Now: Alarm mode digits recognized just as reliably as normal mode
    • Example: "90" in yellow box → converted to "90" in black/white → OCR succeeds

v3.18 (Zero Validation)

  • Added '0' digit validation: Prevents false '0' recognition by checking for characteristic center hole
    • New validateZero() function checks for a hole at 50% Y (center)
    • Scans horizontally from 30%-70% X looking for black pixels (hole)
    • Wider horizontal range than '8' validation (30-70% vs 40-50%) since '0' hole is larger
    • If validation fails, marks digit as invalid (-1) triggering retry/re-detection
    • Applied to all three digit positions (digit1, digit2, digit3)
    • Complements existing '8' validation (two holes at 30% and 70% Y)
    • Helps prevent misrecognition of corrupted/partial digits or digits with missing centers as '0'
    • Example log output:
      validateZero: center hole @ y=50 (50%) x=15-35 (true)
      

v3.17 (Enhanced Cleanup)

  • Enhanced directory cleanup on startup: Now cleans all output directories in streaming mode
    • Previously: Only review/ was cleaned on startup
    • Now: Cleans review/, raw_frames/, and test_output/ in streaming mode
    • Single-frame test mode: No cleanup (preserves test output)
    • Progress display: Shows each directory being cleaned with checkmarks
    • Example output:
      🗑️  Cleaning output directories...
         - review/... ✓
         - raw_frames/... ✓
         - test_output/... ✓
      
    • Rationale: raw_frames/ accumulates failed recognition frames over time; clean slate each run
    • Rationale: test_output/ contains debug visualizations that should be fresh per session

v3.16 (Append-Only HTML Review)

  • Implemented append-only HTML generation: Review HTML is now written incrementally instead of all at once on shutdown
    • HTML header written once at startup via initReviewHTML()
    • Each frame entry appended immediately via appendReviewEntry() as it's processed
    • Footer written on shutdown via closeReviewHTML()
    • Browser handles missing closing tags gracefully, allowing live refresh
    • Benefits:
      • Live updates: refresh browser anytime to see latest frames
      • No regeneration overhead: just one file write per frame
      • Can run for hours without memory/performance issues
      • No data loss if process crashes (all processed frames already written)
    • All review entry points now consistently call appendReviewEntry():
      • Corruption detection
      • Unrecognized digits
      • Low confidence (both first attempt and retry)
      • Invalid physiological values
      • Successful readings
      • Unstable readings (held for validation)
    • Fixed duplicate appendReviewEntry() calls in corruption handler (was calling 3 times)

v3.15 (Fixed Eight Validation)

  • Fixed validateEight() probe positions: Corrected hole detection based on actual measurements
    • Issue: Was probing at 33% and 67% which hit the white digit and middle bar
    • Fix: Now checks horizontal line segments at 30% and 70% height
    • Scans X from 40% to 50% of width (10% range centered around middle)
    • Looks for ANY black pixel (< 128) in that range = hole detected
    • Much more robust than single pixel check - tolerates slight variations
    • Holes are BLACK (empty space), digit is WHITE in thresholded image
    • Prevents false rejection of valid "8" digits

v3.14 (Corruption Debug Output)

  • Added debug output for corruption cycles: Helps diagnose repeated corruption detection
    • When corruption is detected, saves debug files to test_output/ with timestamp
    • Saves normalized frame: corruption_YYYYMMDD_HHMMSS_frameN_normalized.png
    • Saves layout visualization: corruption_YYYYMMDD_HHMMSS_frameN_layout.jpg
    • Also includes all digit crops from review/ directory (already generated by OCR)
    • Allows offline analysis when system gets stuck in corruption → re-detect → corruption cycles
    • Log message: "💾 Debug saved: YYYYMMDD_HHMMSS"

v3.13 (RTSP Reconnect Fix + Eight Validation)

  • Fixed segmentation fault on RTSP reconnect failure: Prevents crash when stream reconnection fails
    • Added nil check before reading from the stream
    • Sets stream to nil and marks the source as closed after a failed reconnect
    • Returns shouldContinue=false to stop processing instead of attempting to read from a nil stream
    • Prevents SIGSEGV when the VideoCapture is nil
  • Added '8' digit validation: Prevents false '8' recognition by checking for two characteristic holes
    • New validateEight() function probes two specific regions (at 1/3 and 2/3 height)
    • Checks if both regions are at least 60% black (indicating holes)
    • If validation fails, marks digit as invalid (-1) triggering retry/re-detection
    • Applied to all three digit positions (digit1, digit2, digit3)
    • Helps prevent misrecognition of corrupted/partial digits as '8'
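
The reconnect guard can be sketched as follows. A small interface stands in for gocv's VideoCapture so the nil-handling is testable without OpenCV; the type and method names here are illustrative, not the project's actual frame_source.go API.

```go
package main

// frameReader abstracts the RTSP capture; a nil stream models the state
// after a failed reconnect.
type frameReader interface {
	Read() (frame []byte, ok bool)
}

type source struct {
	stream frameReader
	closed bool
}

// okReader is a trivial stub that always yields one frame.
type okReader struct{}

func (okReader) Read() ([]byte, bool) { return []byte{1}, true }

// readFrame returns shouldContinue=false instead of dereferencing a nil
// stream, which is the essence of the SIGSEGV fix above.
func (s *source) readFrame() (frame []byte, shouldContinue bool) {
	if s.stream == nil || s.closed {
		return nil, false
	}
	f, ok := s.stream.Read()
	if !ok {
		// Read failed: drop the stream and mark the source closed so the
		// caller stops processing rather than retrying on a dead capture.
		s.stream = nil
		s.closed = true
		return nil, false
	}
	return f, true
}
```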

v3.12 (Physiological Value Validation)

  • Added minimum value threshold for Home Assistant posting: Prevents posting physiologically impossible values
    • Validation rule: Both SpO2 and HR must be >= 40 to post to Home Assistant
    • Values below 40 are logged to file and added to review, but NOT posted to HASS
    • Console displays: "⚠️ Values below 40 - not posting to HASS"
    • Rationale: SpO2 or HR below 40 are not physiologically possible in living patients
    • Prevents false alarms and invalid data in Home Assistant database
    • Common scenarios triggering this:
      • Pulse oximeter disconnected (shows dashes, may be misread as low numbers)
      • Recognition errors resulting in single-digit values (0-9)
      • Display corruption being partially recognized
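
The gate in front of the Home Assistant client reduces to a one-line check; the function name below is illustrative, the >= 40 rule is from the changelog.

```go
package main

// minPhysiological is the posting floor: SpO2 or HR below 40 is treated as
// a recognition artifact, logged and reviewed but never sent to HASS.
const minPhysiological = 40

// shouldPostToHASS returns true only when both values clear the floor.
func shouldPostToHASS(spo2, hr int) bool {
	return spo2 >= minPhysiological && hr >= minPhysiological
}
```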

v3.11 (Reduced Raw Frame Storage)

  • Raw frames now only saved on recognition failures: Significantly reduces disk space usage
    • Previously: Every processed frame saved to raw_frames/ (all successes and failures)
    • Now: Only saves frames when recognition fails (corruption, unrecognized, low confidence)
    • Successful readings and unstable readings (held for validation) no longer saved
    • Failure cases where frames are saved:
      • Corrupted frames (invalid pattern matched)
      • Unrecognized digits (negative values)
      • Low confidence readings (both first attempt and retry)
    • Typical result: ~90% reduction in raw frame storage for stable monitoring
    • Failed frames still available for debugging and training data collection
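
The save policy is a simple switch over the processing outcome. The status names below loosely mirror ones mentioned elsewhere in this document (e.g. StatusCorrupted) but are illustrative, as is the function name.

```go
package main

// processingStatus enumerates the frame outcomes listed above.
type processingStatus int

const (
	statusOK processingStatus = iota
	statusUnstable // held for stability validation
	statusCorrupted
	statusUnrecognized
	statusLowConfidence
)

// shouldSaveRawFrame keeps only failure cases, which is what cut
// raw_frames/ usage by roughly 90% during stable monitoring.
func shouldSaveRawFrame(s processingStatus) bool {
	switch s {
	case statusCorrupted, statusUnrecognized, statusLowConfidence:
		return true
	default: // successful and unstable (held) readings are not saved
		return false
	}
}
```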

v3.10 (Corruption Frame Number)

  • Added frame number to corruption log message: Easier debugging of false corruption detections
    • Message now shows: [CORRUPTION] Frame #123 - Digit = -1 detected: SpO2(9,5) HR(7,-1) - skipping frame
    • Helps identify which specific frame triggered invalid pattern match
    • Useful for reviewing frames in HTML output and approving/removing invalid templates

v3.9 (Simplified Digit Detection Arrays)

  • Simplified digit detection to use only arrays: Removed all intermediate variables
    • D[1,2,3] array stores cut positions for digit extraction
    • digitIsOne[1,2,3] array stores detection results (is it a "1"?)
    • Eliminated intermediate variables like digit1IsOne, digit2IsOne, digit3IsOne, middleStart, rightStart
    • Code now directly uses array notation throughout: digitIsOne[2], D[3], etc.
    • Cleaner, more maintainable code with single source of truth
  • Fixed "13" recognition bug: D1 position now correctly calculated from accumulated widths
    • Loop accumulates widths: digit3 → digit2 → digit1
    • D1 check position = w - digit3Width - digit2Width - 5
    • Previously was checking at wrong position when only 2 digits present
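
The corrected D1 probe position is just the accumulated-width formula from the bullet above, shown here as a standalone helper (the name is illustrative):

```go
package main

// d1CheckX derives the digit-1 probe position from the measured widths of
// digits 3 and 2 (accumulated right to left) minus a 5-pixel margin,
// instead of a fixed offset that broke when only 2 digits were present.
func d1CheckX(frameWidth, digit3Width, digit2Width int) int {
	return frameWidth - digit3Width - digit2Width - 5
}
```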

v3.8 (Code Cleanup & Digit Detection Loop)

  • Added helpers.go: New file with logf() helper function
    • Eliminates repetitive if logger != nil checks throughout codebase
    • Makes logging calls cleaner and more maintainable
    • Applied across processor.go, validators.go, and ocr.go
  • Refactored digit detection to use loop: Replaced repetitive digit checking code with elegant loop
    • Uses arrays for digit names, detection results, and widths
    • Accumulates width progressively (digit3 → digit2 → digit1)
    • More maintainable and easier to understand
    • Eliminates code duplication

v3.7 (3-Digit Value Calculation Fix)

  • Fixed 3-digit value calculation bug: Removed score requirement for digit1 detection
    • Issue: When digit1IsOne detected a "1" digit, the code also required a template match score > 50%
    • Problem: digit1 region includes empty padding, causing low template match scores
    • Result: 106 was calculated as 6, 103 as 3, 104 as 4
    • Fix: Trust hasOneAt() detection alone - if digit1IsOne is true, use 100 + num2*10 + num3
    • The hasOneAt() function already confirms there's a "1" pattern at the correct position
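
The fixed composition rule, extracted as a sketch (the function name is illustrative; num2/num3 are the template-matched middle and right digits):

```go
package main

// composeValue trusts the hasOneAt() detection unconditionally: if a "1"
// pattern was found at the digit-1 position, the value is 100 + num2*10 +
// num3, with no template-score requirement that padding would deflate.
func composeValue(digit1IsOne bool, num2, num3 int) int {
	if digit1IsOne {
		return 100 + num2*10 + num3
	}
	return num2*10 + num3
}
```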

v3.6 (Millisecond Timestamps)

  • Added millisecond precision to timestamps: Changed timestamp format from 15:04:05 to 15:04:05.000
    • Provides more precise timing for frame processing and reading logs
    • Helpful for debugging timing issues and frame processing delays
    • Format: [03:03:17.245] SpO2=93%, HR=85 bpm

v3.5 (Frame Rate + Timestamp Validation)

  • Frame rate adjustment: Changed the camera from 30fps to 15fps (it was struggling at 30fps)
    • Adjusted processing from every 6th frame to every 4th frame
    • Processing rate: ~3.75fps (was ~5fps)
  • Timestamp validation: Added OCR check of camera timestamp
    • Validates timestamp every 10 processed frames (~2.5 seconds)
    • Warns if camera time differs by >3 seconds from server
    • Uses optimized gosseract library (~22ms per check)
    • Optimizations: Downscale 50%, grayscale, Otsu threshold, invert, PNG encode
    • Reusing the OCR client gives a major speed boost (~120ms cold, ~22ms warm)
    • Silent when drift is within ±3 seconds
    • Helps detect frozen/lagging streams
  • New file: timestamp_ocr.go - Timestamp extraction and validation

v3.4 (Low Confidence Counter)

  • Added low confidence counter: Tracks and displays frequency of low confidence frames
    • New LowConfidenceCount field in ProcessingState
    • Console displays count when it increments: "Low confidence (#X)"
    • Helps identify camera/lighting issues needing attention
    • Counter appears for both first detection and retry failures

v3.3 (Corruption Detection Fix)

  • Fixed corruption detection: Direct check prevents invalid frames from reaching stability
    • Changed from validateCorruption() wrapper to direct IsCorrupted() call
    • ANY digit = -1 (invalid template match) now immediately returns StatusCorrupted
    • Prevents invalid/corrupted frames from being marked as "unstable"
    • Ensures clean separation: corruption → silent skip, low confidence → retry

v3.2 (Review & Signal Handling)

  • Fixed Ctrl+C handling: Simplified signal handling to prevent segfaults
    • Removed stream.Close() from signal handler (was causing SIGSEGV)
    • Stream cleanup now happens naturally on program exit
    • Sets closed flag only, checks happen between frame reads
  • Enhanced review.html: Failed frames now show all digit images
    • Corruption, low confidence, and unrecognized frames display digit crops
    • Allows reviewing and approving digits from failed recognitions
    • Red background distinguishes failed frames
  • Silenced corruption detection: Invalid patterns work quietly
    • Corrupted frames no longer appear in console or HTML
    • Only logged to file for debugging purposes
    • Keeps review focused on actionable items

v3.1 (Optimizations)

  • Removed FPS display: Cleaned up console output
    • No more "Stream FPS: X (avg over Y frames)" messages
    • Simpler, focused console showing only readings and warnings
  • Negative value handling: Unrecognized digits now trigger immediate retry
    • SpO2 or HR < 0 (OCR returned -1) is treated as low confidence
    • Triggers immediate next frame read + layout re-detection
    • Prevents false "unstable" warnings for blurry/unrecognized frames
  • Frame rate adjustment: Changed from every 4th to every 6th frame
    • ~5 fps processing rate from 30fps stream
    • Better balance for 30fps camera (was tuned for 15fps)

v3.0 (Refactored)

  • Modular architecture: Split monolithic pulse-monitor.go into focused modules
    • processor.go: Frame processing orchestration
    • validators.go: Validation logic
    • frame_source.go: Abstracted sources (RTSP/file)
    • types.go: Shared data structures
    • normalize.go: Frame normalization
  • Smart escalation strategy: Progressive failure handling
    • 1st failure: Try next frame (0s wait)
    • 2nd failure: Re-detect layout (0s wait)
    • 3rd failure: Wait 10s
    • 4th failure: Wait 30s
    • 5th+ failures: Wait 60s
    • Success resets counter
  • Interruptible sleep: Ctrl+C works immediately during waits
    • Checks source.IsActive() every second
    • No more forced 60s waits before shutdown
  • ConsecutiveFailures counter: Tracks problems across processing
    • Incremented on layout failures, corruption, low confidence
    • Reset on successful processing
    • Enables intelligent escalation
  • See V3_CHANGES.md for complete migration guide

v2.35

  • Robust layout detection under varying light conditions
    • Changed Step 3 logic: instead of picking "2 largest boxes", find rightmost X in each half
    • Splits detection area at X-center, finds max(rightX) for left half (SpO2) and right half (HR)
    • Handles digit fragmentation: even if "95" breaks into "9" and "5" contours, still finds correct right edge
    • Tracks Y range across all contours in each half for proper bounding
  • Raw frame archiving
    • Saves every processed frame to raw_frames/raw_YYYYMMDD-NNNNN.png (persistent directory)
    • Enables offline testing and replay of detection algorithms
    • Files are the rotated frames (after timestamp crop), exactly as fed to detection pipeline
    • Not cleaned on restart - accumulates for historical analysis
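
The rightmost-X idea can be illustrated with plain bounding boxes; the real code works on gocv contour rects, and the type and function names below are stand-ins.

```go
package main

// box is a minimal contour bounding box (x, y, width, height).
type box struct{ x, y, w, h int }

// rightmostPerHalf splits the detection area at its X-center and takes the
// maximum right edge in each half, so a digit fragmented into several
// contours (e.g. "95" splitting into "9" and "5") still yields the correct
// right edge for its half.
func rightmostPerHalf(boxes []box, areaWidth int) (leftRight, rightRight int) {
	center := areaWidth / 2
	for _, b := range boxes {
		right := b.x + b.w
		if b.x < center {
			if right > leftRight {
				leftRight = right
			}
		} else if right > rightRight {
			rightRight = right
		}
	}
	return
}
```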

v2.34

  • Every 4th frame processing (improved responsiveness)
  • Simplified corruption handling (no immediate retry)
  • Silent corruption detection (log only)
  • Digit 3 detection offset: -8 pixels
  • Invalid template: 2 patterns

Previous versions

  • v2.33: Added hindsight validation for large deltas
  • v2.32: Implemented low confidence retry with layout re-detection
  • v2.31: Added invalid digit detection
  • Earlier: Various contour detection and template matching improvements

Contact & Context

Development Environment: macOS (local development and testing)
Target Deployment: Ubuntu 22.04 with Docker
Home Assistant Version: (check your setup)
Camera: (specify your RTSP camera model)

Project Location: /Users/johanjongsma/pulse-monitor/


Quick Reference

Start monitoring

cd /Users/johanjongsma/pulse-monitor
./pulseox-monitor

Stop and generate review

Ctrl+C
# Opens review/review.html automatically

Check logs

tail -f pulseox-monitor_*.log

Approve training digit

# Move good digit image to training_digits/
# Naming: {digit}_{variant}.png

Mark as invalid/corruption

# Move bad pattern to training_digits/invalid/
# Naming: descriptive name like "white_blob_2.png"

Document Purpose: This document serves as a comprehensive reference for understanding the project state, architecture, and reasoning. It should enable a new chat session (or developer) to quickly understand what's been built, why decisions were made, and how to continue development.

Last Verified Working: November 17, 2025 (v3.57 - True raw frame saving)