Issue: Missing unique error codes in clavis-telemetry

Domain: clavis-telemetry
Assignee: @hans
Labels: violation, cardinal-rule-part-1, error-handling
Priority: Medium
Date: 2026-04-08


Violation

Cardinal Rule Violated: Part 1 — "Mandatory error handling with unique codes"

Per CLAVITOR-AGENT-HANDBOOK.md Part 1:

Mandatory error handling with unique codes:

  • Every if needs an else. The if exists because the condition IS possible
  • Use unique error codes: ToLog("ERR-12345: L3 unavailable in decrypt")
  • When your "impossible" case triggers in production, you need to know exactly which assumption failed and where.

Error messages that actually help:

Every error message shown to a user must be:

  1. Uniquely recognizable — include an error code: ERR-12345: ...
  2. Actionable — the user must know what to do next
  3. Routed to the actor who can resolve it

Location

File: clavis/clavis-telemetry/main.go

Lines with violations:

Line | Current Code | Violation
41 | log.Fatalf("Failed to open operations.db: %v", err) | No unique error code
47 | log.Fatalf("Failed to load CA chain for mTLS: %v", err) | No unique error code
228 | log.Printf("Invalid certificate from %s: %v", popID, err) | No unique error code
337 | log.Printf("SPAN EXTEND node=%s gap=%ds...") | No unique error code
342-351 | log.Printf("OUTAGE SPAN node=%s...") | No unique error code
367-370 | log.Printf("OUTAGE SPAN... alerting disabled") | No unique error code
383 | log.Printf("OUTAGE SPAN ntfy error creating request: %v", err) | No unique error code
395 | log.Printf("OUTAGE SPAN ntfy error sending alert: %v", err) | No unique error code
398 | log.Printf("OUTAGE SPAN ntfy alert sent for node=%s", nodeID) | No unique error code

File: clavis/clavis-telemetry/kuma.go

Line | Current Code | Violation
56-58 | Silent fail on Kuma push error | Missing error handling entirely

Required Fix

  1. Assign a unique error code to each error path (e.g., ERR-TELEMETRY-001 through ERR-TELEMETRY-020)
  2. Format: ERR-TELEMETRY-XXX: <actionable message>
  3. Include error codes in:
    • Fatal logs (database/CA loading failures)
    • Certificate validation failures
    • External alerting failures (ntfy)
    • Kuma push failures (currently silent)

Example Fix

// Before:
log.Fatalf("Failed to open operations.db: %v", err)

// After:
log.Fatalf("ERR-TELEMETRY-001: Failed to open operations.db at %s - %v. Check permissions and disk space.", dbPath, err)

Verification Checklist

  • All log.Fatalf calls include ERR-TELEMETRY-XXX codes
  • All log.Printf error logs include ERR-TELEMETRY-XXX codes
  • Kuma push errors are no longer silent (kuma.go, lines 56-58)
  • Certificate validation failures include error codes
  • External alert failures (ntfy) include error codes
  • Test cases verify error codes appear in output

Reporter: Yurii (Code & Principle Review)
Reference: CLAVITOR-AGENT-HANDBOOK.md Part 1, "Mandatory error handling with unique codes"