clavitor/issues/003-silent-kuma-failure.md

Issue: Silent failure in Kuma push — no error handling

Domain: clavis-telemetry
Assignee: @hans
Labels: violation, cardinal-rule-part-1, error-handling, silent-failure
Priority: High
Date: 2026-04-08


Violation

Cardinal Rule Violated: Part 1 — "Mandatory error handling with unique codes" AND Part 1 — "Silent fallbacks are not fixes"

Per CLAVITOR-AGENT-HANDBOOK.md Part 1:

Quick fixes are not fixes. A "temporary" hack that ships is permanent. Every if needs an else. The if exists because the condition IS possible.


Location

File: clavis/clavis-telemetry/kuma.go

Lines 53-60:

// POST to Kuma
payload := `{"status":"` + status + `","msg":"` + strings.ReplaceAll(msg, `"`, `\"`) + `","ping":60}`
resp, err := http.Post(kumaURL, "application/json", strings.NewReader(payload))
if err != nil {
    // Silent fail - Kuma will detect silence as down
    return
}
resp.Body.Close()

The Violation

  1. Silent failure: the error is caught and then discarded; the only trace is a code comment
  2. No error code: there is no ERR-XXXXX code for operational forensics
  3. No logging: the failure never appears in the logs
  4. Misleading comment: "Kuma will detect silence as down" is true, but operators won't know WHY Kuma shows down

Why This Matters

When Kuma shows "down", operators need to know if it's because:

  • The telemetry service is actually down (DB failure)
  • The telemetry service can't reach Kuma (network issue)
  • Kuma itself is having issues

Silent failures create blind spots in operational monitoring. The telemetry service could be failing to report health for hours, and the only symptom would be Kuma showing red — with no logs explaining why.


Required Fix

  1. Log Kuma push failures with a unique error code
  2. Include the error details in the log
  3. Consider retry logic or backoff (optional)
  4. Document the failure mode

Example Fix

// POST to Kuma
payload := `{"status":"` + status + `","msg":"` + strings.ReplaceAll(msg, `"`, `\"`) + `","ping":60}`
resp, err := http.Post(kumaURL, "application/json", strings.NewReader(payload))
if err != nil {
    log.Printf("ERR-TELEMETRY-020: Failed to push health to Kuma at %s - %v", kumaURL, err)
    return
}
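// Close the body on return and log a close error instead of discarding it.
defer func() {
    if cerr := resp.Body.Close(); cerr != nil {
        log.Printf("ERR-TELEMETRY-022: Failed to close Kuma response body - %v", cerr)
    }
}()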

if resp.StatusCode != http.StatusOK {
    log.Printf("ERR-TELEMETRY-021: Kuma returned non-OK status %d from %s", resp.StatusCode, kumaURL)
}

Additional Issue: resp.Body.Close() Error Ignored

Line 60: resp.Body.Close() returns an error that is silently discarded.

Fix:

if err := resp.Body.Close(); err != nil {
    log.Printf("ERR-TELEMETRY-022: Failed to close Kuma response body - %v", err)
}

Verification Checklist

  • Kuma push failures logged with ERR-TELEMETRY-020
  • Non-OK HTTP responses logged with ERR-TELEMETRY-021
  • Response body close errors handled with ERR-TELEMETRY-022
  • All errors include actionable context (URL, status, error details)
  • Test case added for Kuma push failure scenario

Reporter: Yurii (Code & Principle Review)
Reference: CLAVITOR-AGENT-HANDBOOK.md Part 1, "Mandatory error handling with unique codes" and "Silent fallbacks are not fixes"