# Issue: Silent failure in Kuma push — no error handling

**Domain:** clavis-telemetry

**Assignee:** @hans

**Labels:** `violation`, `cardinal-rule-part-1`, `error-handling`, `silent-failure`

**Priority:** High

**Date:** 2026-04-08

---

## Violation

**Cardinal Rules Violated:** Part 1, "Mandatory error handling with unique codes" and "Silent fallbacks are not fixes"

Per CLAVITOR-AGENT-HANDBOOK.md Part 1:

> Quick fixes are not fixes. A "temporary" hack that ships is permanent.
>
> Every `if` needs an `else`. The `if` exists because the condition IS possible.

---

## Location

File: `clavis/clavis-telemetry/kuma.go`

Lines 53-60:

```go
// POST to Kuma
payload := `{"status":"` + status + `","msg":"` + strings.ReplaceAll(msg, `"`, `\"`) + `","ping":60}`
resp, err := http.Post(kumaURL, "application/json", strings.NewReader(payload))
if err != nil {
	// Silent fail - Kuma will detect silence as down
	return
}
resp.Body.Close()
```

---

## The Violation

1. **Silent failure:** The error is checked and then discarded; the only trace is a comment
2. **No error code:** No `ERR-XXXXX` code for operational forensics
3. **No logging:** The failure never appears in the logs
4. **Comment is misleading:** "Kuma will detect silence as down" is true, but operators won't know WHY Kuma shows down
5. **No status check:** `resp.StatusCode` is never inspected, so a non-OK response from Kuma is treated as success

---

## Why This Matters

When Kuma shows "down", operators need to know which of these is the cause:

- The telemetry service is actually down (DB failure)
- The telemetry service can't reach Kuma (network issue)
- Kuma itself is having issues

Silent failures create blind spots in operational monitoring. The telemetry service could fail to report health for hours, and the only symptom would be a red Kuma monitor, with no log line explaining why.

---

## Required Fix

1. Log Kuma push failures with a unique error code
2. Include the error details in the log (URL, HTTP status, underlying error)
3. Consider retry logic or backoff (optional)
4. Document the failure mode

---

## Example Fix

```go
// POST to Kuma
payload := `{"status":"` + status + `","msg":"` + strings.ReplaceAll(msg, `"`, `\"`) + `","ping":60}`
resp, err := http.Post(kumaURL, "application/json", strings.NewReader(payload))
if err != nil {
	log.Printf("ERR-TELEMETRY-020: Failed to push health to Kuma at %s - %v", kumaURL, err)
	return
}
// Close the body on every return path, and log (rather than discard) a close error.
defer func() {
	if cerr := resp.Body.Close(); cerr != nil {
		log.Printf("ERR-TELEMETRY-022: Failed to close Kuma response body - %v", cerr)
	}
}()

if resp.StatusCode != http.StatusOK {
	log.Printf("ERR-TELEMETRY-021: Kuma returned non-OK status %d from %s", resp.StatusCode, kumaURL)
}
```

---

## Additional Issue: `resp.Body.Close()` Error Ignored

Line 60: `resp.Body.Close()` returns an error that is silently discarded.

Fix:

```go
if err := resp.Body.Close(); err != nil {
	log.Printf("ERR-TELEMETRY-022: Failed to close Kuma response body - %v", err)
}
```

---

## Verification Checklist

- [ ] Kuma push failures logged with `ERR-TELEMETRY-020`
- [ ] Non-OK HTTP responses logged with `ERR-TELEMETRY-021`
- [ ] Response body close errors handled with `ERR-TELEMETRY-022`
- [ ] All errors include actionable context (URL, status, error details)
- [ ] Test case added for Kuma push failure scenario

---

**Reporter:** Yurii (Code & Principle Review)

**Reference:** CLAVITOR-AGENT-HANDBOOK.md Part 1, "Mandatory error handling with unique codes" and "Silent fallbacks are not fixes"