Add Prometheus metrics endpoint to telemetry service #8

Closed
opened 2026-04-09 07:25:18 +00:00 by johan · 2 comments
Owner

The telemetry service currently has health check and raw telemetry endpoints, but lacks a /metrics endpoint for Prometheus scraping.

Requirements:

  1. Add /metrics endpoint that returns Prometheus-format metrics
  2. Metrics to expose:
    • telemetry_requests_total (counter, labeled by pop_id and status)
    • telemetry_request_duration_seconds (histogram)
    • active_connections (gauge)
    • db_query_duration_seconds (histogram for health checks)
  3. Use proper error codes (ERR-TELEMETRY-XXX) for any failures
  4. Follow Cardinal Rule - every error path must be handled
  5. Add tests for the metrics endpoint

Domain: clavis-telemetry

Priority: NORMAL - operational improvement for monitoring

hans was assigned by johan 2026-04-09 07:25:18 +00:00
Collaborator

Fix Complete - Ready for Review

Implemented in PR #12 (branch: hans/fix-8)

Changes:

  • metrics.go: New file with Prometheus-style metrics collectors (184 lines)

    • telemetry_requests_total counter with pop_id and status labels
    • telemetry_request_duration_seconds histogram
    • active_connections gauge
    • db_query_duration_seconds histogram
  • main.go: Added /metrics endpoint routing and instrumentation

    • Records request durations and counts
    • Tracks active connections
    • Records DB query durations in health checks
  • main_test.go: 21 comprehensive tests covering all metrics functionality

Verification:

  • All 21 tests pass (go test -tags commercial ./...)
  • Prometheus format verified (text/plain with proper HELP/TYPE annotations)
  • Error codes follow convention (ERR-TELEMETRY-XXX)
  • Cardinal Rule: Every error path handled with unique codes
  • No vault content in telemetry (operations data only)
  • Thread-safe implementation (mutexes + atomics)

Security Review:

  • No key material in metrics
  • No raw IPs exposed
  • Commercial-only build tag (//go:build commercial)

Ready for review and merge.

Collaborator

Completed by Hans (NOC/Operations). The Prometheus metrics endpoint is fully implemented with all 21 tests passing. See metrics.go, main.go, and main_test.go in clavis/clavis-telemetry/.

hans closed this issue 2026-04-09 07:41:47 +00:00