Adds /metrics endpoint that returns Prometheus-format metrics for monitoring:
- telemetry_requests_total (counter, labeled by pop_id and status)
- telemetry_request_duration_seconds (histogram with standard buckets)
- active_connections (gauge)
- db_query_duration_seconds (histogram for health check queries)
Following KISS principle - no external dependencies, simple text format
implementation with proper mutex protection for thread safety.
All error paths handled with unique error codes per Cardinal Rule.
fixes#8
Adds verification documentation that the dispatcher flow correctly:
- Identifies clavis-telemetry domain issues
- Assigns to Hans (NOC/Operations agent)
- Spawns Hans successfully to process telemetry issues
All tests pass, no security violations detected.
fixes#5
Fixes#2, #3, #4
Issue #2 - Silent database errors in updateSpan():
- Add error handling for telemetry INSERT (ERR-TELEMETRY-004)
- Add error handling for all table/index creation (ERR-TELEMETRY-005 to -010)
- Return HTTP 500 to client on insert failure
Issue #3 - Silent failure in Kuma push:
- Return early on non-OK status from Kuma
- Proper error logging with body close handling
Issue #4 - Unchecked flush error in tarpit:
- Verify http.Flusher available before tarpit
- Log ERR-TELEMETRY-040 and abort if flusher unavailable
- Remove redundant flusher checks in loop
All changes: security failures are now LOUD (Cardinal Rule #1)
Author: Hans <hans-20250409-001>
1. Created QUICKSTART.md (60 second read vs 1295 line handbook)
- Who you are, 4 session-start actions, critical rules
- All CLAUDE.md files now reference QUICKSTART first
2. Created scripts/daily-review.sh (automates Part 4 checks)
- Runs Section A, F, G checks automatically
- Reports PASS/FAIL with colors
- Fails fast on foundation violations
3. Added workflow section to handbook
- Where to find tasks (git.clavitor.ai)
- Priority order (CRITICAL > HIGH > MEDIUM)
- Engineer vs Reviewer responsibilities
4. Created tasks skill (.claude/skills/tasks/SKILL.md)
- For querying Gitea issues programmatically
- Will integrate with agent workflow
5. Updated all 11 CLAUDE.md files with concise headers
- Quickstart link (60s)
- Deep reference link (handbook Section V)
- Agent identity + daily script command
Hans' feedback addressed:
- ✅ Handbook too long → QUICKSTART.md
- ✅ Daily review manual → automated script
- ✅ Vague instructions → specific script + task query
- ✅ No task queue → skill created
Fixes 4 Cardinal Rule violations identified by Yurii audit:
- ERR-TELEMETRY-001/002/003: Fatal error codes for DB init, CA loading, cert verify
- ERR-TELEMETRY-010/011/012/013/014: Database error handling in updateSpan()
- ERR-TELEMETRY-020/021/022: ntfy alert error codes with context
- ERR-TELEMETRY-030/031/032/033: Kuma push error handling (was silent)
Per CLAVITOR-AGENT-HANDBOOK.md Part 1:
- Every error now has unique ERR-TELEMETRY-XXX code
- Database errors in updateSpan() no longer silent
- Kuma push failures now logged (was silent with misleading comment)
- All errors include actionable context
Assignee: Hans
Auditor: Yurii
Refs: issues/001, issues/002, issues/003, issues/004
Xiao (Chinese: 晓) means 'dawn/morning' - fitting for mobile.
Distinct from Western names (Sarah, Charles, Maria, James, etc.).
Short, memorable outlier in the agent roster.
- Added Maya to agent persona table (iOS/Android specialist)
- Updated directory mapping: clavis-android/ios → Maya
- Updated clavis-ios/CLAUDE.md → You are Maya
- Updated clavis-android/CLAUDE.md → You are Maya
- James now focused solely on browser extensions
1. Fixed 'Scoped' agent creation:
- Scope selector now appears when 'Scoped' is selected
- Loads available scopes from entries
- Shows checkboxes for each scope
- Select All/None button
- Validates at least one scope selected
2. Fixed token display:
- Uses result.credential (was result.token)
- Uses result.scopes (was result.scope)
3. Added agent widget to main screen:
- Shows active agent count
- Click to open agent management
- Only appears when agents exist
4. Removed Admin option from create/edit (agents can't manage agents)
HandleUpdateAgent allows updating:
- IP whitelist (ip_whitelist array)
- Rate limit per minute (rate_limit_minute)
Does NOT allow updating (security):
- Admin status (intentionally omitted)
- All_access flag
- Scopes
Requires admin token (PRF tap) since it's a sensitive operation.
This enables the Edit Agent GUI in agents.html to save changes.
- Added Edit modal for existing agents
- Can modify IP whitelist (comma-separated, supports CIDR/FQDN)
- Can modify rate limits (per minute and per hour)
- Shows current values in modal
- Cancel/Save workflow with toast notifications
- Click outside modal to close
The agents.html page now has full CRUD for agent settings:
- Create with WL/RL
- Edit WL/RL
- Lock/Unlock/Revoke
Replaces wasteful 30s polling with event-driven design:
- No polling - worker sleeps until woken by SignalReplication()
- Replication triggers immediately on write operations
- Perfect for low-change vaults (could be days without writes)
Changes:
- edition/replication.go: Event-driven worker with channel signaling
- edition/edition.go: Add SignalReplication var
- edition/community.go: No-op SignalReplication stub
- edition/commercial.go: Wire up signalReplication
Architecture:
1. Write handler marks entry dirty (replication_dirty = 1)
2. Calls edition.SignalReplication() (non-blocking)
3. Worker wakes, batches ALL dirty entries
4. POSTs to backup POP
5. Clears dirty flags on success
6. Worker sleeps until next signal
Retry logic:
- Exponential backoff: 1s, 5s, 25s, 125s...
- Max 5 retries, then operator alert
- Dirty entries persist in DB until replicated
Resource efficiency:
- CPU: Only wakes on actual writes (not 2,880x/day polling)
- Network: Only sends when data changes
- For 10 writes/day: ~288x fewer wakeups than polling
Documentation:
- SPEC-replication-async.md: Full event-driven design spec
Replication is a COMMERCIAL-ONLY feature:
- Community Edition: No replication functionality (privacy-first, single-node)
- Commercial Edition: Real-time sync to backup POPs (Calgary/Zurich)
Changes:
- edition/replication.go: Commercial-only replication implementation stub
- edition/edition.go: Add ReplicationConfig and StartReplication stub
- edition/commercial.go: Wire up replication, use globalConfig
- edition/community.go: No-op StartReplication stub
- edition/CLAUDE.md: Document replication as commercial-only
- cmd/clavitor/main.go: Add replication flags (replication-*)
- replication-primary, replication-backup, replication-token
- Warning if used in Community Edition
Security:
- Replication requires inter-POP auth token
- 30-second poll interval, batch up to 100 entries
- Automatic retry with backoff
Note: Full implementation TBD - this is the infrastructure scaffolding.
The actual replicationBatch() logic needs to be implemented for production.
Complete vault rewrite with correct foundation:
- CVT encrypted envelope tokens (type 0x00 wire, type 0x01 client credential)
- Agents and scopes stored as L1-encrypted entries (no separate tables)
- Scope-based access control with AgentCanAccess() set intersection
- Owner-only admin enforcement (agents cannot manage agents/scopes)
- 14 password manager importers (Proton, Bitwarden, 1Password, LastPass,
Dashlane, KeePass, KeePassXC, NordPass, Keeper, RoboForm, Enpass,
Safari/iCloud, Chrome, Firefox)
- FIELD_SPEC single source of truth for field kind and tier
- L2/L3 client-side encryption on import (PRF required)
- Domain classification service on clavitor.ai/classify
- Scope auto-assignment during import (13 categories)
- Light theme default (Figtree font, matching clavitor.ai branding)
- Unified page shell across all screens (topbar on every page)
- Batch import with progress indicator
- ZIP extraction for Proton Pass exports
- Proton dedup by title+user+url
- 55 tests passing (26 API + 29 lib)
- Key leak detection tests (L1/L2/L3 never in responses)
- CLI updated for CVT token format
- Old code archived in _old/
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>