diff --git a/CLAVITOR-AGENT-HANDBOOK.md b/CLAVITOR-AGENT-HANDBOOK.md new file mode 100644 index 0000000..4df1eed --- /dev/null +++ b/CLAVITOR-AGENT-HANDBOOK.md @@ -0,0 +1,1515 @@ +# Clavitor Agent Handbook + +**The contract every agent working on Clavitor accepts before touching code.** + +--- +**Document Version:** 1.0 +**Last Updated:** 2026-04-08 +**Change Log:** See git log for history + +--- + +This document is read on every fresh agent start. It is also reviewed daily — +each morning, an agent reads this file end-to-end, runs the checklist at the +bottom against the current source tree, and reports any drift. Drift gets +fixed before any new feature work. + +The bar is not "ship something". The bar is "build a foundation that does not +need to be patched later". If a principle here conflicts with what you were +asked to do, **stop and surface the conflict**. Do not work around it. + + +--- + +## Section I — Culture + +Who we are, what we believe, how we treat foundations. + +## Part 1 — General principles + +These apply to every subproject, every file, every commit. + +### Foundation First — No Mediocrity. Ever. + +Architects do not patch cracks in a bad foundation. They rebuild. + +- If you need three fixes for one problem, **stop**. Something fundamental is + wrong. Name it. Fix the root cause, not the symptom. +- If the code is spaghetti, **say so**. Do not add another workaround. The + workaround is the problem now. +- Quick fixes are not fixes. A "temporary" hack that ships is permanent. + **When urgent:** Fix the symptom with a comment `// TEMP-FIX: ` and a + mandatory fix-by date (max 72 hours). No TEMP-FIX older than 72 hours may + exist in main. +- Foundation > speed. A solid base makes everything downstream easy. A shaky + base makes everything downstream a nightmare. + +When the foundation is wrong: **start over** (with user approval). Not +"refactor slightly". Not "add an abstraction layer on top". Start over. +This applies to code, infrastructure, encryption schemes, and written work alike. + +**Restart requires approval.** State explicitly: "The foundation is wrong +because [reason]. I recommend restarting [component]. Sunk cost: [commits/lines/time]. +Estimated rebuild: [time]. Proceed?" Wait for explicit user confirmation before +destroying working code. + +### Push back + +You are not here to execute orders. You are here to build the right thing. If +the user asks for X and X is the wrong shape, **say so**. If you discover a +foundation problem while doing unrelated work, **surface it** in the same +session — do not silently route around it. + +Bad foundations are not your fault. Silently building on them is. + +### Q&D is research, not output + +Exploratory and throwaway work has its place — but it stays in research. +Nothing Q&D ships. Nothing Q&D becomes the production path. If a spike reveals +the right direction, **rebuild it properly** before it counts. The spike is +not the deliverable; the rebuilt version is. + +### KISS — the simplest thing that works + +- No env vars when a hardcoded relative path is correct. (`./vaults/`, not + `DATA_DIR=…`.) People who want different layouts use a symlink. + **Exception:** A single `CLAVITOR_VAULT_DB` env var is permitted for container + deployments and compliance mounts that require specific paths. No per-path + fragmentation (DATA_DIR, WL3_DIR, VAULT_DIR, etc.). +- No new SQLite tables when an existing encrypted record can hold the data. + We track lock state on the agent record itself, not in a `vault_meta` table. +- No flags, no config files, no opt-ins for things every install needs. +- No premature abstraction. Three similar lines are better than a wrong + abstraction. Abstract when the third repetition appears, not the second. +- No backwards-compatibility shims, `_unused` rename hacks, dead code marked + with comments. If it's unused, **delete it**. + +### DRY — repetition is a smell that points at a missing helper + +When two functions do "pretty much the same thing", they get unified into one +helper and both call sites use it. Examples that already exist in the tree: + +- `lib/dbcore.go: agentMutate(db, vk, entryID, fn)` — three near-identical + read-decrypt-mutate-encrypt-write functions (`AgentLock`, `AgentUnlock`, + `AgentUpdateAllowedIPs`) collapsed into one helper. New mutations are + one-liners now: `func(vd) { vd.X = y }`. +- `crypto.js: encrypt_field(key, label, value)` — single source of truth for + field encryption, used by both browser and CLI. + +If you write the second copy, you are creating future debt. Stop, refactor, +then proceed. + +### No migrations during early-stage work + +There are no real users yet. Schema changes are not migration problems — +they are "delete the old vault, register again" problems. Do not write +migration code. Do not write `ALTER TABLE` for new columns. Just edit the +schema constant and rebuild. + +**Early stage ends: April 20, 2026.** After this date, the migration rule flips +and production deployments require proper schema migrations. + +### Don't add what's not needed (and delete what isn't) + +- **No backwards-compatibility shims:** + - No `--no-verify`, `--skip-sign`, `--insecure`, `--legacy` flags unless the + user explicitly asks. + - No `_oldName` field rename hacks "for transition". + - No re-exports of types that moved. + - No `// removed` comments in place of deletion. + - No commented-out code "in case we need it again". + - If you delete it, delete it. `git log` is the rollback path. + +- **Delete dead code (with approval):** + - No orphaned files (not imported, not referenced, not executed). + - No unreachable functions (private, no callers in the codebase). + - No unused exports (public but no external importers). + - No dead CSS classes (not referenced in HTML/JS). + - No commented-out blocks >5 lines (anything that large should be deleted). + - **Permission required:** File deletion requires explicit user approval. + State: "3 files/directories found unused. Recommend deletion: + 1. path/to/file1.html (orphaned, no references) + 2. path/to/file2.go (unreachable function) + 3. path/to/dir/ (empty) + Proceed with deletion?" + - **Verification:** Part 4 Section G checks for unused files/functions. + +- **Mandatory error handling with unique codes:** + - Every `if` needs an `else`. The `if` exists because the condition IS possible + — "impossible" just means "I can't imagine it happening." The else handles + the case where your imagination was wrong. + - Use unique error codes: `ToLog("ERR-12345: L3 unavailable in decrypt")` + - When your "impossible" case triggers in production, you need to know exactly + which assumption failed and where. + +### Error messages that actually help + +Every error message shown to a user must be: + +1. **Uniquely recognizable** — include an error code: `ERR-12345: ...` +2. **Actionable** — the user must know what to do next (retry, check input, wait, contact support) +3. **Routed to the actor who can resolve it** — don't tell end-users things only operators can fix + +**Bad:** "Cannot perform action" — useless, no code, no next step, no routing. + +**Bad:** "Server down" to an end-user — they can't fix it. This creates helplessness and support tickets. + +**Good (operator alert + user message):** +``` +[Alert to ops/monitoring]: ERR-50001: POP-ZRH vault DB unreachable — 3 failures in 60s + +[User sees]: "Service temporarily unavailable. Our team has been alerted + and is working on it. Reference: ERR-50001. No need to open a ticket." +``` + +The user knows: (a) what happened, (b) it's being handled, (c) they don't need to act. +This prevents unnecessary support load while keeping the user informed. +- No validation at trusted internal boundaries. Validate at the system + perimeter (HTTP requests, file uploads, parsed user input) and nowhere else. +- No feature flags for hypothetical futures. Build what's asked, no more. +- Docstrings where understanding requires tracing >2 files or non-local reasoning. + If you had to grep across modules to figure out what a function does, the next + reader will too — document it. Obvious single-file logic needs no comment. + +### Audit everything that matters + +Every state change, every rejection, every strike, every lock, every admin +action, every webhook receipt: **audit-logged**. The audit log is the forensic +tool when something goes wrong. If it's not in the audit log, it didn't happen +and you can't reconstruct it. + +Audit format already in use: `lib.AuditLog(db, &lib.AuditEvent{ Action: ..., Actor: ..., Title: ..., IPAddr: ..., EntryID: ... })`. + +### Never delegate understanding + +Don't write "based on the research, implement it" or "based on the findings, +fix the bug". Those phrases push synthesis onto someone else (or onto a future +agent). Write code that proves you understood: include file paths, line +numbers, and what specifically to change. + +--- + + +### How to add a new principle + + +When a new lesson surfaces (today's "rate limit is for the agent, not the +owner" was one), add it here in the same session: + +1. Pick the right section: General (Part 1), Cardinal (Part 2), or + subproject-specific (Part 3). +2. State the principle in one sentence at the top of the entry. +3. Explain WHY in 1–3 sentences. The reasoning matters more than the rule — + future reviewers need it to apply the principle correctly to new + situations. +4. If the principle is auditable, add a grep check to Part 4. +5. Reference the incident or insight that surfaced the principle (commit + hash, issue number, or session date). + +Principles are additive. Don't delete old ones to make room — they exist +because something went wrong once. If a principle is genuinely obsolete: + +**Deprecation format:** +```markdown +### **DEPRECATED** — Original Principle Name +**Deprecated:** 2026-04-08 +**Reason:** Early stage ended — migration code now required +**Authorized by:** Johan +**Replacement:** See "Schema migrations" in Part 8 (when written) + +[original principle text preserved below...] +``` + +- Deprecated principles remain in place for 90 days, then move to Appendix A +- A deprecated principle's grep checks in Part 4 are commented out with reason +- Deprecation requires explicit user approval (state: "Deprecating principle X + because Y. Authorized by you?") + + +### When you have to violate a principle + + +You can't always honor every principle. Sometimes the principles themselves +conflict. When that happens: + +1. **Surface the conflict to the user.** Don't choose silently. +2. **Pick the principle with the higher consequence.** Security vetos beat + DRY. Foundation rules beat KISS. The Cardinal Rules in Part 2 beat + everything else. +3. **Document the violation in the code itself**, with a comment explaining + what principle was broken, why, and what would have to be true for the + violation to be reversed. +4. **Add a TODO to the daily checklist** so the violation is reviewed + tomorrow and either fixed or formally accepted. + +Example of an acceptable, documented violation: the IP whitelist first-contact +race in `api/middleware.go` — there's a 30-line comment explaining that the +race is theoretically possible, why it cannot be reproduced in practice, and +why fixing it would require complexity disproportionate to the threat. The +violation is acknowledged, scoped, and dated. + +### Escalation and post-hoc discovery + +**User disagreement:** If the user insists on a lower-consequence principle over a higher one +(e.g., speed over security), escalate to project owner (Johan) before proceeding. +Document the override in the PR description. + +**Post-hoc discovery:** If you find a committed violation during daily review: +1. Assess if it's actively dangerous (security veto) — if yes, escalate immediately +2. If not dangerous, add to TODO and address in current sprint +3. Never leave a violation undocumented for >24 hours + +**Equal consequence conflicts:** When two principles of equal tier conflict, prefer the +one that preserves user agency and data integrity. + +**Severe security violations:** For critical security breaches (harvester detected, +token compromise, unauthorized data exfiltration): +- Immediate escalation to Johan +- Consider permanent ban of the agent and/or **all agents running on that model** +- Quarantine affected vault(s) until forensic review complete +- Legal/compliance review if customer data was accessed + +--- + +--- + +## Section II — Security + +The threat model and cardinal rules. Non-negotiable. + + +The threat model for Clavitor is **a malicious skill that subverts an agent** +into harvesting credentials. Not random brute-forcers. Not script kiddies. The +attacker is already inside the trust envelope — they're an authenticated agent +with a real CVT token. Every defense in the system is built around slowing, +detecting, and stopping that attacker. + +### Cardinal rule #1 — Security failures are LOUD + +If decryption fails, exposure fails, or a signature mismatches: **expose the +failure**. Never fall back to plaintext. Never silently continue. Never return +a partial result that omits the failed item. + +**WRONG (fireable offense)**: +```javascript +try { + decrypted = await decrypt(ciphertext); +} catch (e) { + decrypted = plaintext; // NEVER +} +``` + +**WRONG (also fireable)**: +```go +func verifyWebhookSignature(r *http.Request) bool { + return true // For now, accept all +} +``` + +**CORRECT**: +```javascript +try { + decrypted = await decrypt(ciphertext); +} catch (e) { + decrypted = '[decryption failed]'; // user sees the failure +} +``` + +```go +func verifyWebhookSignature(r *http.Request, body []byte) bool { + secret := os.Getenv("PADDLE_WEBHOOK_SECRET") + if secret == "" { + log.Printf("PADDLE_WEBHOOK_SECRET not set; refusing webhook") + return false // refuse, log, never permit + } + // ... real HMAC verification ... +} +``` + +Failures must be **noisy, visible, and blocking** — never silent, hidden, or +permissive. This applies to: +- Encryption / decryption errors +- Signature verification failures +- Authentication failures +- Tampering detection +- Any security-critical operation + +### Cardinal rule #2 — The server is a dumb store + +The vault server **never holds full key material**. The server's universe of +secrets is limited to: + +| Tier | What | Server may hold | +|------|------|-----------------| +| L0 | 4 bytes — vault file routing | Yes (sliced from `L1[:4]`) | +| L1 | 8 bytes — vault content encryption | Yes (in bearer token) | +| L2 | 16 bytes — agent decryption key | **NEVER** | +| L3 | 32 bytes — hardware-only key | **NEVER** | +| P1 | 8 bytes — public lookup token | Yes (browser computes & sends) | +| Master | 32 bytes — full PRF output | **NEVER** | + +The server **derives nothing**. It receives what the browser sends, validates +shapes, and persists. P1 is computed by the browser via HKDF and sent on the +wire — the server does not know how to derive it because it does not have the +master key. There is no `lib.DeriveP1` function; if you find one, delete it. + +The server is allowed to hold L1 because L1 is the bearer token used for +per-entry encryption keys (`DeriveEntryKey`). L1 grants the ability to read +vault content, but **not** the L2/L3 fields: L2 is the agent's own key +(never leaves the C CLI / browser trust anchor), and L3 is a random 32-byte +value the server has never seen in plaintext — it's only present in the WL3 +file, wrapped with the user's PRF. + +### Cardinal rule #3 — The browser is the trust anchor + +Every line of code that handles full key material lives in the browser (or +the C CLI, which is its own trust anchor — see Part 3). The browser: + +- Generates a random 32-byte L3 at vault creation. L3 is pure entropy — not + derived from anything. The PRF's job is to *wrap* L3, not to derive it. +- Wraps L3 with the raw PRF output → opaque blob the server stores in WL3 +- Computes P1 = HKDF(master, "clavitor-p1-v1", 8 bytes) +- Computes L1 = master[0:8] +- Sends `{l1, p1, wrapped_l3, credential_id, public_key}` to the server +- Holds L2/L3 in memory only during an active session, never persists them + +The browser is the only place that ever sees: +- The full master key +- L2 plaintext +- L3 plaintext + +If a server-side function takes any of these as a parameter, that function is a +**veto violation**. Delete it. + +**Exceptionally rare terms:** The strings `master_key`, `L3` (or `l3_plaintext`), +and `P3` should appear almost nowhere in the codebase. If you find yourself +using them, stop. These are browser-only concepts. Their presence outside +`auth.js`, `crypto.js`, or the C CLI is a sign of architectural drift. + +### Cardinal rule #3.5 — Cryptographic hygiene + +**FIPS 140-3 compliance:** All cryptographic operations must be FIPS 140-3 +validated or in the process of validation. This means: +- Use standard library crypto (Go: `crypto/` packages, C: OpenSSL FIPS module) +- No custom cryptographic primitives +- No home-grown key derivation or random number generation + +**Avoid CGO where possible:** CGO (Go's C interoperability) creates build +complexity, deployment friction, and audit surface. Prefer: +- Pure Go implementations for new code +- CGO only when FIPS mandates it (e.g., specific HSM integrations) +- The C CLI (`clavis-cli`) is the only CGO-heavy component by design + +**Compiler optimizations:** Standard `memset` or `clear()` may be optimized +away. Use `runtime.memclrNoHeapPointers` (Go) or `memset_explicit` (C11) to +ensure key material is actually cleared from memory. + +### Cardinal rule #4 — Three threats, no others + +The owner is not a threat — they paid for the vault, they hold the keys, +defending against them is self-DOS. The three threats below are the **primary** +universe Clavitor defends against at the application layer. Infrastructure, +supply chain, and social engineering are handled through operational security +practices outside this codebase. + +#### Threat A — Malicious skill harvesting through an authenticated agent + +A subverted skill makes a real agent enumerate credentials. Attacker is +already inside the trust envelope: valid CVT token, IP whitelist passes, +normal-looking client. + +Defenses: +- Per-agent unique-entries quota (default 3/min, 10/hour). Repeats of the same + credential are free; only distinct credentials count. +- Two-strike lockdown: second hour-limit hit within 2h → agent locked, owner + PRF unlock required. State persists in the encrypted agent record; restart + does not reset. +- **Slow-harvest detection:** An agent accessing >12 distinct credentials in any + 24-hour period triggers an immediate audit alert (regardless of rate limits). + This is a dark-red flag — normal agents need at most a dozen credentials. +- L3 fields require keys the server has never seen (Threat B's defenses + apply here too). +- Owner-only endpoints exempt from global rate limit (agents blocked at the + handler entry — rate-limiting that path defends against nothing). + +When reasoning about a rate limit, ask: *agent or owner?* If owner, it's wrong. + +#### Threat B — Vault file falls into the wrong hands + +Stolen disk, stolen backup, hostile sysadmin, court order on a host. The +attacker has every byte the server has. **Result must be: useless ciphertext.** + +This works because **keys are never co-located with the data they encrypt**: + +- L1 lives nowhere on disk. Derived from the user's WebAuthn PRF; the PRF + requires the physical authenticator. No keyfile, no recovery copy. +- L2/L3 plaintext are absent from the server's universe at compile time + (see the hard veto list and Part 4 → Section A). +- WL3 holds `wrapped_l3` = AES-256-GCM(L3, raw PRF). Without the + authenticator, this is uniformly random ciphertext. +- 32-byte AES-256-GCM keys are not brute-forceable. ~10⁵⁷ years at 10¹² + guesses/sec classically. Grover's quantum algorithm provides a square-root + speedup, reducing effective security to that of a 128-bit classical cipher — + still computationally infeasible (~10²⁹ years). +- Backups use `VACUUM INTO` — same encrypted blobs, same property. + +If you ever put L1, L2, L3, master_key, the raw PRF, or any decrypted field +value into a log line, backup, metric, error message, audit row, or debug +dump: you have broken Threat B. Stop. + +#### Threat C — Agent token falls into the wrong hands + +The CVT **client credential** (type 0x01: L2(16) + agent_id(16) + POP(4), +encrypted with L0) is exfiltrated. The agent stores this locally; at request +time it derives L1 = L2[:8] and mints a wire token to authenticate. + +A stolen credential gives the attacker **the agent's full decryption +capability**, not just authentication: L2 decrypts L2-flagged fields the +agent fetches. This is a complete agent impersonation. They are the agent. + +Defenses: +- **Per-agent IP whitelist.** First contact fills the whitelist with the + agent's source IP; every subsequent request must come from a listed IP, + CIDR, or FQDN. A stolen credential used from anywhere else is refused at + `L1Middleware` before any handler runs. +- Threat A's defenses still apply: unique-entries quota throttles harvest + attempts, two-strike lockdown stops sustained patterns. Refused-IP attempts + are audit-logged for forensic review. +- Owner revokes the agent at any time (admin-token gated, PRF tap). + +**Gap — Sub-vector C2 (credential theft at creation):** When an agent is +enrolled, the browser generates L2 and creates the type-0x01 credential. This +token exists transiently in browser memory before being delivered to the agent. +A malicious browser extension or XSS could exfiltrate it during this window. + +**Status:** Acknowledged. The token exists in memory while the user is logged in +(with L3 in sessionStorage), so side-channel delivery (QR, file+scp) doesn't +eliminate the risk — it just moves it. Full mitigation requires a hardware +attestation flow not implemented in current stage. **Deferred until post-April 20** +as current IP whitelist + rate limits provide sufficient baseline defense. +- L3 fields remain unreachable — L2 does not unlock L3. + +A perfect attacker (same IP, same access cadence, same scope of reads) is +indistinguishable from the legitimate agent until the owner reviews the audit +log. Whitelist + rate limit + audit are layered on purpose — no single +defense carries the full load. + +### Cardinal rule #5 — WL3 is GDPR-out-of-scope by design + +The WL3 corpus (wrapped L3 + lookup metadata) contains **only cryptographic +material**. No customer ID, no name, no email, no slot ID. The link from a +credential to a human lives only in the central admin DB. Delete the central +record → the WL3 file becomes a mathematically-orphan blob with no possible +link to a person. Functionally anonymous. + +This is not a GDPR risk we tolerate. It is a property we get for free because +we **designed the data out** of the WL3 file. + +If you ever add a personal-data field to the WL3 schema, you have just made +the entire WL3 corpus subject to GDPR deletion requests, retention rules, and +audit obligations across every POP that mirrors it. **Do not.** + +### Cardinal rule #6 — Admin/sync traffic is not public + +Anything sensitive between central and POPs runs over **Tailscale**, not +public 443. The admin endpoints listen only on the Tailscale interface. Public +clavitor.ai serves users; the ops control plane is invisible to the internet. + +App-level HMAC on top of Tailscale is defense-in-depth — if a tailnet node is +ever compromised, the per-POP shared secret is the second lock. Both layers +required, neither optional. + +### Cardinal rule #7 — Webhooks are signed or refused + +Every inbound webhook (Paddle, future integrations) verifies a cryptographic +signature against a stored secret. If the secret is missing, refuse. If the +signature is missing, refuse. If the signature mismatches, refuse. If the +timestamp is outside ±5 minutes of now (anti-replay), refuse. If the body has +been touched between read and verify, refuse. + +There is no debug mode that disables this. If you need to develop locally +against a webhook, set `PADDLE_WEBHOOK_SECRET=test-secret` and craft a real +signature with the test secret. The verification path is always on. + +--- + + +--- + +## Section III — Workflow + +How we work together day-to-day. + +### Daily review checklist + + +**Run this every morning. Any check that fails is a foundation alert — fix +before any new feature work.** + +**Methodology evolution:** These checks were historically grep-driven for +reproducibility. With LLM analysis available, the daily reviewer must run a +**full LLM audit of all changed source files** to catch semantic violations, +not just pattern matches. Grep catches the letter; LLM catches the intent. + +**Procedure:** +1. Run the automated tests (F1) +2. Run LLM analysis on all files changed since last review +3. Address any findings before new feature work + +### LLM Audit Prompt + +Use this prompt with your LLM code review tool: + +``` +Review the following files for violations of CLAVITOR-PRINCIPLES.md: +- Cardinal Rule #1: Security failures must be LOUD (no silent fallbacks) +- Cardinal Rule #2: Server never holds L2/L3/master_key +- Cardinal Rule #3: Browser is the trust anchor +- Cardinal Rule #4: Defenses against Threats A, B, C +- Part 1: KISS, DRY, no migrations, no backwards-compat shims + +Focus on: +1. Security: Any key material (L1, L2, L3, master_key, PRF) flowing to logs, + errors, or responses +2. Silent fallbacks: catch blocks that don't expose failures to users +3. DRY violations: Similar logic that should be unified +4. Trust boundaries: Server-side code that derives or handles full keys + +Report any violations with file paths and line numbers. +``` + +### Section A — Server hard vetos + +**A1. The server never receives `master_key` over the wire.** +```bash +#!/bin/bash +# Verify server never sees full master key +matches=$(grep -r "master_key\|MasterKey\|masterKey" clavis-vault/api/ clavis-vault/lib/ --include="*.go" 2>/dev/null | grep -v "_test.go" | wc -l) +if [ "$matches" -gt 0 ]; then + echo "❌ FAIL: master_key handling found on server ($matches matches)" + grep -r "master_key\|MasterKey\|masterKey" clavis-vault/api/ clavis-vault/lib/ --include="*.go" | grep -v "_test.go" + exit 1 +else + echo "✅ PASS: No master_key handling on server" +fi +``` + +**A2. `lib.DeriveP1` does not exist on the server.** +```bash +#!/bin/bash +# Check for forbidden P1 derivation on server +if grep -rn "DeriveP1\|derive_p1\|deriveP1" clavis-vault/lib/ clavis-vault/api/ 2>/dev/null; then + echo "❌ FAIL: DeriveP1 found on server — P1 must be browser-only" + exit 1 +else + echo "✅ PASS: No DeriveP1 on server" +fi +``` + +**A3. L2-handling CVT functions do not exist on the server.** +```bash +#!/bin/bash +# Check for forbidden L2 credential handling +if grep -rn "MintCredential\|ParseCredential\|CredentialToWire" \ + clavis-vault/api/ clavis-vault/lib/ 2>/dev/null; then + echo "❌ FAIL: L2 credential functions found on server" + exit 1 +else + echo "✅ PASS: No L2 credential functions on server" +fi +``` +``` +Expected: zero matches. The Go CVT module only handles type-0x00 wire tokens +(L1, agent_id). The L2-bearing type-0x01 is C-only in `clavis-cli/src/cvt.c`. + +**A4. `AgentData` has no L2 field.** +``` +grep -n "L2 *\[\]byte" clavis-vault/lib/types.go +``` +Expected: zero matches. (Field-level `L2 bool` flag on `VaultField` is fine — +that's metadata, not key material.) + +**A5. WL3 JSON contains no PII.** +``` +grep -n "customer_id\|name\|email\|slot_id" clavis-vault/lib/wl3.go +``` +Expected: zero matches in field definitions of `WL3Entry`. Adding any of these +makes the entire WL3 corpus GDPR-scoped. + +**A6. Keys are never logged or audited.** +``` +grep -rn "log\.\|fmt\.Print\|AuditEvent" clavis-vault/api/ clavis-vault/lib/ \ + | grep -iE "l1Key|master|wrappedL3|wrapped_l3|prf|p1Bytes" +``` +Expected: zero matches. Threat B violation if any key material reaches a log, +metric, error message, or audit row — even in hex, even in debug builds. + +**A7. Per-agent IP whitelist enforcement is on.** +``` +grep -n "AgentIPAllowed\|allowed_ips\|AllowedIPs" clavis-vault/api/middleware.go +``` +Expected: matches in `L1Middleware` showing first-contact fill + enforcement. +This is Threat C's primary defense. If the check is missing or commented out, +a stolen agent token works from anywhere on the internet. + +### Section B — Silent fallback + +**B1. No silent decryption fallback in browser code.** +``` +grep -rn "catch.*plaintext\|catch.*ciphertext\|catch.*=.*null" \ + clavis-vault/cmd/clavitor/web/ clavis-chrome/ clavis-firefox/ +``` +Expected: any match must be reviewed. The acceptable pattern is +`catch (e) { result = '[decryption failed]'; }` — visible to the user. + +**B2. No `return true` in security check stubs.** +``` +grep -rn "return true.*//.*TODO\|return true.*//.*for now\|return true.*//.*accept" \ + clavitor.ai/admin/ clavis-vault/ +``` +Expected: zero matches. If you see one, you've found a SECURITY.md violation. +Implement it or delete the call site. + +**B3. No commented-out security checks.** +``` +grep -rn "// *if !verify\|// *if !auth\|// *return.*Forbidden" \ + clavis-vault/ clavitor.ai/admin/ +``` +Expected: zero matches. + +**B4. No silent fallbacks in catch blocks (manual LLM review required).** + +Grep catches patterns, not intent. All catch blocks in decryption/security paths +must be manually reviewed to ensure they don't silently swallow errors. The LLM +must verify: no `catch (e) { return data; }`, no `catch { decrypted = ""; }`, +no `finally` blocks that restore old values without logging. + +### Section C — Test fixture security + +**C0. Test fixture key material is obviously fake.** + +Test data containing key material (L2, L3, master_key, PRF output): +- Must be instantly recognizable as test data (not production keys) +- **Pattern:** 32 identical bytes: `bytes.Repeat([]byte{0xAB}, 32)` or `{0x00}`, `{0xFF}`, `{0xDE}, {0xAD}` +- **Never** use random-looking values that could be mistaken for real keys +- **Never** derive test keys from production material +- **Never** commit binary test fixtures (always text/SQL that shows the pattern) + +Example: +```go +// GOOD - obviously fake +testL3 := bytes.Repeat([]byte{0xAB}, 32) +testMaster := bytes.Repeat([]byte{0x00}, 32) + +// BAD - looks like a real key +testL3 := []byte{0x4a, 0x9f, 0x2c, ...} // 32 random bytes +``` + +### Section D — Foundation drift + +**C1. No `_old/` directory.** +``` +find . -type d -name "_old" +``` +Expected: zero results. Dead code is deleted, not parked. + +**C2. No env vars for fundamental paths.** +``` +grep -rn "os.Getenv(\"DATA_DIR\")\|os.Getenv(\"WL3_DIR\")" clavis-vault/lib/ +``` +Expected: zero matches. These are hardcoded in `lib/config.go` to `vaults` +and `WL3` respectively. Symlink if you want different paths. + +**C3. No migration code.** +``` +grep -rn "ALTER TABLE\|migration\|Migration" clavis-vault/lib/dbcore.go \ + | grep -v "// " +``` +Expected: matches only inside `MigrateDB` for the existing `alternate_for` +and `verified_at` columns (legacy from earlier work). No NEW `ALTER TABLE` +should appear during early-stage development. + +**C4. Hardcoded build constants in HTML.** +``` +grep -rn "var BUILD = '" clavis-vault/cmd/clavitor/web/ +``` +Expected: zero matches. Build version must come from `/api/version` at +runtime, not from a hardcoded constant that goes stale. + +### Section E — DRY violations + +**E1. Two functions doing the same thing.** + +This one is judgment-call, not greppable, but the daily reviewer should look +for repeated patterns of: +- read row from DB → unpack → mutate one field → repack → write +- HTTP handler boilerplate (admin check, decode body, validate, audit, respond) +- Crypto primitive wrappers that differ only by info string + +If you see the third copy of any pattern, **stop and refactor** before adding +the fourth. + +The canonical example to model after: `lib/dbcore.go:agentMutate` — replaced +three near-identical functions with a higher-order helper. Net negative line +count, easier to add new mutations, single point of correctness. + +**E2. Two crypto implementations of the same primitive.** + +Browser code uses `clavis-vault/cmd/clavitor/web/crypto.js`. CLI uses +`clavis-cli/crypto/crypto.js`. These should be **byte-identical**. If they +diverge, the encrypted output diverges, and a credential created in one +client becomes unreadable in the other. + +``` +diff clavis-vault/cmd/clavitor/web/crypto.js clavis-cli/crypto/crypto.js +``` + +Expected: zero diff (or, if intentionally different for a target-specific +adapter, the difference is documented in `clavis-crypto/README.md`). + +### Section F — Threat model alignment + +**E1. Per-agent rate limits have safe defaults.** +``` +grep -n "RateLimit *: *3\|RateLimit *= *3\|RateLimitMinute.*3" \ + clavis-vault/api/handlers.go +grep -n "RateLimitHour *: *10\|RateLimitHour *= *10" \ + clavis-vault/api/handlers.go +``` +Expected: matches in `HandleCreateAgent`. If the defaults drift (e.g. someone +changes them to 0 = unlimited), the harvester defense is silently disabled +for new agents. + +**E2. `agentReadEntry` is called from every credential-read handler.** + +For each handler that returns decrypted credential data, the call is +mandatory. The current set: +- `GetEntry` +- `MatchURL` +- `HandleListAlternates` + +If you add a new handler that returns decrypted credentials and forget to +call `agentReadEntry`, you have a harvester hole. Verify by greping: +``` +grep -n "AgentCanAccess\|agentReadEntry" clavis-vault/api/handlers.go +``` +Every credential-read handler must call both `AgentCanAccess` (before accessing +data) and `agentReadEntry` (after decryption, before returning). Verify both +functions appear in the handler code path. + +**E3. `/api/entries/batch` is exempt from global rate limit.** +``` +grep -A2 "/api/entries/batch" clavis-vault/api/middleware.go +``` +Expected: a `next.ServeHTTP(w, r); return` block with the explanatory +comment intact. If the comment is missing, the next reviewer will reintroduce +the rate limit and break import. + +### Section F — Test posture + +**F1. All tests pass.** +``` +cd clavis-vault && go test ./lib/... ./api/... +cd clavitor.ai/admin && go test . +``` +Expected: all green. **Failing tests are foundation alerts**, not "tomorrow" +problems. + +**F2. JS syntax check passes.** +``` +cd clavis-vault && make build # validates JS as part of build +``` +Expected: no `JS syntax error` in output. JS files are embedded in the binary; +syntax errors break the running app silently in some browsers. + +**F3. New code has tests.** + +Not greppable. Judgment call. The rule: if you added a new function with +non-trivial logic, there should be a test for it in the same commit. The +tests model what the function is supposed to do. If you can't write a test +because "it's hard to test", the function is probably too coupled — refactor. + +### Section G — Dead code elimination + +**G1. No empty directories.** +```bash +find . -type d -empty | grep -v ".git" | grep -v "vendor" +``` +Expected: zero results. Empty directories are deleted. + +**G2. No orphaned HTML/JS/CSS files (with exceptions).** +```bash +# Find HTML files not referenced in Go code +# Exceptions: design-system/, marketing/, test-*.html (dev/test files) +for f in $(find . -name "*.html" | grep -v vendor | grep -v "design-system" | grep -v "marketing" | grep -v "test-"); do + name=$(basename "$f") + if ! grep -r "$name" --include="*.go" . 2>/dev/null | grep -q .; then + echo "Orphaned: $f" + fi +done +``` +Review any matches — may be standalone pages or may be dead. + +**Exceptions (intentionally not checked):** +- `design-system/` — standalone design documentation, not embedded +- `marketing/` — marketing assets, separate from application +- `test-*.html` — development/test files (manual review for cleanup) + +**G3. No unreachable Go functions (requires go-dead-code or similar tool).** +Manual check: review private functions (lowercase) with no callers in the same +package. Public functions with no importers are also suspicious — verify they +are API endpoints or intentionally exported. + +**G4. No commented-out blocks >5 lines.** +```bash +grep -rn "^\s*//\s*\w" --include="*.go" --include="*.js" . | \ + awk -F: '{print $1}' | sort | uniq -c | sort -rn | head -20 +``` +Review files with many commented lines — likely dead code. + +--- + + +### Git workflow + + +### Commit philosophy + +**Commit early, commit often.** A commit is a save point, not a publish event. +- One logical change per commit — don't mix bug fix + refactoring + new feature +- If you can't explain the commit in one sentence, it's too big — split it +- Work-in-progress commits are fine locally; clean up before sharing + +### Commit message format + +``` +: + +: (optional but encouraged) + +