# Clavis Vault — CLAUDE.md
> **Quickstart (60s):** [../../QUICKSTART.md](../../QUICKSTART.md) — who you are, 4 things to do, critical rules.
> **Deep reference:** [../../CLAVITOR-AGENT-HANDBOOK.md](../../CLAVITOR-AGENT-HANDBOOK.md) — Section V: clavis-vault (your domain).
> **You are:** **Sarah** — Run `./scripts/daily-review.sh` every morning. Fix failures first.
## Foundation First — No Mediocrity. Ever.
The rule is simple: do it right, or say something.
Johan is an architect. Architects do not patch cracks in a bad foundation — they rebuild. Every agent on this team operates the same way.
### What this means in practice
If you need three fixes for one problem, stop. Something fundamental is wrong. Name it, surface it — we fix that, not the symptom.
If the code is spaghetti, say so. Do not add another workaround. The workaround is the problem now.
Quick fixes are not fixes. A "temporary" hack that ships is permanent. If it is not the right solution, it is the wrong solution.
Foundation > speed. A solid base makes everything downstream easy. A shaky base makes everything downstream a nightmare. We build bases.
### The restart rule
When the foundation is wrong: start over. Not "refactor slightly." Not "add an abstraction layer on top." Start over. This applies to code, infrastructure, design, encryption schemes, and written work alike.
### Q&D is research, not output
Exploratory/throwaway work has its place — but it stays in research. Nothing Q&D ships. Nothing Q&D becomes the production path. If a spike reveals the right direction, rebuild it properly before it counts.
### When you hit a bad foundation
Call it out. Do not work around it. Bad foundations are not your fault — but silently building on them is. Surface the problem, we work on it together.
The bar is high. The support is real.
---
## Security Failures — NEVER HIDE THEM
**The cardinal rule:** If decryption/verification fails, expose the failure. Never fall back to plaintext. Never silently continue.
### WRONG — Silent fallback (fireable offense)
```javascript
try {
  decrypted = await decrypt(ciphertext);
} catch (e) {
  decrypted = plaintext; // NEVER DO THIS
}
```
### CORRECT — Visible failure
```javascript
try {
  decrypted = await decrypt(ciphertext);
} catch (e) {
  decrypted = '[decryption failed]'; // User sees the failure
}
```
This applies to:
- Encryption/decryption errors
- Signature verification failures
- Authentication failures
- Tampering detection
- Any security-critical operation
**Security failures must be noisy, visible, and blocking — never silent, hidden, or permissive.**
See `SECURITY.md` for full principles.
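The same rule applies on the Go side. A minimal sketch, assuming AES-GCM from the standard library purely for illustration (this file does not state which cipher the vault uses, and `seal`/`openOrFail` are hypothetical names): on any failure the caller gets an error and no plaintext, so there is nothing to fall back to.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD from a 16-, 24-, or 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and prepends a fresh random nonce to the result.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// openOrFail decrypts data produced by seal. On any failure it returns an
// error and a nil plaintext. It never returns partial or fallback data.
func openOrFail(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("decryption failed: ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	pt, err := gcm.Open(nil, nonce, ct, nil)
	if err != nil {
		return nil, fmt.Errorf("decryption failed: %w", err)
	}
	return pt, nil
}

func main() {
	key := make([]byte, 32) // all-zero demo key, never do this outside an example
	sealed, err := seal(key, []byte("hunter2"))
	if err != nil {
		panic(err)
	}
	sealed[len(sealed)-1] ^= 0x01 // simulate tampering
	if _, err := openOrFail(key, sealed); err != nil {
		fmt.Println("refusing to continue:", err) // noisy and blocking
	}
}
```

The caller has to handle the error explicitly. There is no code path that hands back data after a failed authentication check.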
---
## Edition System (Community vs Commercial)
Clavitor Vault has two editions with build-time separation:
### Community Edition (Default)
```bash
go build -o clavitor ./cmd/clavitor/
```
- No telemetry by default (privacy-first)
- Local logging only
- Self-hosted
- Elastic License 2.0
### Commercial Edition
```bash
go build -tags commercial -o clavitor ./cmd/clavitor/
```
- Centralized telemetry to clavitor.ai
- Operator alerts POST to `/v1/alerts`
- Multi-POP management
- Commercial license
### Using the Edition Package
```go
import "github.com/johanj/clavitor/edition"

// Send operator alerts (works in both editions)
edition.Current.AlertOperator(ctx, "auth_error", "message", details)

// Check edition
currentEdition := edition.Current.Name() // "community" or "commercial"
```
See `edition/CLAUDE.md` for full documentation.
---
## Clavitor Vault v2 — Current State & Testing
### What we built this session
#### 1. Domain classification for import scopes
- Import page (`cmd/clavitor/web/import.html`) parses 14+ password manager formats client-side
- Unique domains are extracted (eTLD+1) and sent to `https://clavitor.ai/classify`
- The classify endpoint uses Claude Haiku on OpenRouter to categorize domains into 13 scopes: finance, social, shopping, work, dev, email, media, health, travel, home, education, government
- Results are stored permanently in SQLite on clavitor.ai (`domain_scopes` table) — NOT a cache, a lookup table that benefits all users
- Entries with no URL get scope "unclassified" (not "misc"). "misc" means the LLM tried to classify the domain and failed
- Domains are sent in chunks of 200 to stay within token limits
- Classification is opt-in: user sees consent dialog with Yes/Skip/Cancel
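The dedupe-then-chunk step above can be sketched as follows. The real code runs client-side in `importers.js`; this is a Go illustration only, and `uniqueDomains` and `chunkDomains` are hypothetical names. eTLD+1 extraction is left out because it needs a public-suffix list.

```go
package main

import "fmt"

// uniqueDomains removes duplicates while keeping first-seen order.
func uniqueDomains(domains []string) []string {
	seen := make(map[string]bool, len(domains))
	var out []string
	for _, d := range domains {
		if d == "" || seen[d] {
			continue
		}
		seen[d] = true
		out = append(out, d)
	}
	return out
}

// chunkDomains splits domains into consecutive batches of at most size items.
func chunkDomains(domains []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var chunks [][]string
	for start := 0; start < len(domains); start += size {
		end := start + size
		if end > len(domains) {
			end = len(domains)
		}
		chunks = append(chunks, domains[start:end])
	}
	return chunks
}

func main() {
	domains := make([]string, 0, 451)
	for i := 0; i < 450; i++ {
		domains = append(domains, fmt.Sprintf("site%d.example", i))
	}
	domains = append(domains, "site0.example") // duplicate, dropped below

	// 450 unique domains at 200 per request means three requests.
	for i, chunk := range chunkDomains(uniqueDomains(domains), 200) {
		fmt.Printf("request %d: %d domains\n", i+1, len(chunk))
	}
}
```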
#### 2. Import flow UX
- Drop file → parse → hide file step → consent dialog (Yes/Skip/Cancel)
- Cancel returns to file step
- After classification: entry list with scope pills as clickable filters, scope group headers with checkboxes
- Import + Cancel buttons appear only after classification
- Wider layout (960px), one-line items: title + username, no URL clutter
- Black entry icons (LGN/CARD/NOTE) with white text — on brand
- Global black checkboxes (`accent-color: var(--text)`)
- Unified CSS classes: `.item-row`, `.item-icon`, `.item-list` (replacing import-specific classes)
#### 3. Import filtering visibility (NEW)
- Import parsers now track what gets filtered and why
- `importers-parsers.js` has `_importReport` object that records: `rawCount`, `filtered[]`, `finalCount`
- For Proton Pass: tracks trashed items, aliases, duplicates, empty entries
- UI shows: `"1073 entries ready (1320 parsed, 247 filtered)"`
- Filter badge with tooltip showing breakdown: `"Filtered: 150 duplicate, 97 alias, 12 trashed"`
- Clicking the filtered badge shows detailed list of what was skipped
- **Why**: A user saw 1073 entries during import but only 818 in the vault. The report now shows that the ~250 missing records were duplicates (older versions), email aliases, and trashed items.
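The report shape above can be sketched as follows. The real `_importReport` object lives in `importers-parsers.js`; this Go version is an illustration, and the type names, the `Summary`/`Breakdown` helpers, and the sample data are assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// FilteredItem records one skipped entry and why it was skipped.
type FilteredItem struct {
	Title  string
	Reason string // e.g. "duplicate", "alias", "trashed", "empty"
}

// ImportReport mirrors the three fields named above.
type ImportReport struct {
	RawCount   int
	Filtered   []FilteredItem
	FinalCount int
}

// Summary renders the headline shown next to the entry list.
func (r ImportReport) Summary() string {
	return fmt.Sprintf("%d entries ready (%d parsed, %d filtered)",
		r.FinalCount, r.RawCount, len(r.Filtered))
}

// Breakdown renders the tooltip text, most frequent reason first.
func (r ImportReport) Breakdown() string {
	counts := map[string]int{}
	for _, f := range r.Filtered {
		counts[f.Reason]++
	}
	reasons := make([]string, 0, len(counts))
	for reason := range counts {
		reasons = append(reasons, reason)
	}
	sort.Slice(reasons, func(i, j int) bool {
		if counts[reasons[i]] != counts[reasons[j]] {
			return counts[reasons[i]] > counts[reasons[j]]
		}
		return reasons[i] < reasons[j] // stable tie-break by name
	})
	parts := make([]string, len(reasons))
	for i, reason := range reasons {
		parts[i] = fmt.Sprintf("%d %s", counts[reason], reason)
	}
	return "Filtered: " + strings.Join(parts, ", ")
}

func main() {
	r := ImportReport{
		RawCount: 6,
		Filtered: []FilteredItem{
			{"Old GitHub", "duplicate"},
			{"Old Bank", "duplicate"},
			{"relay address", "alias"},
		},
		FinalCount: 3,
	}
	fmt.Println(r.Summary())
	fmt.Println(r.Breakdown())
}
```

Keeping the filtered items themselves, not just counts, is what lets the UI show the detailed list when the badge is clicked.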
#### 4. Credential Alternates System (NEW)
- **Problem**: Chrome/Firefox imports don't have modification dates, so we can't tell which password is newer. Same site+user from different sources would overwrite each other.
- **Solution**: Multiple passwords per site+user are now stored as "alternates" instead of overwriting.
- **Database**: New columns `alternate_for` (points to primary entry_id) and `verified_at` (timestamp when password worked).
- **Batch Import**: Changed from upsert-by-title to match-by-title+username. If same credentials exist, new one becomes an alternate.
- **APIs**:
  - `POST /api/entries/batch` returns `{created, alternates}` instead of `{created, updated}`
  - `POST /ext/credentials/{id}/worked`: CLI/extension calls this when a password works
  - `GET /ext/credentials/{id}/alternates`: get all alternates for a credential
  - `GET /ext/match?url=...` now returns an alternates array sorted by `verified_at` (verified first)
- **CLI Protocol**: When multiple passwords exist for the same site+user, try verified ones first. On success, call the `/worked` endpoint. The backend marks the winner as verified and links the alternates to it.
- **Import UI**: Shows `"Done — 50 imported, 12 alternates"` so user knows some were stored as alternates.
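The "verified first" ordering can be sketched as follows. `Credential` and `orderForLogin` are hypothetical names. Storing `verified_at` as Unix seconds with 0 meaning "never verified" is an assumption, as is trying the most recently verified password first.

```go
package main

import (
	"fmt"
	"sort"
)

// Credential is a hypothetical view of one password for a site+user.
type Credential struct {
	EntryID    string
	VerifiedAt int64 // Unix seconds; 0 = never verified (assumed encoding)
}

// orderForLogin returns a copy sorted so that verified credentials come
// first (most recently verified first), followed by unverified ones in
// their original order.
func orderForLogin(creds []Credential) []Credential {
	out := append([]Credential(nil), creds...)
	sort.SliceStable(out, func(i, j int) bool {
		vi, vj := out[i].VerifiedAt != 0, out[j].VerifiedAt != 0
		if vi != vj {
			return vi // verified sorts before unverified
		}
		return out[i].VerifiedAt > out[j].VerifiedAt
	})
	return out
}

func main() {
	creds := []Credential{
		{EntryID: "chrome-import", VerifiedAt: 0},
		{EntryID: "proton-import", VerifiedAt: 1700000000},
		{EntryID: "firefox-import", VerifiedAt: 0},
	}
	for _, c := range orderForLogin(creds) {
		fmt.Println("try", c.EntryID)
	}
}
```

A client would walk this list in order and call `/worked` for the first password that succeeds.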
#### 5. Security hardening (IN PROGRESS — needs testing)
- **List endpoint stripped**: `GET /api/entries` now always returns metadata only (title, type, scopes, entry_id). No data blobs, no `?meta=1` toggle. Full entry data only via `GET /api/entries/{id}` with scope enforcement.
- **Agent system type guard**: Agents cannot create/update entries with type=agent or type=scope. Enforced on CreateEntry, CreateEntryBatch, UpsertEntry, UpdateEntry.
- **L3 field protection**: Agents cannot overwrite L3 fields. If the existing field is tier 3, the original value is kept and the agent's change to that field is silently dropped.
- **Per-agent IP whitelist**: Stored in agent entry (L1-encrypted). Empty on creation → filled with IP from first contact → enforced on every subsequent request. Supports CIDRs (10.0.0.0/16), exact IPs, and FQDNs (home.smith.family), comma-separated.
- **Per-agent rate limiting**: Configurable requests/minute per agent ID (not per IP). Stored in agent entry.
- **Admin operations require PRF tap**: Agent CRUD and scope updates require a fresh WebAuthn assertion. Flow: `POST /auth/admin/begin` → PRF tap → `POST /auth/admin/complete` → one-time admin token in `X-Admin-Token` header → pass to admin endpoint. Token is single-use, 5-minute expiry.
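The whitelist matching described above (CIDRs, exact IPs, FQDNs, comma-separated) can be sketched as follows. This is not the code in `lib/types.go`. `ipAllowed` and the injected `resolve` function are hypothetical, and first-contact fill and the FQDN cache are left out. An empty list denies here, and a failed DNS lookup never grants access.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// ipAllowed reports whether remote matches any entry in a comma-separated
// allowlist of CIDRs, exact IPs, and hostnames. resolve maps a hostname to
// its IP strings; net.LookupHost has a matching signature.
func ipAllowed(allowlist, remote string, resolve func(host string) ([]string, error)) bool {
	ip := net.ParseIP(remote)
	if ip == nil {
		return false // unparseable remote address: deny
	}
	for _, entry := range strings.Split(allowlist, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		if _, cidr, err := net.ParseCIDR(entry); err == nil {
			if cidr.Contains(ip) {
				return true
			}
			continue
		}
		if exact := net.ParseIP(entry); exact != nil {
			if exact.Equal(ip) {
				return true
			}
			continue
		}
		// Anything else is treated as a hostname.
		addrs, err := resolve(entry)
		if err != nil {
			continue // resolution failure must not open the door
		}
		for _, a := range addrs {
			if parsed := net.ParseIP(a); parsed != nil && parsed.Equal(ip) {
				return true
			}
		}
	}
	return false
}

func main() {
	// Stub resolver so the example runs without DNS.
	stub := func(host string) ([]string, error) {
		if host == "home.smith.family" {
			return []string{"203.0.113.7"}, nil
		}
		return nil, fmt.Errorf("no such host: %s", host)
	}
	list := "10.0.0.0/16, 192.168.1.5, home.smith.family"
	for _, remote := range []string{"10.0.42.1", "203.0.113.7", "10.1.0.1"} {
		fmt.Printf("%s allowed: %v\n", remote, ipAllowed(list, remote, stub))
	}
}
```

Injecting the resolver keeps the matching logic testable without network access.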
### What is semi-done / needs testing
The security hardening code compiles and the vault runs, but none of it has been tested with actual agent tokens or WebAuthn assertions yet. Specifically:
1. **IP whitelist first-contact fill**: ✅ Fixed - DB errors now return 500
2. **IP whitelist enforcement**: Does CIDR matching work? FQDN resolution? Comma-separated lists? FQDN resolution now has a 5-minute cache.
3. **Per-agent rate limiter**: Does it correctly track per agent ID and reset per minute?
4. **Admin auth flow**: Does the challenge-response work end-to-end? Does the admin token get consumed correctly (single-use)?
5. **System type guards**: ✅ Fixed - Agents blocked entirely from batch import; returns 403 on forbidden types
6. **L3 field preservation**: ✅ Fixed - Agents cannot overwrite L3 fields in batch or upsert
7. **List endpoint**: Verify no data blobs leak. Check browser console: `entries[0]` should have no `data` or `fields` property.
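For item 3, the behavior to test against can be sketched as a fixed-window counter keyed by agent ID. This is not the code in `api/middleware.go`. `AgentLimiter` is a hypothetical type, and passing the clock as Unix seconds is a choice made here so the reset can be tested without waiting a minute.

```go
package main

import (
	"fmt"
	"sync"
)

// window counts requests seen during one Unix minute.
type window struct {
	minute int64
	count  int
}

// AgentLimiter tracks one fixed window per agent ID (not per IP).
type AgentLimiter struct {
	mu      sync.Mutex
	windows map[string]window
}

func NewAgentLimiter() *AgentLimiter {
	return &AgentLimiter{windows: make(map[string]window)}
}

// Allow records a request for agentID at nowUnix and reports whether it is
// within perMinute. The counter resets when the Unix minute changes.
func (l *AgentLimiter) Allow(agentID string, perMinute int, nowUnix int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	minute := nowUnix / 60
	w := l.windows[agentID]
	if w.minute != minute {
		w = window{minute: minute} // new minute: start counting from zero
	}
	if w.count >= perMinute {
		l.windows[agentID] = w
		return false
	}
	w.count++
	l.windows[agentID] = w
	return true
}

func main() {
	l := NewAgentLimiter()
	for i := 1; i <= 4; i++ {
		fmt.Printf("request %d allowed: %v\n", i, l.Allow("agent-a", 3, 1000))
	}
	fmt.Println("other agent allowed:", l.Allow("agent-b", 3, 1000))
	fmt.Println("next minute allowed:", l.Allow("agent-a", 3, 1060))
}
```

A manual test should check the same three cases: the limit is hit, a second agent is unaffected, and the first agent recovers in the next minute.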
### Known Issues (Accepted)
**IP Whitelist Race Condition**: There is a theoretical race on first-contact IP recording if two parallel requests from different IPs arrive simultaneously. This was reviewed and accepted because:
- Requires a stolen agent token (already a compromise)
- Requires racing first contact from two different IPs
- The "loser" simply won't be auto-whitelisted
- Cannot be reproduced in testing; practically impossible to trigger
- Fix would require plaintext column + atomic update (not worth complexity)
See comment in `api/middleware.go` for full rationale.
**Admin Token Consumed Early**: The admin token is consumed immediately upon validation in `requireAdmin()`. If the subsequent operation fails (DB error, validation error, etc.), the token is gone but the operation didn't complete. The user must perform a fresh PRF tap to retry.
This was reviewed and accepted because:
- The short (5-minute) token lifetime makes re-auth acceptable
- It's a UX inconvenience, not a security vulnerability
- Deferring consumption until operation success would require transaction-like complexity
- Rare edge case: requires admin operation to fail after token validation
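The consume-on-validation behavior can be sketched as follows. This is not the actual `requireAdmin()` code. `AdminTokens`, `Issue`, and `Consume` are hypothetical names, the in-memory map is an assumption, and the TTL is the 5-minute expiry stated in the hardening notes.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

const adminTokenTTL = 300 // seconds (5 minutes)

// AdminTokens holds outstanding one-time tokens and their expiry times.
type AdminTokens struct {
	mu     sync.Mutex
	expiry map[string]int64 // token -> Unix expiry
}

func NewAdminTokens() *AdminTokens {
	return &AdminTokens{expiry: make(map[string]int64)}
}

// Issue creates a random token valid for adminTokenTTL seconds from nowUnix.
func (s *AdminTokens) Issue(nowUnix int64) (string, error) {
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	token := hex.EncodeToString(raw)
	s.mu.Lock()
	s.expiry[token] = nowUnix + adminTokenTTL
	s.mu.Unlock()
	return token, nil
}

// Consume validates and removes the token in one step. The token is gone
// after this call whether or not the admin operation that follows succeeds.
func (s *AdminTokens) Consume(token string, nowUnix int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	exp, ok := s.expiry[token]
	delete(s.expiry, token)
	return ok && nowUnix < exp
}

func main() {
	store := NewAdminTokens()
	token, err := store.Issue(1000)
	if err != nil {
		panic(err)
	}
	fmt.Println("first use:", store.Consume(token, 1100))
	fmt.Println("second use:", store.Consume(token, 1101)) // already consumed
}
```

Deleting inside the same lock as the lookup is what makes the token single-use under concurrent requests. It is also why a failed operation costs a fresh PRF tap.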
### How testing works
No automated test suite for this session's work. Testing is manual via the browser:
1. Vault runs locally on forge (this machine) at port 1984, accessed via https://dev.clavitor.ai/app/
2. Caddy on 192.168.0.2 reverse-proxies dev.clavitor.ai → forge:1984
3. Import testing: Drop a Proton Pass ZIP export (or any of the 14 supported formats) on the import page. Check scope pills, counts, classifications, and **filtered count badge** — shows why records were skipped (duplicates, aliases, trashed).
4. Classification testing: Watch server logs on clavitor.ai: `ssh root@<tailscale-ip> "journalctl -u clavitor-web --no-pager -n 30"`. Check domain_scopes table: `sqlite3 /opt/clavitor-web/clavitor.db 'SELECT COUNT(*) FROM domain_scopes'`
5. Screen capture: `/capture` skill takes a live screenshot from Johan's Mac (display 3). `/screenshot` fetches the latest manual screenshot.
6. Version verification: The topbar shows the **build timestamp** (e.g., `2026-04-04-1432`) fetched from `/api/version`. If the timestamp doesn't update after `make dev`, the old binary is still running — check `make status`.
7. DB location: Vault data is in `/home/johan/dev/clavitor/clavis/clavis-vault/data/`. Delete clavitor-* files there to start fresh (will require passkey re-registration).
### Key files
| File | What |
|------|------|
| api/handlers.go | All HTTP handlers, security guards, admin auth |
| api/middleware.go | L1 auth, CVT token parsing, IP whitelist, agent rate limit |
| lib/types.go | AgentData, VaultData, AgentCanAccess, AgentIPAllowed |
| lib/dbcore.go | DB ops, AgentLookup, AgentUpdateAllowedIPs |
| cmd/clavitor/web/import.html | Import page structure |
| cmd/clavitor/web/import.js | Import UI controller, API key detection, encryption |
| cmd/clavitor/web/importers.js | ZIP extraction, domain classification, format detection |
| cmd/clavitor/web/importers-parsers.js | 14 password manager parsers, import reporting |
| cmd/clavitor/web/topbar.js | Version number, nav, idle timer |
| cmd/clavitor/web/clavitor-app.css | All styles, item-row/item-icon system |
| clavitor.ai/main.go | Portal + /classify endpoint (Haiku on OpenRouter) |
### Deploy Clavitor Vault (dev)
Working directory: `/home/johan/dev/clavitor/clavis/clavis-vault`
**Prerequisites:**
```bash
# Enable user systemd services (one-time setup)
systemctl --user enable --now clavitor.service
```
**Build and deploy (one command):**
```bash
make dev # stop → build → start (graceful shutdown via SIGTERM)
```
**Individual commands:**
```bash
make stop # systemctl --user stop clavitor.service
make start # systemctl --user start clavitor.service
make restart # systemctl --user restart clavitor.service (no rebuild)
make status # systemctl --user status clavitor.service
make logs # journalctl --user -u clavitor -f
make build # go build...
```
**Service file location:** `~/.config/systemd/user/clavitor.service`
Caddy on 192.168.0.2 reverse-proxies dev.clavitor.ai → forge:1984 (the vault's certificate is self-signed, so the proxy uses `tls_insecure_skip_verify`).
**Update Caddy config:**
```bash
ssh root@192.168.0.2
# Edit /etc/caddy/Caddyfile, then:
systemctl reload caddy
```
Web files are embedded at compile time (go:embed). CSS/JS/HTML changes require rebuild.
Bump version in `cmd/clavitor/web/topbar.js` (search for v2.0.) to verify new build is live.
### Deploy clavitor.ai (prod)
Working directory: `/home/johan/dev/clavitor/clavitor.ai`
```bash
make deploy-prod
```
This cross-compiles, SCPs to Zürich, enters maintenance mode, restarts systemd, exits maintenance. One command.
SSH: root@clavitor.ai — port 22 blocked on public IP, use Tailscale. Never use johan@. Avoid rapid SSH attempts (fail2ban will lock you out — it already happened once this session).
Env vars are in `/opt/clavitor-web/.env` and `/etc/systemd/system/clavitor-web.service`. After changing .env, run `systemctl daemon-reload && systemctl restart clavitor-web` on the server.
**NEVER deploy the database. Only the binary gets uploaded. The SQLite DB on prod is the source of truth.**
Verify: `ssh root@<tailscale-ip> "systemctl status clavitor-web"`
### IMPORTANT
**NEVER deploy to prod without Johan's explicit approval. This caused a SEV-1 on 2026-03-29.**