From fa7541bd4d75224288105f0f612750bd2fdee4c1 Mon Sep 17 00:00:00 2001 From: James Date: Thu, 2 Apr 2026 01:02:36 -0400 Subject: [PATCH] Security review: Replication functionality (Commercial Only) Comprehensive security audit of event-driven replication. CRITICAL issues (5): 1. Inter-POP authentication not implemented (stub TODO) 2. Backup-side request authentication missing 3. Backup mode uses env var (should be config-only) 4. No replay attack protection (need nonces + signatures) 5. Weak token validation (only checks existence, not entropy) HIGH issues (4): 6. HTTPS cert validation concern 7. No audit logging of replication attempts 8. Cascade replication not prevented 9. Information disclosure in error messages Status: NOT PRODUCTION READY - security TODO stubs present --- .../SECURITY_REVIEW_REPLICATION.md | 344 ++++++++++++++++++ 1 file changed, 344 insertions(+) create mode 100644 clavis/clavis-vault/SECURITY_REVIEW_REPLICATION.md diff --git a/clavis/clavis-vault/SECURITY_REVIEW_REPLICATION.md b/clavis/clavis-vault/SECURITY_REVIEW_REPLICATION.md new file mode 100644 index 0000000..9ef14a5 --- /dev/null +++ b/clavis/clavis-vault/SECURITY_REVIEW_REPLICATION.md @@ -0,0 +1,344 @@ +# Security Review: Replication Function (Commercial Edition) + +**Scope:** Event-driven async replication between Primary and Backup POPs +**Files Reviewed:** +- `edition/config.go` - Config loading and validation +- `edition/replication.go` - Replication worker and client +- `edition/backup_mode.go` - Backup mode detection and write rejection +- `lib/dbcore.go` - Dirty marking functions + +--- + +## CRITICAL ISSUES (Fix Before Production) + +### 1. MISSING: Inter-POP Authentication Implementation +**Risk:** Unauthorized POPs could push malicious data to backups +**Location:** `edition/replication.go:150` (stub) + +**Current State:** +```go +// TODO: POST to backup POP +// No authentication implemented +``` + +**Required:** +- mTLS (client cert) OR +- Shared secret in `Authorization: Bearer ` header +- Token must be read from file (not hardcoded) + +**Fix:** +```go +req.Header.Set("Authorization", "Bearer " + loadTokenFromFile(w.config.Auth.TokenFile)) +// OR use TLS client certificates +``` + +### 2. MISSING: Backup-Side Request Authentication +**Risk:** Anyone with network access could POST to `/api/replication/apply` +**Location:** Not yet implemented (backup receive endpoint) + +**Required on Backup:** +- Verify `Authorization` header matches shared secret +- Verify source POP ID is in authorized list +- mTLS client cert validation (if using mTLS) + +### 3. WEAK: Backup Mode Detection Uses Environment Variable +**Risk:** `CLAVITOR_BACKUP_MODE=true` can be set accidentally or maliciously +**Location:** `edition/backup_mode.go:23` + +**Current:** +```go +if os.Getenv("CLAVITOR_BACKUP_MODE") == "true" { + return true +} +``` + +**Problem:** +- Env vars are visible to all processes (`ps e`) +- No validation that this POP is actually a backup +- Could cause split-brain if primary thinks it's backup + +**Fix:** +- Read from config file only (`replication.yaml` with `role: backup`) +- Use control plane confirmation (call HQ to verify role) +- Remove env var option entirely + +### 4. MISSING: Replay Attack Protection +**Risk:** Attacker could replay old replication requests +**Location:** `edition/replication.go:173-177` + +**Current Request:** +```go +type ReplicationRequest struct { + SourcePOP string `json:"source_pop"` + Entries []ReplicatedEntry `json:"entries"` + Timestamp int64 `json:"timestamp"` +} +``` + +**Missing:** +- No nonce/unique ID per request +- No request signature +- No max age validation (5-minute window?) + +**Fix:** +```go +type ReplicationRequest struct { + SourcePOP string `json:"source_pop"` + Entries []ReplicatedEntry `json:"entries"` + Timestamp int64 `json:"timestamp"` + Nonce string `json:"nonce"` // Random 16 bytes + Signature string `json:"signature"` // HMAC-SHA256 of body +} +``` + +### 5. HIGH: Shared Secret Token Validation Weak +**Risk:** Token file existence checked, but not token format/entropy +**Location:** `edition/config.go:58` + +**Current:** +```go +if _, err := os.Stat(tokenFile); err != nil { + return nil, fmt.Errorf("auth token file not found: %s", tokenFile) +} +``` + +**Missing:** +- No token entropy validation (is it 32+ random bytes?) +- No token rotation mechanism +- No token expiration + +**Fix:** +```go +token, err := os.ReadFile(tokenFile) +if len(token) < 32 { + return nil, fmt.Errorf("token must be at least 32 bytes") +} +// Check entropy (not all zeros, not repetitive) +``` + +--- + +## HIGH SEVERITY ISSUES + +### 6. MISSING: HTTPS Certificate Validation Disabled? +**Risk:** MITM attack between primary and backup +**Location:** `edition/replication.go:150` (stub) + +**Concern:** If using self-signed certs or `InsecureSkipVerify: true` + +**Required:** +- Proper TLS cert validation +- Pin backup POP's cert or use internal CA +- Never `InsecureSkipVerify: true` + +### 7. NO AUDIT LOGGING OF REPLICATION +**Risk:** Cannot detect unauthorized replication attempts +**Location:** Missing throughout + +**Required:** +- Log every replication attempt (success and failure) +- Include: source POP ID, entry count, timestamp, auth result +- Alert on auth failures (possible attack) + +### 8. CASCADE REPLICATION NOT PREVENTED +**Risk:** Backup could accidentally re-replicate to another POP, causing loops +**Location:** `edition/replication.go` + +**Current:** Worker only checks `Role != "primary"`, but doesn't prevent config errors + +**Fix:** +```go +// In startReplication(): +if cfg.Role == "backup" { + log.Printf("Backup POP: replication worker disabled (receive-only)") + return // Don't start worker +} +``` + +### 9. INFORMATION DISCLOSURE IN ERRORS +**Risk:** Error messages leak internal paths and structure +**Location:** `edition/config.go:23, 28, 59` + +**Current:** +```go +return nil, fmt.Errorf("cannot read replication config: %w", err) +return nil, fmt.Errorf("auth token file not found: %s", tokenFile) +``` + +**Problem:** +- Exposes `/etc/clavitor/replication.yaml` path +- Exposes `/etc/clavitor/replication.key` path +- Could help attacker locate secrets + +**Fix:** +```go +return nil, fmt.Errorf("replication configuration error") +// Log full details internally only +``` + +--- + +## MEDIUM SEVERITY ISSUES + +### 10. NO RATE LIMITING ON REPLICATION ENDPOINT +**Risk:** DoS via flood of replication requests +**Location:** Not implemented (backup receive endpoint) + +**Required:** +- Rate limit `/api/replication/apply` per source POP +- Max 1 request/second per POP +- Ban source POP after 10 failed auth attempts + +### 11. BACKOFF CALCULATION FLOAT CONVERSION +**Risk:** Minor - could panic or behave unexpectedly +**Location:** `edition/replication.go:109` + +**Current:** +```go +backoff := time.Duration(math.Pow(5, float64(attempt-1))) * time.Second +``` + +**Issue:** For attempt=5, result is 625 seconds = 10.4 minutes, but truncated + +**Fix:** Use integer math: +```go +delays := []time.Duration{1*time.Second, 5*time.Second, 25*time.Second, 125*time.Second, 625*time.Second} +if attempt > 0 && attempt <= len(delays) { + backoff = delays[attempt-1] +} +``` + +### 12. WORKER DB CONNECTION SEPARATE FROM MAIN +**Risk:** Inconsistent state if main DB has different connection params +**Location:** `edition/replication.go:56-61` + +**Current:** +```go +dbPath := dataDir + "/clavitor-*.db" // TODO: proper vault lookup +db, err := lib.OpenDB(dbPath) +``` + +**Issue:** Replication worker uses separate DB connection from main handlers. If WAL mode settings differ, could cause lock issues. + +**Fix:** Pass existing DB connection from main, or ensure identical settings. + +### 13. NO MAX DIRTY ENTRY LIMIT +**Risk:** Memory exhaustion if backup down for long time +**Location:** `edition/replication.go:142` + +**Current:** +```go +entries, err := lib.EntryListDirty(w.db, w.config.Replication.BatchSize) +``` + +**Issue:** Only limited by batch size, but if replication fails repeatedly, entries accumulate in DB (not memory). **Actually OK** - dirty entries stay in DB, not memory. + +**Status:** Acceptable (disk-bound, not memory-bound) + +--- + +## LOW SEVERITY / DEFENSE IN DEPTH + +### 14. TIMING ATTACK ON TOKEN COMPARISON +**Risk:** Side-channel attack if comparing tokens naively +**Location:** Not implemented yet (backup auth) + +**Fix:** Use constant-time comparison: +```go +if subtle.ConstantTimeCompare(receivedToken, expectedToken) != 1 { + return authFailed +} +``` + +### 15. NO REQUEST SIZE LIMITS ON BACKUP ENDPOINT +**Risk:** Large request could cause OOM +**Location:** Not implemented (backup receive) + +**Fix:** `http.MaxBytesReader(w, r.Body, maxReplicationBodySize)` + +### 16. LOG INJECTION VIA ENTRY DATA +**Risk:** If entry titles contain format strings or escape sequences +**Location:** `edition/replication.go:154` + +**Current:** +```go +log.Printf("Replicating %d entries to %s", len(entries), w.config.BackupPOP.URL) +``` + +**Issue:** If somehow user-controlled data gets into log, could inject log formatters + +**Fix:** Use structured logging, never printf with user data + +--- + +## SECURITY RECOMMENDATIONS + +### Immediate (Before Production) + +1. **Implement mTLS between POPs** - Most secure option, no shared secrets +2. **Add request signing with HMAC** - Prevent replays +3. **Add replication audit logging** - Security monitoring +4. **Remove env var backup mode** - Config file only +5. **Add rate limiting** - Prevent DoS + +### Short Term + +6. **Token rotation mechanism** - Hot-swap without restart +7. **Control plane integration** - Central authority for POP roles +8. **Add security headers** - `X-Content-Type-Options`, etc. + +### Design Improvements + +9. **Consider using ed25519 signatures** instead of shared secrets +10. **Add replication watermark/epoch** - For detecting gaps + +--- + +## SECURE CONFIGURATION EXAMPLE + +```yaml +# /etc/clavitor/replication.yaml +pop_id: "calgary-01" +region: "north-america" +role: "primary" + +backup_pop: + id: "zurich-01" + url: "https://zurich-01.clavitor.ai:8443" # TLS 1.3 only + # No auth_token_file here - use mTLS instead + +auth: + method: "mtls" # or "bearer" + # For mTLS: + mtls_cert: "/etc/clavitor/replication.crt" + mtls_key: "/etc/clavitor/replication.key" + mtls_ca: "/etc/clavitor/ca.crt" # Verify backup cert + # For bearer token: + # token_file: "/etc/clavitor/replication.key" + # token_rotation_interval: 86400 # 24 hours + +replication: + batch_size: 100 + max_retries: 5 + request_timeout: 30 # seconds + max_request_age: 300 # 5 minutes (anti-replay) + +security: + audit_log: "/var/log/clavitor/replication.log" + rate_limit_per_minute: 60 + ban_after_failures: 10 + ban_duration: 3600 # 1 hour +``` + +--- + +## SUMMARY + +| Risk Level | Count | Issues | +|------------|-------|--------| +| **CRITICAL** | 5 | Auth not implemented, replay attacks, env var backup mode, weak token validation, MITM risk | +| **HIGH** | 4 | HTTPS validation, audit logging, cascade prevention, info disclosure | +| **MEDIUM** | 4 | Rate limiting, backoff math, DB connection, size limits | +| **LOW** | 3 | Timing attacks, request size, log injection | + +**Bottom Line:** The replication scaffolding is good, but the security-critical parts (auth, replay protection, audit) are TODO stubs. **Do not deploy to production** until CRITICAL and HIGH issues are resolved.