# Security Review: Replication Function (Commercial Edition) **Scope:** Event-driven async replication between Primary and Backup POPs **Files Reviewed:** - `edition/config.go` - Config loading and validation - `edition/replication.go` - Replication worker and client - `edition/backup_mode.go` - Backup mode detection and write rejection - `lib/dbcore.go` - Dirty marking functions --- ## CRITICAL ISSUES (Fix Before Production) ### 1. MISSING: Inter-POP Authentication Implementation **Risk:** Unauthorized POPs could push malicious data to backups **Location:** `edition/replication.go:150` (stub) **Current State:** ```go // TODO: POST to backup POP // No authentication implemented ``` **Required:** - mTLS (client cert) OR - Shared secret in `Authorization: Bearer ` header - Token must be read from file (not hardcoded) **Fix:** ```go req.Header.Set("Authorization", "Bearer " + loadTokenFromFile(w.config.Auth.TokenFile)) // OR use TLS client certificates ``` ### 2. MISSING: Backup-Side Request Authentication **Risk:** Anyone with network access could POST to `/api/replication/apply` **Location:** Not yet implemented (backup receive endpoint) **Required on Backup:** - Verify `Authorization` header matches shared secret - Verify source POP ID is in authorized list - mTLS client cert validation (if using mTLS) ### 3. WEAK: Backup Mode Detection Uses Environment Variable **Risk:** `CLAVITOR_BACKUP_MODE=true` can be set accidentally or maliciously **Location:** `edition/backup_mode.go:23` **Current:** ```go if os.Getenv("CLAVITOR_BACKUP_MODE") == "true" { return true } ``` **Problem:** - Env vars are visible to all processes (`ps e`) - No validation that this POP is actually a backup - Could cause split-brain if primary thinks it's backup **Fix:** - Read from config file only (`replication.yaml` with `role: backup`) - Use control plane confirmation (call HQ to verify role) - Remove env var option entirely ### 4. MISSING: Replay Attack Protection **Risk:** Attacker could replay old replication requests **Location:** `edition/replication.go:173-177` **Current Request:** ```go type ReplicationRequest struct { SourcePOP string `json:"source_pop"` Entries []ReplicatedEntry `json:"entries"` Timestamp int64 `json:"timestamp"` } ``` **Missing:** - No nonce/unique ID per request - No request signature - No max age validation (5-minute window?) **Fix:** ```go type ReplicationRequest struct { SourcePOP string `json:"source_pop"` Entries []ReplicatedEntry `json:"entries"` Timestamp int64 `json:"timestamp"` Nonce string `json:"nonce"` // Random 16 bytes Signature string `json:"signature"` // HMAC-SHA256 of body } ``` ### 5. HIGH: Shared Secret Token Validation Weak **Risk:** Token file existence checked, but not token format/entropy **Location:** `edition/config.go:58` **Current:** ```go if _, err := os.Stat(tokenFile); err != nil { return nil, fmt.Errorf("auth token file not found: %s", tokenFile) } ``` **Missing:** - No token entropy validation (is it 32+ random bytes?) - No token rotation mechanism - No token expiration **Fix:** ```go token, err := os.ReadFile(tokenFile) if len(token) < 32 { return nil, fmt.Errorf("token must be at least 32 bytes") } // Check entropy (not all zeros, not repetitive) ``` --- ## HIGH SEVERITY ISSUES ### 6. MISSING: HTTPS Certificate Validation Disabled? **Risk:** MITM attack between primary and backup **Location:** `edition/replication.go:150` (stub) **Concern:** If using self-signed certs or `InsecureSkipVerify: true` **Required:** - Proper TLS cert validation - Pin backup POP's cert or use internal CA - Never `InsecureSkipVerify: true` ### 7. NO AUDIT LOGGING OF REPLICATION **Risk:** Cannot detect unauthorized replication attempts **Location:** Missing throughout **Required:** - Log every replication attempt (success and failure) - Include: source POP ID, entry count, timestamp, auth result - Alert on auth failures (possible attack) ### 8. CASCADE REPLICATION NOT PREVENTED **Risk:** Backup could accidentally re-replicate to another POP, causing loops **Location:** `edition/replication.go` **Current:** Worker only checks `Role != "primary"`, but doesn't prevent config errors **Fix:** ```go // In startReplication(): if cfg.Role == "backup" { log.Printf("Backup POP: replication worker disabled (receive-only)") return // Don't start worker } ``` ### 9. INFORMATION DISCLOSURE IN ERRORS **Risk:** Error messages leak internal paths and structure **Location:** `edition/config.go:23, 28, 59` **Current:** ```go return nil, fmt.Errorf("cannot read replication config: %w", err) return nil, fmt.Errorf("auth token file not found: %s", tokenFile) ``` **Problem:** - Exposes `/etc/clavitor/replication.yaml` path - Exposes `/etc/clavitor/replication.key` path - Could help attacker locate secrets **Fix:** ```go return nil, fmt.Errorf("replication configuration error") // Log full details internally only ``` --- ## MEDIUM SEVERITY ISSUES ### 10. NO RATE LIMITING ON REPLICATION ENDPOINT **Risk:** DoS via flood of replication requests **Location:** Not implemented (backup receive endpoint) **Required:** - Rate limit `/api/replication/apply` per source POP - Max 1 request/second per POP - Ban source POP after 10 failed auth attempts ### 11. BACKOFF CALCULATION FLOAT CONVERSION **Risk:** Minor - could panic or behave unexpectedly **Location:** `edition/replication.go:109` **Current:** ```go backoff := time.Duration(math.Pow(5, float64(attempt-1))) * time.Second ``` **Issue:** For attempt=5, result is 625 seconds = 10.4 minutes, but truncated **Fix:** Use integer math: ```go delays := []time.Duration{1*time.Second, 5*time.Second, 25*time.Second, 125*time.Second, 625*time.Second} if attempt > 0 && attempt <= len(delays) { backoff = delays[attempt-1] } ``` ### 12. WORKER DB CONNECTION SEPARATE FROM MAIN **Risk:** Inconsistent state if main DB has different connection params **Location:** `edition/replication.go:56-61` **Current:** ```go dbPath := dataDir + "/clavitor-*.db" // TODO: proper vault lookup db, err := lib.OpenDB(dbPath) ``` **Issue:** Replication worker uses separate DB connection from main handlers. If WAL mode settings differ, could cause lock issues. **Fix:** Pass existing DB connection from main, or ensure identical settings. ### 13. NO MAX DIRTY ENTRY LIMIT **Risk:** Memory exhaustion if backup down for long time **Location:** `edition/replication.go:142` **Current:** ```go entries, err := lib.EntryListDirty(w.db, w.config.Replication.BatchSize) ``` **Issue:** Only limited by batch size, but if replication fails repeatedly, entries accumulate in DB (not memory). **Actually OK** - dirty entries stay in DB, not memory. **Status:** Acceptable (disk-bound, not memory-bound) --- ## LOW SEVERITY / DEFENSE IN DEPTH ### 14. TIMING ATTACK ON TOKEN COMPARISON **Risk:** Side-channel attack if comparing tokens naively **Location:** Not implemented yet (backup auth) **Fix:** Use constant-time comparison: ```go if subtle.ConstantTimeCompare(receivedToken, expectedToken) != 1 { return authFailed } ``` ### 15. NO REQUEST SIZE LIMITS ON BACKUP ENDPOINT **Risk:** Large request could cause OOM **Location:** Not implemented (backup receive) **Fix:** `http.MaxBytesReader(w, r.Body, maxReplicationBodySize)` ### 16. LOG INJECTION VIA ENTRY DATA **Risk:** If entry titles contain format strings or escape sequences **Location:** `edition/replication.go:154` **Current:** ```go log.Printf("Replicating %d entries to %s", len(entries), w.config.BackupPOP.URL) ``` **Issue:** If somehow user-controlled data gets into log, could inject log formatters **Fix:** Use structured logging, never printf with user data --- ## SECURITY RECOMMENDATIONS ### Immediate (Before Production) 1. **Implement mTLS between POPs** - Most secure option, no shared secrets 2. **Add request signing with HMAC** - Prevent replays 3. **Add replication audit logging** - Security monitoring 4. **Remove env var backup mode** - Config file only 5. **Add rate limiting** - Prevent DoS ### Short Term 6. **Token rotation mechanism** - Hot-swap without restart 7. **Control plane integration** - Central authority for POP roles 8. **Add security headers** - `X-Content-Type-Options`, etc. ### Design Improvements 9. **Consider using ed25519 signatures** instead of shared secrets 10. **Add replication watermark/epoch** - For detecting gaps --- ## SECURE CONFIGURATION EXAMPLE ```yaml # /etc/clavitor/replication.yaml pop_id: "calgary-01" region: "north-america" role: "primary" backup_pop: id: "zurich-01" url: "https://zurich-01.clavitor.ai:8443" # TLS 1.3 only # No auth_token_file here - use mTLS instead auth: method: "mtls" # or "bearer" # For mTLS: mtls_cert: "/etc/clavitor/replication.crt" mtls_key: "/etc/clavitor/replication.key" mtls_ca: "/etc/clavitor/ca.crt" # Verify backup cert # For bearer token: # token_file: "/etc/clavitor/replication.key" # token_rotation_interval: 86400 # 24 hours replication: batch_size: 100 max_retries: 5 request_timeout: 30 # seconds max_request_age: 300 # 5 minutes (anti-replay) security: audit_log: "/var/log/clavitor/replication.log" rate_limit_per_minute: 60 ban_after_failures: 10 ban_duration: 3600 # 1 hour ``` --- ## SUMMARY | Risk Level | Count | Issues | |------------|-------|--------| | **CRITICAL** | 5 | Auth not implemented, replay attacks, env var backup mode, weak token validation, MITM risk | | **HIGH** | 4 | HTTPS validation, audit logging, cascade prevention, info disclosure | | **MEDIUM** | 4 | Rate limiting, backoff math, DB connection, size limits | | **LOW** | 3 | Timing attacks, request size, log injection | **Bottom Line:** The replication scaffolding is good, but the security-critical parts (auth, replay protection, audit) are TODO stubs. **Do not deploy to production** until CRITICAL and HIGH issues are resolved.