clavitor/clavis/clavis-vault/SPEC-replication.md


# Replication Design — Active-Passive with Async Sync
## Overview
Primary POP (e.g., Calgary) replicates every write to Backup POP (e.g., Zurich).
Backup serves **read-only** traffic if primary fails.
## Key Principles
1. **Primary owns writes** — Backup never accepts mutations from clients
2. **Same wire format** — Replicate the exact request payload (not re-encoded)
3. **Async, non-blocking** — Primary doesn't wait for backup ACK (queue + retry)
4. **Dirty tracking per entry** — Each entry has `replicated_at` and dirty flag
5. **Read failover only** — Clients read from backup if primary down, but writes fail
## Architecture
### On Primary (Calgary)
```
Client Request → Primary Handler
  [1] Apply to local DB
  [2] Queue for replication (async)
  [3] Return success to client (don't wait for backup)

Replication Worker (background)
  POST to Backup /api/replication/apply
```
**Queue Structure:**
```go
type ReplicationTask struct {
	EntryID    int64
	RawPayload []byte // Original request body (encrypted blob)
	Method     string // POST/PUT/DELETE
	Timestamp  int64  // When primary applied
	RetryCount int
	Dirty      bool // true = not yet confirmed by backup
}
```
**Per-Entry Status (in entries table):**
```sql
replicated_at INTEGER, -- NULL = never replicated, timestamp = last confirmation
replication_dirty BOOLEAN -- true = pending replication, false = synced
```
### On Backup (Zurich)
```
POST /api/replication/apply
  Validate: Is this from an authorized primary POP? (mTLS or shared secret)
  Apply to local DB (exact same data, including encrypted blobs)
  Return 200 ACK
```
**Backup rejects client writes:**
```go
// w is the http.ResponseWriter; primaryLocation points at the active primary.
if isClientRequest && isWriteOperation {
	w.Header().Set("X-Primary-Location", primaryLocation)
	http.Error(w, "Write operations not available on backup POP",
		http.StatusServiceUnavailable)
	return
}
```
## Failure Scenarios
### 1. Backup Unavailable (Primary Still Up)
- Primary queues replication tasks (in-memory + SQLite for persistence)
- Retries with exponential backoff
- Marks entries as `dirty=true`
- Client operations continue normally
- When backup comes back: bulk sync dirty entries
### 2. Primary Fails (Backup Becomes Active)
- DNS/healthcheck detects primary down
- Clients routed to backup
- **Backup serves reads only**
- Writes return 503 with header: `X-Primary-Location: https://calgary.clavitor.ai`
- Manual intervention required to promote backup to primary
### 3. Split Brain (Both Think They're Primary)
- Prevented by design: Only one POP has "primary" role in control plane
- Backup refuses writes from clients
- If control plane fails: manual failover only
## Replication Endpoint (Backup)
```http
POST /api/replication/apply
Authorization: Bearer {inter-pop-token}
Content-Type: application/json

{
  "source_pop": "calgary-01",
  "entries": [
    {
      "entry_id": "abc123",
      "operation": "create",  // or "update", "delete"
      "encrypted_data": "base64...",
      "timestamp": 1743556800
    }
  ]
}
```
Response:
```json
{
  "acknowledged": ["abc123"],
  "failed": [],
  "already_exists": []  // For conflict detection
}
```
## Audit Log Handling
**Primary:** Logs all operations normally.
**Backup:** Logs its own operations (replication applies) but not client operations.
```go
// On backup, when applying replication:
lib.AuditLog(db, &lib.AuditEvent{
	Action:  lib.ActionReplicated, // Special action type
	EntryID: entryID,
	Title:   "replicated from " + sourcePOP,
	Actor:   "system:replication",
})
```
## Client Failover Behavior
```
// Client detects primary down (connection timeout, 503, etc.)
// Automatically tries backup POP
// On backup:
GET /api/entries/123  // ✅ Allowed
PUT /api/entries/123  // ❌ 503 + X-Primary-Location header
```
## Improvements Over Original Design
| Original Proposal | Improved |
|-------------------|----------|
| Batch polling every 30s | **Real-time async queue** — faster, lower lag |
| Store just timestamp | **Add dirty flag** — faster recovery, less scanning |
| Replica rejects all client traffic | **Read-only allowed** — true failover capability |
| Single replication target | **Primary + Backup concept** — clearer roles |
## Database Schema Addition
```sql
ALTER TABLE entries ADD COLUMN replicated_at INTEGER; -- NULL = never
ALTER TABLE entries ADD COLUMN replication_dirty BOOLEAN DEFAULT 0;
-- Index for fast "dirty" lookup
CREATE INDEX idx_entries_dirty ON entries(replication_dirty) WHERE replication_dirty = 1;
```
## Code Structure
**Commercial-only files:**
```
edition/
├── replication.go # Core replication logic (queue, worker)
├── replication_queue.go # SQLite-backed persistent queue
├── replication_client.go # HTTP client to backup POP
└── replication_handler.go # Backup's /api/replication/apply handler
```
**Modified:**
```
api/handlers.go # Check if backup mode, reject writes
api/middleware.go # Detect if backup POP, set context flag
```
## Security Considerations
1. **Inter-POP Auth:** mTLS or shared bearer token (rotated daily)
2. **Source Validation:** Backup verifies primary is authorized in control plane
3. **No Cascade:** Backup never replicates to another backup (prevent loops)
4. **Idempotency:** Replication operations are idempotent (safe to retry)
## Metrics to Track
- `replication_lag_seconds` — Time between primary apply and backup ACK
- `replication_queue_depth` — Number of pending entries
- `replication_failures_total` — Failed replication attempts
- `replication_fallback_reads` — Client reads served from backup
---
**Johan's Design + These Refinements = Production Ready**