clavitor/clavis/clavis-vault/SPEC-replication.md


Replication Design — Active-Passive with Async Sync

Overview

The primary POP (e.g., Calgary) replicates every write to the backup POP (e.g., Zurich) asynchronously. If the primary fails, the backup serves read-only traffic.

Key Principles

  1. Primary owns writes — Backup never accepts mutations from clients
  2. Same wire format — Replicate the exact request payload (not re-encoded)
  3. Async, non-blocking — Primary doesn't wait for backup ACK (queue + retry)
  4. Dirty tracking per entry — Each entry carries a replicated_at timestamp and a dirty flag
  5. Read failover only — Clients may read from the backup while the primary is down, but writes fail

Architecture

On Primary (Calgary)

Client Request → Primary Handler
                      ↓
              [1] Apply to local DB
              [2] Queue for replication (async)
              [3] Return success to client (don't wait for backup)
                      ↓
              Replication Worker (background)
                      ↓
              POST to Backup /api/replication/apply

Queue Structure:

type ReplicationTask struct {
    EntryID      int64
    RawPayload   []byte      // Original request body (encrypted blob)
    Method       string      // POST/PUT/DELETE
    Timestamp    int64       // When primary applied
    RetryCount   int
    Dirty        bool        // true = not yet confirmed by backup
}

Per-Entry Status (in entries table):

replicated_at INTEGER,     -- NULL = never replicated, timestamp = last confirmation
replication_dirty BOOLEAN  -- true = pending replication, false = synced

On Backup (Zurich)

POST /api/replication/apply
    ↓
Validate: Is this from an authorized primary POP? (mTLS or shared secret)
    ↓
Apply to local DB (exact same data, including encrypted blobs)
    ↓
Return 200 ACK

Backup rejects client writes:

if isClientRequest && isWriteOperation {
    return 503, "Write operations not available on backup POP"
}

Failure Scenarios

1. Backup Unavailable (Primary Still Up)

  • Primary queues replication tasks (in-memory + SQLite for persistence)
  • Retries with exponential backoff
  • Marks entries as dirty=true
  • Client operations continue normally
  • When backup comes back: bulk sync dirty entries

2. Primary Fails (Backup Becomes Active)

  • DNS/healthcheck detects primary down
  • Clients routed to backup
  • Backup serves reads only
  • Writes return 503 with header: X-Primary-Location: https://calgary.clavitor.ai
  • Manual intervention required to promote backup to primary

3. Split Brain (Both Think They're Primary)

  • Prevented by design: Only one POP has "primary" role in control plane
  • Backup refuses writes from clients
  • If control plane fails: manual failover only

Replication Endpoint (Backup)

POST /api/replication/apply
Authorization: Bearer {inter-pop-token}
Content-Type: application/json

{
  "source_pop": "calgary-01",
  "entries": [
    {
      "entry_id": "abc123",
      "operation": "create",  // or "update", "delete"
      "encrypted_data": "base64...",
      "timestamp": 1743556800
    }
  ]
}

Response:

{
  "acknowledged": ["abc123"],
  "failed": [],
  "already_exists": []  // For conflict detection
}

Audit Log Handling

Primary: Logs all operations normally.

Backup: Logs its own operations (replication applies) but not client operations.

// On backup, when applying replication:
lib.AuditLog(db, &lib.AuditEvent{
    Action: lib.ActionReplicated,  // Special action type
    EntryID: entryID,
    Title: "replicated from " + sourcePOP,
    Actor: "system:replication",
})

Client Failover Behavior

// Client detects primary down (connection timeout, 503, etc.)
// Automatically tries backup POP

// On backup:
GET /api/entries/123    // ✅ Allowed
PUT /api/entries/123    // ❌ 503 + X-Primary-Location header

Improvements Over Original Design

| Original Proposal | Improved |
| --- | --- |
| Batch polling every 30s | Real-time async queue — faster, lower lag |
| Store just timestamp | Add dirty flag — faster recovery, less scanning |
| Replica rejects all client traffic | Read-only allowed — true failover capability |
| Single replication target | Primary + Backup concept — clearer roles |

Database Schema Addition

ALTER TABLE entries ADD COLUMN replicated_at INTEGER;      -- NULL = never
ALTER TABLE entries ADD COLUMN replication_dirty BOOLEAN DEFAULT 0;

-- Index for fast "dirty" lookup
CREATE INDEX idx_entries_dirty ON entries(replication_dirty) WHERE replication_dirty = 1;
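
When the backup ACKs an entry, the primary clears the flag and stamps the confirmation time, e.g. (a sketch using the columns above):

```sql
UPDATE entries
SET replication_dirty = 0,
    replicated_at     = strftime('%s', 'now')
WHERE id = ?;
```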

Code Structure

Commercial-only files:

edition/
├── replication.go         # Core replication logic (queue, worker)
├── replication_queue.go   # SQLite-backed persistent queue
├── replication_client.go  # HTTP client to backup POP
└── replication_handler.go # Backup's /api/replication/apply handler

Modified:

api/handlers.go          # Check if backup mode, reject writes
api/middleware.go        # Detect if backup POP, set context flag

Security Considerations

  1. Inter-POP Auth: mTLS or shared bearer token (rotated daily)
  2. Source Validation: Backup verifies primary is authorized in control plane
  3. No Cascade: Backup never replicates to another backup (prevent loops)
  4. Idempotency: Replication operations are idempotent (safe to retry)
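
Idempotency on the backup can be enforced with a guarded upsert, so a retried apply for the same entry is a no-op rather than a duplicate or a rollback; the `encrypted_data` and `updated_at` column names here are assumptions:

```sql
INSERT INTO entries (id, encrypted_data, updated_at)
VALUES (?, ?, ?)
ON CONFLICT(id) DO UPDATE
SET encrypted_data = excluded.encrypted_data,
    updated_at     = excluded.updated_at
WHERE excluded.updated_at >= entries.updated_at;  -- never apply stale data
```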

Metrics to Track

  • replication_lag_seconds — Time between primary apply and backup ACK
  • replication_queue_depth — Number of pending entries
  • replication_failures_total — Failed replication attempts
  • replication_fallback_reads — Client reads served from backup

Johan's Design + These Refinements = Production Ready