Replication Design — Active-Passive with Async Sync
Overview
The primary POP (e.g., Calgary) replicates every write to the backup POP (e.g., Zurich). The backup serves read-only traffic if the primary fails.
Key Principles
- Primary owns writes — Backup never accepts mutations from clients
- Same wire format — Replicate the exact request payload (not re-encoded)
- Async, non-blocking — Primary doesn't wait for backup ACK (queue + retry)
- Dirty tracking per entry — Each entry has `replicated_at` and a dirty flag
- Read failover only — Clients read from the backup if the primary is down, but writes fail
Architecture
On Primary (Calgary)
Client Request → Primary Handler
↓
[1] Apply to local DB
[2] Queue for replication (async)
[3] Return success to client (don't wait for backup)
↓
Replication Worker (background)
↓
POST to Backup /api/replication/apply
Queue Structure:
type ReplicationTask struct {
    EntryID    int64
    RawPayload []byte // Original request body (encrypted blob)
    Method     string // POST/PUT/DELETE
    Timestamp  int64  // When primary applied
    RetryCount int
    Dirty      bool   // true = not yet confirmed by backup
}
Per-Entry Status (in entries table):
replicated_at INTEGER, -- NULL = never replicated, timestamp = last confirmation
replication_dirty BOOLEAN -- true = pending replication, false = synced
On Backup (Zurich)
POST /api/replication/apply
↓
Validate: Is this from an authorized primary POP? (mTLS or shared secret)
↓
Apply to local DB (exact same data, including encrypted blobs)
↓
Return 200 ACK
Backup rejects client writes:
if isClientRequest && isWriteOperation {
    http.Error(w, "Write operations not available on backup POP", http.StatusServiceUnavailable)
    return
}
Failure Scenarios
1. Backup Unavailable (Primary Still Up)
- Primary queues replication tasks (in-memory + SQLite for persistence)
- Retries with exponential backoff
- Marks entries as `dirty=true`
- Client operations continue normally
- When backup comes back: bulk sync dirty entries
2. Primary Fails (Backup Becomes Active)
- DNS/healthcheck detects primary down
- Clients routed to backup
- Backup serves reads only
- Writes return 503 with header: `X-Primary-Location: https://calgary.clavitor.ai`
- Manual intervention required to promote backup to primary
3. Split Brain (Both Think They're Primary)
- Prevented by design: Only one POP has "primary" role in control plane
- Backup refuses writes from clients
- If control plane fails: manual failover only
Replication Endpoint (Backup)
POST /api/replication/apply
Authorization: Bearer {inter-pop-token}
Content-Type: application/json
{
  "source_pop": "calgary-01",
  "entries": [
    {
      "entry_id": "abc123",
      "operation": "create", // or "update", "delete"
      "encrypted_data": "base64...",
      "timestamp": 1743556800
    }
  ]
}
Response:
{
"acknowledged": ["abc123"],
"failed": [],
"already_exists": [] // For conflict detection
}
Audit Log Handling
Primary: Logs all operations normally.
Backup: Logs its own operations (replication applies) but not client operations.
// On backup, when applying replication:
lib.AuditLog(db, &lib.AuditEvent{
    Action:  lib.ActionReplicated, // Special action type
    EntryID: entryID,
    Title:   "replicated from " + sourcePOP,
    Actor:   "system:replication",
})
Client Failover Behavior
// Client detects primary down (connection timeout, 503, etc.)
// Automatically tries backup POP
// On backup:
GET /api/entries/123 // ✅ Allowed
PUT /api/entries/123 // ❌ 503 + X-Primary-Location header
Improvements Over Original Design
| Original Proposal | Improved |
|---|---|
| Batch polling every 30s | Real-time async queue — faster, lower lag |
| Store just timestamp | Add dirty flag — faster recovery, less scanning |
| Replica rejects all client traffic | Read-only allowed — true failover capability |
| Single replication target | Primary + Backup concept — clearer roles |
Database Schema Addition
ALTER TABLE entries ADD COLUMN replicated_at INTEGER; -- NULL = never
ALTER TABLE entries ADD COLUMN replication_dirty BOOLEAN DEFAULT 0;
-- Index for fast "dirty" lookup
CREATE INDEX idx_entries_dirty ON entries(replication_dirty) WHERE replication_dirty = 1;
Code Structure
Commercial-only files:
edition/
├── replication.go # Core replication logic (queue, worker)
├── replication_queue.go # SQLite-backed persistent queue
├── replication_client.go # HTTP client to backup POP
└── replication_handler.go # Backup's /api/replication/apply handler
Modified:
api/handlers.go # Check if backup mode, reject writes
api/middleware.go # Detect if backup POP, set context flag
Security Considerations
- Inter-POP Auth: mTLS or shared bearer token (rotated daily)
- Source Validation: Backup verifies primary is authorized in control plane
- No Cascade: Backup never replicates to another backup (prevent loops)
- Idempotency: Replication operations are idempotent (safe to retry)
Metrics to Track
- `replication_lag_seconds` — Time between primary apply and backup ACK
- `replication_queue_depth` — Number of pending entries
- `replication_failures_total` — Failed replication attempts
- `replication_fallback_reads` — Client reads served from backup
Johan's Design + These Refinements = Production Ready