# Replication Design — Active-Passive with Async Sync

## Overview

Primary POP (e.g., Calgary) replicates every write to Backup POP (e.g., Zurich). Backup serves **read-only** traffic if primary fails.

## Key Principles

1. **Primary owns writes** — Backup never accepts mutations from clients
2. **Same wire format** — Replicate the exact request payload (not re-encoded)
3. **Async, non-blocking** — Primary doesn't wait for backup ACK (queue + retry)
4. **Dirty tracking per entry** — Each entry has `replicated_at` and a dirty flag
5. **Read failover only** — Clients read from backup if primary is down, but writes fail

## Architecture

### On Primary (Calgary)

```
Client Request → Primary Handler
        ↓
[1] Apply to local DB
[2] Queue for replication (async)
[3] Return success to client (don't wait for backup)
        ↓
Replication Worker (background)
        ↓
POST to Backup /api/replication/apply
```

**Queue Structure:**

```go
type ReplicationTask struct {
	EntryID    int64
	RawPayload []byte // Original request body (encrypted blob)
	Method     string // POST/PUT/DELETE
	Timestamp  int64  // When primary applied
	RetryCount int
	Dirty      bool // true = not yet confirmed by backup
}
```

**Per-Entry Status (in entries table):**

```sql
replicated_at     INTEGER, -- NULL = never replicated, timestamp = last confirmation
replication_dirty BOOLEAN  -- true = pending replication, false = synced
```

### On Backup (Zurich)

```
POST /api/replication/apply
        ↓
Validate: Is this from an authorized primary POP? (mTLS or shared secret)
        ↓
Apply to local DB (exact same data, including encrypted blobs)
        ↓
Return 200 ACK
```

**Backup rejects client writes:**

```go
if isClientRequest && isWriteOperation {
	http.Error(w, "Write operations not available on backup POP",
		http.StatusServiceUnavailable)
	return
}
```

## Failure Scenarios
### 1. Backup Unavailable (Primary Still Up)

- Primary queues replication tasks (in-memory + SQLite for persistence)
- Retries with exponential backoff
- Marks entries as `dirty=true`
- Client operations continue normally
- When backup comes back: bulk sync dirty entries

### 2. Primary Fails (Backup Becomes Active)

- DNS/healthcheck detects primary down
- Clients routed to backup
- **Backup serves reads only**
- Writes return 503 with header: `X-Primary-Location: https://calgary.clavitor.ai`
- Manual intervention required to promote backup to primary

### 3. Split Brain (Both Think They're Primary)

- Prevented by design: only one POP holds the "primary" role in the control plane
- Backup refuses writes from clients
- If the control plane fails: manual failover only

## Replication Endpoint (Backup)

```http
POST /api/replication/apply
Authorization: Bearer {inter-pop-token}
Content-Type: application/json

{
  "source_pop": "calgary-01",
  "entries": [
    {
      "entry_id": "abc123",
      "operation": "create",  // or "update", "delete"
      "encrypted_data": "base64...",
      "timestamp": 1743556800
    }
  ]
}
```

Response:

```json
{
  "acknowledged": ["abc123"],
  "failed": [],
  "already_exists": []  // For conflict detection
}
```

## Audit Log Handling

**Primary:** Logs all operations normally.

**Backup:** Logs its own operations (replication applies) but not client operations.

```go
// On backup, when applying replication:
lib.AuditLog(db, &lib.AuditEvent{
	Action:  lib.ActionReplicated, // Special action type
	EntryID: entryID,
	Title:   "replicated from " + sourcePOP,
	Actor:   "system:replication",
})
```

## Client Failover Behavior

The client detects the primary is down (connection timeout, 503, or similar).
It then automatically retries against the backup POP. On the backup:

```
GET /api/entries/123   // ✅ Allowed
PUT /api/entries/123   // ❌ 503 + X-Primary-Location header
```

## Improvements Over Original Design

| Original Proposal | Improved |
|-------------------|----------|
| Batch polling every 30s | **Real-time async queue** — faster, lower lag |
| Store just timestamp | **Add dirty flag** — faster recovery, less scanning |
| Replica rejects all client traffic | **Read-only allowed** — true failover capability |
| Single replication target | **Primary + Backup concept** — clearer roles |

## Database Schema Addition

```sql
ALTER TABLE entries ADD COLUMN replicated_at INTEGER; -- NULL = never
ALTER TABLE entries ADD COLUMN replication_dirty BOOLEAN DEFAULT 0;

-- Index for fast "dirty" lookup
CREATE INDEX idx_entries_dirty ON entries(replication_dirty)
  WHERE replication_dirty = 1;
```

## Code Structure

**Commercial-only files:**

```
edition/
├── replication.go         # Core replication logic (queue, worker)
├── replication_queue.go   # SQLite-backed persistent queue
├── replication_client.go  # HTTP client to backup POP
└── replication_handler.go # Backup's /api/replication/apply handler
```

**Modified:**

```
api/handlers.go    # Check if backup mode, reject writes
api/middleware.go  # Detect if backup POP, set context flag
```

## Security Considerations

1. **Inter-POP Auth:** mTLS or shared bearer token (rotated daily)
2. **Source Validation:** Backup verifies primary is authorized in control plane
3. **No Cascade:** Backup never replicates to another backup (prevent loops)
4. **Idempotency:** Replication operations are idempotent (safe to retry)

## Metrics to Track

- `replication_lag_seconds` — Time between primary apply and backup ACK
- `replication_queue_depth` — Number of pending entries
- `replication_failures_total` — Failed replication attempts
- `replication_fallback_reads` — Client reads served from backup

---

**Johan's Design + These Refinements = Production Ready**
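As a closing sketch, the idempotency requirement from the Security Considerations can be illustrated with an in-memory stand-in for the backup's apply step; `ReplEntry`, `Store`, and `applyBatch` are hypothetical names, and the timestamp comparison is one simple way to make replays safe:

```go
package main

import "fmt"

// ReplEntry mirrors one item in the /api/replication/apply payload.
type ReplEntry struct {
	EntryID       string
	Operation     string // "create", "update", "delete"
	EncryptedData string
	Timestamp     int64
}

// Store is an in-memory stand-in for the backup's database.
type Store map[string]ReplEntry

// applyBatch applies entries idempotently: replaying a batch the backup
// has already seen changes nothing and reports the IDs in alreadyExists,
// matching the response shape of the replication endpoint.
func applyBatch(s Store, entries []ReplEntry) (acked, alreadyExists []string) {
	for _, e := range entries {
		if prev, ok := s[e.EntryID]; ok && prev.Timestamp >= e.Timestamp {
			alreadyExists = append(alreadyExists, e.EntryID) // safe retry
			continue
		}
		if e.Operation == "delete" {
			delete(s, e.EntryID)
		} else {
			s[e.EntryID] = e
		}
		acked = append(acked, e.EntryID)
	}
	return acked, alreadyExists
}

func main() {
	s := Store{}
	batch := []ReplEntry{{EntryID: "abc123", Operation: "create", Timestamp: 1743556800}}
	acked, _ := applyBatch(s, batch)
	_, dup := applyBatch(s, batch) // replay: no double apply
	fmt.Println(acked, dup)        // [abc123] [abc123]
}
```

Because retries land in `already_exists` rather than failing, the primary's worker can resend a batch after a timeout without risking duplicate application.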