# Replication Design — Active-Passive with Async Sync

## Overview

The primary POP (e.g., Calgary) asynchronously replicates every write to the backup POP (e.g., Zurich).
If the primary fails, the backup serves **read-only** traffic.

## Key Principles

1. **Primary owns writes**: The backup never accepts mutations from clients.
2. **Same wire format**: Replicate the exact request payload, without re-encoding it.
3. **Async, non-blocking**: The primary does not wait for the backup's ACK; it queues the write and retries.
4. **Dirty tracking per entry**: Each entry has a `replicated_at` timestamp and a `replication_dirty` flag.
5. **Read failover only**: Clients read from the backup if the primary is down; writes fail until the primary returns or the backup is promoted.

## Architecture

### On Primary (Calgary)

```
Client Request → Primary Handler
    ↓
[1] Apply to local DB
[2] Queue for replication (async)
[3] Return success to client (don't wait for backup)
    ↓
Replication Worker (background)
    ↓
POST to Backup /api/replication/apply
```
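
The three numbered steps and the background worker could be sketched in Go roughly as follows. This is a minimal illustration, not the actual implementation: `Task`, `Primary`, `HandleWrite`, and `Worker` are hypothetical names, a map stands in for the local DB, a buffered channel stands in for the queue, and the `send` callback stands in for the HTTP POST to the backup.

```go
package main

import (
	"fmt"
	"time"
)

// Task is a trimmed-down, hypothetical stand-in for a replication task.
type Task struct {
	EntryID    int64
	RawPayload []byte
	Method     string
	Timestamp  int64
}

// Primary uses a map as a stand-in for the local DB and a buffered
// channel as a stand-in for the replication queue (both assumptions).
type Primary struct {
	queue chan Task
	store map[int64][]byte
}

// HandleWrite applies the write locally, queues it, and returns at once.
func (p *Primary) HandleWrite(id int64, method string, payload []byte) error {
	p.store[id] = payload // [1] apply to local DB
	t := Task{EntryID: id, RawPayload: payload, Method: method, Timestamp: time.Now().Unix()}
	select {
	case p.queue <- t: // [2] queue for replication
	default:
		// Queue full: the task is dropped here; in the design the entry
		// would stay dirty and be picked up by a later bulk sync.
	}
	return nil // [3] success without waiting for the backup
}

// Worker drains the queue, passes each task to send, and reports how
// many sends succeeded once the queue is closed.
func (p *Primary) Worker(send func(Task) error, done chan<- int) {
	sent := 0
	for t := range p.queue {
		if err := send(t); err == nil {
			sent++
		}
	}
	done <- sent
}

func main() {
	p := &Primary{queue: make(chan Task, 16), store: map[int64][]byte{}}
	done := make(chan int)
	go p.Worker(func(Task) error { return nil }, done)
	_ = p.HandleWrite(1, "PUT", []byte("blob"))
	close(p.queue)
	fmt.Println("replicated tasks:", <-done)
}
```

The non-blocking `select` keeps a full queue from stalling the client request, which is the property the "async, non-blocking" principle asks for.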

**Queue Structure:**
```go
type ReplicationTask struct {
	EntryID    int64
	RawPayload []byte // Original request body (encrypted blob)
	Method     string // POST/PUT/DELETE
	Timestamp  int64  // When primary applied
	RetryCount int
	Dirty      bool // true = not yet confirmed by backup
}
```

**Per-Entry Status (in entries table):**
```sql
replicated_at INTEGER,     -- NULL = never replicated, timestamp = last confirmation
replication_dirty BOOLEAN  -- true = pending replication, false = synced
```

### On Backup (Zurich)

```
POST /api/replication/apply
    ↓
Validate: Is this from an authorized primary POP? (mTLS or shared secret)
    ↓
Apply to local DB (exact same data, including encrypted blobs)
    ↓
Return 200 ACK
```

**Backup rejects client writes:**
```go
if isClientRequest && isWriteOperation {
	http.Error(w, "Write operations not available on backup POP", http.StatusServiceUnavailable)
	return
}
```
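
One way to wire this check into `net/http` is a small middleware around the client-facing routes. The sketch below is an assumption about how that might look: `BackupGuard`, `isWrite`, and `statusFor` are hypothetical names, and the `X-Primary-Location` value is copied from the failure-scenario notes in this document.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// primaryURL is sent in X-Primary-Location (value taken from this document).
const primaryURL = "https://calgary.clavitor.ai"

// isWrite reports whether the HTTP method mutates state.
func isWrite(method string) bool {
	switch method {
	case http.MethodPost, http.MethodPut, http.MethodPatch, http.MethodDelete:
		return true
	}
	return false
}

// BackupGuard wraps client-facing handlers; on a backup POP it answers
// every write with 503 and points the client at the primary.
func BackupGuard(isBackup bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if isBackup && isWrite(r.Method) {
			w.Header().Set("X-Primary-Location", primaryURL)
			http.Error(w, "Write operations not available on backup POP", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one request through a guard in backup mode and
// returns the resulting status code.
func statusFor(method string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(method, "/api/entries/123", nil)
	BackupGuard(true, ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(http.MethodGet), statusFor(http.MethodPut))
}
```

The guard should wrap only client routes. `/api/replication/apply` is itself a POST and has to stay reachable for the primary, so it would be registered outside this middleware.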

## Failure Scenarios

### 1. Backup Unavailable (Primary Still Up)

- Primary queues replication tasks (in-memory + SQLite for persistence)
- Retries with exponential backoff
- Marks affected entries with `replication_dirty = 1`
- Client operations continue normally
- When backup comes back: bulk sync dirty entries
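
The retry schedule is not specified beyond "exponential backoff". As a rough sketch, the delay could be derived from a task's `RetryCount` like this. The 1s base and 10s cap are placeholder assumptions, `backoff` and `delayFor` are hypothetical helpers, and jitter is left out for brevity.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed tuning values; the design does not specify them.
const (
	baseDelay = time.Second
	maxDelay  = 10 * time.Second
)

// backoff doubles base once per previous retry and caps the result at max.
func backoff(retryCount int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < retryCount && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	return d
}

// delayFor applies the assumed base and cap.
func delayFor(retryCount int) time.Duration {
	return backoff(retryCount, baseDelay, maxDelay)
}

func main() {
	for retry := 0; retry < 6; retry++ {
		fmt.Println(retry, delayFor(retry))
	}
}
```

With these values the delays run 1s, 2s, 4s, 8s, and then stay at the 10s cap, so a long backup outage does not push retries arbitrarily far apart.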

### 2. Primary Fails (Backup Becomes Active)

- DNS/healthcheck detects primary down
- Clients routed to backup
- **Backup serves reads only**
- Writes still queued on the primary at the time of failure are not yet on the backup, so reads there may be stale
- Writes return 503 with header: `X-Primary-Location: https://calgary.clavitor.ai`
- Manual intervention required to promote backup to primary

### 3. Split Brain (Both Think They're Primary)

- Prevented by design: only one POP has the "primary" role in the control plane
- Backup refuses writes from clients
- If control plane fails: manual failover only

## Replication Endpoint (Backup)

```http
POST /api/replication/apply
Authorization: Bearer {inter-pop-token}
Content-Type: application/json

{
  "source_pop": "calgary-01",
  "entries": [
    {
      "entry_id": "abc123",
      "operation": "create",
      "encrypted_data": "base64...",
      "timestamp": 1743556800
    }
  ]
}
```

`operation` is one of `"create"`, `"update"`, or `"delete"`.

Response:
```json
{
  "acknowledged": ["abc123"],
  "failed": [],
  "already_exists": []
}
```

`already_exists` is used for conflict detection.
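
The request and response shapes map onto Go structs fairly directly. The sketch below is illustrative only: the type names, `Apply`, and `ApplyJSON` are hypothetical, a map stands in for the backup's database, and treating a `create` for an entry the backup already holds as `already_exists` is an assumption about what that field means.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ReplicatedEntry mirrors one element of "entries" in the request body.
type ReplicatedEntry struct {
	EntryID       string `json:"entry_id"`
	Operation     string `json:"operation"`
	EncryptedData string `json:"encrypted_data"`
	Timestamp     int64  `json:"timestamp"`
}

// ApplyRequest mirrors the request body.
type ApplyRequest struct {
	SourcePOP string            `json:"source_pop"`
	Entries   []ReplicatedEntry `json:"entries"`
}

// ApplyResponse mirrors the response body.
type ApplyResponse struct {
	Acknowledged  []string `json:"acknowledged"`
	Failed        []string `json:"failed"`
	AlreadyExists []string `json:"already_exists"`
}

// Apply sorts each entry into one response bucket. existing stands in
// for the backup's DB (true = entry present).
func Apply(req ApplyRequest, existing map[string]bool) ApplyResponse {
	resp := ApplyResponse{Acknowledged: []string{}, Failed: []string{}, AlreadyExists: []string{}}
	for _, e := range req.Entries {
		switch {
		case e.Operation != "create" && e.Operation != "update" && e.Operation != "delete":
			resp.Failed = append(resp.Failed, e.EntryID) // unknown operation
		case e.Operation == "create" && existing[e.EntryID]:
			resp.AlreadyExists = append(resp.AlreadyExists, e.EntryID)
		default:
			existing[e.EntryID] = e.Operation != "delete"
			resp.Acknowledged = append(resp.Acknowledged, e.EntryID)
		}
	}
	return resp
}

// ApplyJSON decodes a request body and returns the encoded response.
func ApplyJSON(body []byte, existing map[string]bool) (string, error) {
	var req ApplyRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return "", err
	}
	out, err := json.Marshal(Apply(req, existing))
	return string(out), err
}

func main() {
	body := []byte(`{"source_pop":"calgary-01","entries":[{"entry_id":"abc123","operation":"create","encrypted_data":"base64...","timestamp":1743556800}]}`)
	out, err := ApplyJSON(body, map[string]bool{})
	fmt.Println(out, err)
}
```

The response slices are initialized to empty rather than nil so that they encode as `[]` instead of `null`, matching the example response.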

## Audit Log Handling

**Primary:** Logs all operations normally.

**Backup:** Logs its own operations (replication applies) but not client operations.

```go
// On backup, when applying replication:
lib.AuditLog(db, &lib.AuditEvent{
	Action:  lib.ActionReplicated, // Special action type
	EntryID: entryID,
	Title:   "replicated from " + sourcePOP,
	Actor:   "system:replication",
})
```

## Client Failover Behavior

```
// Client detects primary down (connection timeout, 503, etc.)
// Automatically tries backup POP

// On backup:
GET /api/entries/123  // ✅ Allowed
PUT /api/entries/123  // ❌ 503 + X-Primary-Location header
```
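
On the client side, read failover can be as simple as trying each POP in order. The following is a minimal sketch under that assumption: `fetchWithFailover` and `demo` are hypothetical names, and the demo uses local `httptest` servers in place of the real POPs.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchWithFailover tries each base URL in order and returns the first
// response that is neither a transport error nor a 503.
func fetchWithFailover(client *http.Client, bases []string, path string) (*http.Response, error) {
	lastErr := errors.New("no POPs configured")
	for _, base := range bases {
		resp, err := client.Get(base + path)
		if err != nil {
			lastErr = err // timeout, refused connection, etc.
			continue
		}
		if resp.StatusCode == http.StatusServiceUnavailable {
			resp.Body.Close()
			lastErr = fmt.Errorf("%s returned 503", base)
			continue
		}
		return resp, nil
	}
	return nil, lastErr
}

// demo reads one entry while the stand-in primary is down.
func demo() (int, error) {
	primary := httptest.NewServer(http.NotFoundHandler())
	backup := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "entry 123")
	}))
	defer backup.Close()
	primary.Close() // simulate primary down: connections are refused

	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := fetchWithFailover(client, []string{primary.URL, backup.URL}, "/api/entries/123")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	fmt.Println(demo())
}
```

This helper covers reads only. A write that receives a 503 from the backup should surface the error to the caller rather than retry elsewhere, since only the primary accepts mutations.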

## Improvements Over Original Design

| Original Proposal | Improved |
|-------------------|----------|
| Batch polling every 30s | **Real-time async queue**: faster, lower lag |
| Store just timestamp | **Add dirty flag**: faster recovery, less scanning |
| Replica rejects all client traffic | **Read-only allowed**: true failover capability |
| Single replication target | **Primary + Backup concept**: clearer roles |

## Database Schema Addition

```sql
ALTER TABLE entries ADD COLUMN replicated_at INTEGER; -- NULL = never
ALTER TABLE entries ADD COLUMN replication_dirty BOOLEAN DEFAULT 0;

-- Index for fast "dirty" lookup
CREATE INDEX idx_entries_dirty ON entries(replication_dirty) WHERE replication_dirty = 1;
```

## Code Structure

**Commercial-only files:**
```
edition/
├── replication.go          # Core replication logic (queue, worker)
├── replication_queue.go    # SQLite-backed persistent queue
├── replication_client.go   # HTTP client to backup POP
└── replication_handler.go  # Backup's /api/replication/apply handler
```

**Modified:**
```
api/handlers.go    # Check if backup mode, reject writes
api/middleware.go  # Detect if backup POP, set context flag
```

## Security Considerations

1. **Inter-POP Auth:** mTLS or shared bearer token (rotated daily)
2. **Source Validation:** Backup verifies primary is authorized in control plane
3. **No Cascade:** Backup never replicates to another backup (prevent loops)
4. **Idempotency:** Replication operations are idempotent (safe to retry)
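
The document does not say how idempotency is achieved. One common approach, shown here purely as an assumption, is to apply an operation only when its primary timestamp is newer than what the backup already holds, so a retried delivery becomes a no-op. `record` and `applyIdempotent` are hypothetical names, and a map stands in for the backup's database.

```go
package main

import "fmt"

// record is what the stand-in backup store keeps per entry.
type record struct {
	Data      string
	Timestamp int64
	Deleted   bool
}

// applyIdempotent applies an operation only if it is newer than the
// stored state. It returns true when the store changed.
func applyIdempotent(store map[string]record, id, op, data string, ts int64) bool {
	cur, ok := store[id]
	if ok && ts <= cur.Timestamp {
		return false // duplicate or stale delivery: nothing to do
	}
	store[id] = record{Data: data, Timestamp: ts, Deleted: op == "delete"}
	return true
}

func main() {
	store := map[string]record{}
	fmt.Println(applyIdempotent(store, "abc123", "create", "base64...", 1743556800))
	// The same task delivered again after a retry leaves the store untouched.
	fmt.Println(applyIdempotent(store, "abc123", "create", "base64...", 1743556800))
}
```

Comparing timestamps also means an older task that arrives late cannot overwrite a newer one. That relies on all timestamps coming from the single primary.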

## Metrics to Track

- `replication_lag_seconds`: Time between primary apply and backup ACK
- `replication_queue_depth`: Number of pending entries
- `replication_failures_total`: Failed replication attempts
- `replication_fallback_reads`: Client reads served from backup
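
As a rough illustration of where these values would be updated, the sketch below keeps them in atomic counters. The `Metrics` type and its methods are hypothetical; a real deployment would more likely export them through whatever metrics library the project already uses.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Metrics holds the four values listed above (names are assumptions).
type Metrics struct {
	LagSeconds    atomic.Int64 // lag of the most recent ACK
	QueueDepth    atomic.Int64
	FailuresTotal atomic.Int64
	FallbackReads atomic.Int64
}

// OnEnqueue is called when the primary queues a task.
func (m *Metrics) OnEnqueue() { m.QueueDepth.Add(1) }

// OnFailure is called for every failed replication attempt.
func (m *Metrics) OnFailure() { m.FailuresTotal.Add(1) }

// OnAck is called when the backup confirms a task; both arguments are
// Unix timestamps in seconds.
func (m *Metrics) OnAck(appliedAt, ackedAt int64) {
	m.QueueDepth.Add(-1)
	m.LagSeconds.Store(ackedAt - appliedAt)
}

// OnFallbackRead is called on the backup for each client read it serves.
func (m *Metrics) OnFallbackRead() { m.FallbackReads.Add(1) }

func main() {
	m := &Metrics{}
	m.OnEnqueue()
	m.OnFailure()
	m.OnAck(1743556800, 1743556803)
	fmt.Println(m.LagSeconds.Load(), m.QueueDepth.Load(), m.FailuresTotal.Load())
}
```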

---

**Johan's Design + These Refinements = Production Ready**