# Replication Design — Event-Driven Async (Commercial Only)

## Core Principle: Trigger on Change, Not Time

Polling every 30s is wasteful when vaults may go days without changes. Replication fires **immediately** when a write happens, then goes idle.

## Architecture

### On Primary (Calgary)

```
Client Request → Primary Handler
    ↓
[1] Apply to local DB
[2] Mark entry dirty (replication_dirty = 1)
[3] Signal replication worker (non-blocking channel)
[4] Return success to client (don't wait)
    ↓
Replication Worker (event-driven, wakes on signal)
    ↓
POST dirty entries to Backup /api/replication/apply
    ↓
Clear dirty flag on ACK
```

**No polling. No timer. The worker sleeps until woken.**

### Replication Worker

```go
type ReplicationWorker struct {
	db      *lib.DB
	config  *ReplicationConfig
	signal  chan struct{}  // Buffered channel (size 1)
	pending map[int64]bool // In-memory dedup
	mu      sync.Mutex
}

func (w *ReplicationWorker) Signal() {
	select {
	case w.signal <- struct{}{}:
	default:
		// Already signaled; the worker will pick up all dirty entries
	}
}

func (w *ReplicationWorker) Run(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-w.signal:
			w.replicateBatch()
		}
	}
}

func (w *ReplicationWorker) replicateBatch() {
	// Get all dirty entries (could be 1 or many after a burst)
	entries, _ := lib.EntryListDirty(w.db, 100)
	if len(entries) == 0 {
		return
	}
	// POST to backup
	// Retry with backoff on failure
	// Mark replicated on success
}
```

### Signal Flow

```go
// In CreateEntry, UpdateEntry, DeleteEntry handlers:
func (h *Handlers) CreateEntry(...) {
	// ... create entry ...

	// Commercial only: mark dirty and signal the replicator
	if edition.Current.Name() == "commercial" {
		lib.EntryMarkDirty(h.db(r), entry.EntryID)
		edition.SignalReplication() // Non-blocking
	}

	// Return to client immediately
}
```

### On Backup (Zurich)

Same as before: read-only mode, applies replication pushes, rejects client writes.
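The coalescing behavior of the size-1 buffered channel is the load-bearing detail here: any number of `Signal()` calls made while the worker is busy collapse into a single pending wake-up. A minimal, self-contained sketch (names like `coalesce` and `notify` are illustrative, not part of the real codebase):

```go
package main

import "fmt"

// coalesce simulates n write signals against a size-1 buffered channel
// and returns how many worker wake-ups result.
func coalesce(n int) int {
	signal := make(chan struct{}, 1)
	notify := func() {
		select {
		case signal <- struct{}{}:
		default: // already signaled; the pending wake-up covers this write too
		}
	}
	for i := 0; i < n; i++ {
		notify()
	}
	// Drain: the worker receives at most one pending signal per wake-up.
	wakeups := 0
	for {
		select {
		case <-signal:
			wakeups++
		default:
			return wakeups
		}
	}
}

func main() {
	fmt.Println("wake-ups for 50 signals:", coalesce(50)) // prints 1
}
```

Because the send in `notify` is non-blocking, write handlers never stall on replication, which is what lets step [4] return to the client without waiting.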
## Efficiency Gains

| Metric | Polling (30s) | Event-Driven |
|--------|---------------|--------------|
| CPU wakeups/day | 2,880 | ~number of actual writes |
| Network requests/day | 2,880 | ~number of actual writes |
| Egress/day | High (always checking) | Low (only when data changes) |
| Latency | 0-30s | Immediate |

For a vault with 10 writes/day: **288x fewer wakeups.**

## Burst Handling

If 50 entries change in a burst (e.g., a batch import):

1. All 50 are marked dirty
2. The worker wakes once
3. It sends all 50 in a single batch
4. It goes back to sleep

Not 50 separate HTTP requests.

## Failure & Retry

```go
func replicateBatch() {
	entries, _ := lib.EntryListDirty(db, 100)

	var err error
	for attempt := 0; attempt < maxRetries; attempt++ {
		err = postToBackup(entries)
		if err == nil {
			// Success: clear dirty flags
			for _, e := range entries {
				lib.EntryMarkReplicated(db, e.EntryID)
			}
			return
		}
		// Failure: entries stay dirty, will be picked up on the next signal
		// Backoff: 1s, 5s, 25s, 125s...
		time.Sleep(time.Duration(math.Pow(5, float64(attempt))) * time.Second)
	}

	// Max retries exceeded: alert the operator
	edition.Current.AlertOperator(ctx, "replication_failed",
		"Backup unreachable after retries", map[string]any{
			"count":      len(entries),
			"last_error": err.Error(),
		})
}
```

No persistent queue needed - the dirty flags in SQLite are the queue.

## Code Changes Required

### 1. Signal Function (Commercial Only)

```go
// edition/replication.go
var replicationSignal chan struct{}

func SignalReplication() {
	if replicationSignal != nil {
		select {
		case replicationSignal <- struct{}{}:
		default:
		}
	}
}
```

### 2. Modified Handlers

All write handlers need:

```go
if edition.Current.Name() == "commercial" {
	lib.EntryMarkDirty(db, entryID)
	edition.SignalReplication()
}
```

### 3. Remove Polling

Delete the ticker from the replication worker; replace it with `<-signal` only.
## Resource Usage

| Resource | Polling | Event-Driven |
|----------|---------|--------------|
| Goroutine | Always running | Running but blocked on channel (idle) |
| Memory | Minimal | Minimal (just channel + map) |
| CPU | 2,880 wakeups/day | #writes wakeups/day |
| Network | 2,880 requests/day | #writes requests/day |
| SQLite queries | 2,880/day | #writes/day |

## Design Notes

**No persistent queue needed** - the `replication_dirty` column IS the queue. Worker crash? On restart, `EntryListDirty()` finds all pending work.

**No timer needed** - a Go channel with `select` is the most efficient wait mechanism.

**Batching is automatic** - multiple signals while the worker is busy? A channel of size 1 means the worker picks up ALL dirty entries on its next iteration, not one-by-one.

---

**This is the right design for low-resource, low-change vaults.**
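The crash-recovery claim can be sketched concretely: because the dirty flags live in the database rather than in the worker, a fresh worker sees all pending work on restart. In this toy model a map stands in for the `replication_dirty` column, and `fakeDB`, `markDirty`, and `listDirty` are illustrative stand-ins for the real schema and `lib` helpers:

```go
package main

import "fmt"

// fakeDB stands in for SQLite: the dirty set is durable state,
// independent of any worker's lifetime.
type fakeDB struct{ dirty map[int64]bool }

func (db *fakeDB) markDirty(id int64) { db.dirty[id] = true }

func (db *fakeDB) listDirty() []int64 {
	var ids []int64
	for id, d := range db.dirty {
		if d {
			ids = append(ids, id)
		}
	}
	return ids
}

func main() {
	db := &fakeDB{dirty: map[int64]bool{}}
	db.markDirty(1)
	db.markDirty(2)

	// "Crash": the worker and any in-memory state are gone; only db survives.
	// A fresh worker's first replicateBatch still finds both entries.
	fmt.Println("pending after restart:", len(db.listDirty()))
}
```

This is why losing a signal across a restart is harmless: the signal only schedules work, while the dirty flags define it.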