# Replication Design — Event-Driven Async (Commercial Only)
## Core Principle: Trigger on Change, Not Time

Polling every 30s is wasteful when vaults may go days without changes. Replication fires **immediately** when a write happens, then goes idle.
## Architecture

### On Primary (Calgary)
```
Client Request → Primary Handler
        ↓
[1] Apply to local DB
[2] Mark entry dirty (replication_dirty = 1)
[3] Signal replication worker (non-blocking channel)
[4] Return success to client (don't wait)
        ↓
Replication Worker (event-driven, wakes on signal)
        ↓
POST dirty entries to Backup /api/replication/apply
        ↓
Clear dirty flag on ACK
```
**No polling. No timer. The worker sleeps until woken.**

### Replication Worker
```go
type ReplicationWorker struct {
	db      *lib.DB
	config  *ReplicationConfig
	signal  chan struct{}  // Buffered channel (size 1)
	pending map[int64]bool // In-memory dedup
	mu      sync.Mutex
}

func (w *ReplicationWorker) Signal() {
	select {
	case w.signal <- struct{}{}:
	default:
		// Already signaled; the worker will pick up all dirty entries
	}
}

func (w *ReplicationWorker) Run(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-w.signal:
			w.replicateBatch()
		}
	}
}

func (w *ReplicationWorker) replicateBatch() {
	// Get all dirty entries (could be 1, or many after a burst)
	entries, err := lib.EntryListDirty(w.db, 100)
	if err != nil || len(entries) == 0 {
		return
	}

	// POST to backup
	// Retry with backoff on failure
	// Mark replicated on success
}
```
### Signal Flow

```go
// In the CreateEntry, UpdateEntry, and DeleteEntry handlers:
func (h *Handlers) CreateEntry(...) {
	// ... create entry ...

	// Commercial only: mark dirty and signal the replicator
	if edition.Current.Name() == "commercial" {
		lib.EntryMarkDirty(h.db(r), entry.EntryID)
		edition.SignalReplication() // Non-blocking
	}

	// Return to client immediately
}
```
### On Backup (Zurich)

Same as before: the backup runs in read-only mode, applies replication pushes, and rejects client writes.
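As an illustrative sketch of that read-only gate (the middleware shape, handler names, and the exact rejection status are assumptions here, not the actual implementation; only the `/api/replication/apply` path comes from the design above):

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// readOnlyGate rejects client writes on the backup while still letting
// the primary's replication pushes through. Hypothetical middleware sketch.
func readOnlyGate(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		isWrite := r.Method != http.MethodGet && r.Method != http.MethodHead
		isReplicationPush := r.URL.Path == "/api/replication/apply"
		if isWrite && !isReplicationPush {
			http.Error(w, "backup is read-only; write to the primary", http.StatusMethodNotAllowed)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor exercises the gate in-process with httptest.
func statusFor(method, path string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	readOnlyGate(ok).ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor("POST", "/api/entries"))           // client write: 405
	fmt.Println(statusFor("POST", "/api/replication/apply")) // replication push: 200
	fmt.Println(statusFor("GET", "/api/entries"))            // read: 200
}
```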
## Efficiency Gains

| Metric | Polling (30s) | Event-Driven |
|--------|---------------|--------------|
| CPU wakeups/day | 2,880 | ~number of actual writes |
| Network requests/day | 2,880 | ~number of actual writes |
| Egress/day | High (always checking) | Low (only when data changes) |
| Latency | 0–30s | Immediate |

For a vault with 10 writes/day: **288x fewer wakeups.**
## Burst Handling

If 50 entries change in a burst (e.g., a batch import):

1. All 50 are marked dirty
2. The worker wakes once
3. All 50 are sent in a single batch
4. The worker goes back to sleep

Not 50 separate HTTP requests.
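The steps above hinge entirely on the size-1 buffered channel: once one signal is pending, further non-blocking sends fall through to `default`, so a burst coalesces into a single wakeup. A self-contained demonstration (names are illustrative, not the actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// signal performs the same non-blocking send as ReplicationWorker.Signal.
// With a buffered channel of size 1, a burst of signals coalesces into
// a single pending wakeup.
func signal(ch chan struct{}) {
	select {
	case ch <- struct{}{}:
	default: // already signaled; the next wakeup covers this change too
	}
}

// pendingWakeups drains the channel and counts how many signals queued up.
func pendingWakeups(ch chan struct{}) int {
	n := 0
	for {
		select {
		case <-ch:
			n++
		default:
			return n
		}
	}
}

func main() {
	ch := make(chan struct{}, 1)

	// Simulate a batch import: 50 concurrent writes each fire a signal.
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			signal(ch)
		}()
	}
	wg.Wait()

	fmt.Println(pendingWakeups(ch)) // prints 1: one wakeup, one batch
}
```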
## Failure & Retry

```go
func replicateBatch() {
	entries, _ := lib.EntryListDirty(db, 100)

	var err error
	for attempt := 0; attempt < maxRetries; attempt++ {
		err = postToBackup(entries)
		if err == nil {
			// Success: clear dirty flags
			for _, e := range entries {
				lib.EntryMarkReplicated(db, e.EntryID)
			}
			return
		}

		// Failure: entries stay dirty and will be picked up on the next signal
		// Backoff: 1s, 5s, 25s, 125s, ...
		time.Sleep(time.Duration(math.Pow(5, float64(attempt))) * time.Second)
	}

	// Max retries exceeded: alert the operator
	edition.Current.AlertOperator(ctx, "replication_failed",
		"Backup unreachable after retries", map[string]any{
			"count":      len(entries),
			"last_error": err.Error(),
		})
}
```

No persistent queue needed: the dirty flags in SQLite are the queue.
## Code Changes Required

### 1. Signal Function (Commercial Only)

```go
// edition/replication.go
var replicationSignal chan struct{}

func SignalReplication() {
	if replicationSignal != nil {
		select {
		case replicationSignal <- struct{}{}:
		default:
		}
	}
}
```
### 2. Modified Handlers

All write handlers need:

```go
if edition.Current.Name() == "commercial" {
	lib.EntryMarkDirty(db, entryID)
	edition.SignalReplication()
}
```

### 3. Remove Polling

Delete the ticker from the replication worker. Replace it with `<-signal` only.
## Resource Usage

| Resource | Polling | Event-Driven |
|----------|---------|--------------|
| Goroutine | Always running | Running but blocked on channel (idle) |
| Memory | Minimal | Minimal (just channel + map) |
| CPU | 2,880 wakeups/day | #writes wakeups/day |
| Network | 2,880 requests/day | #writes requests/day |
| SQLite queries | 2,880/day | #writes/day |
## Design Notes

**No persistent queue needed.** The `replication_dirty` column IS the queue. Worker crash? On restart, `EntryListDirty()` finds all pending work.

**No timer needed.** A Go channel with `select` is the most efficient wait mechanism.

**Batching is automatic.** Multiple signals while the worker is busy? A channel of size 1 means the worker picks up ALL dirty entries on its next iteration, not one by one.

---

**This is the right design for low-resource, low-change vaults.**