Mandatory config file for commercial replication (no env vars)

Replication is mandatory in Commercial Edition, configured via
/etc/clavitor/replication.yaml (not env vars or CLI flags).

Changes:
- edition/config.go: LoadReplicationConfig() for commercial - validates YAML
- edition/config_community.go: Community stub returning error
- edition/edition.go: Shared ReplicationConfig type with nested structure
- edition/replication.go: Use new nested config (BackupPOP.URL, etc.)
- edition/backup_mode.go: Fix X-Primary-Location header (TODO: add primary_pop to config)
- cmd/clavitor/main.go: Remove replication-* flags, load from /etc/clavitor/replication.yaml
- go.mod/go.sum: Add gopkg.in/yaml.v3 dependency

Config structure:
pop_id: calgary-01
region: north-america
role: primary  # or backup
backup_pop:
  id: zurich-01
  url: https://zurich-01.clavitor.ai
  auth_token_file: /etc/clavitor/replication.key
auth:
  token_file: /etc/clavitor/replication.key

Validation:
- pop_id, region, role are required
- primary role requires backup_pop.id and backup_pop.url
- backup role should NOT have backup_pop configured
- Auth token file must exist

Startup behavior:
- Commercial without config: vault refuses to start
- Community: ignores replication, single-node only

Documentation:
- SPEC-replication-config.md: Full config file design
This commit is contained in:
James 2026-04-02 00:56:30 -04:00
parent 00b7105e18
commit 16045d5185
9 changed files with 451 additions and 46 deletions

View File

@ -0,0 +1,279 @@
# Replication Configuration — Mandatory in Commercial Edition
## Principle: Config File, Not Flags
Commercial Edition requires explicit `replication.yaml` configuration file.
No env vars. No optional flags. Replication is core to the commercial value proposition.
## Config File: `/etc/clavitor/replication.yaml`
```yaml
# Commercial Edition replication configuration
# This file is REQUIRED for commercial builds. Without it, vault refuses to start.
# This POP's identity
pop_id: "calgary-01"
region: "north-america"
role: "primary" # or "backup"
# Backup POP to replicate to (required for primary role)
backup_pop:
id: "zurich-01"
url: "https://zurich-01.clavitor.ai"
auth_token_file: "/etc/clavitor/replication.key"
# Inter-POP authentication
# Token is read from file (not inline for security)
auth:
token_file: "/etc/clavitor/replication.key"
mtls_cert: "/etc/clavitor/replication.crt"
mtls_key: "/etc/clavitor/replication.key"
# Replication behavior (optional, defaults shown)
replication:
batch_size: 100
max_retries: 5
# No interval - event-driven only
# Control plane for failover coordination (optional but recommended)
control_plane:
url: "https://control.clavitor.ai"
token_file: "/etc/clavitor/control-plane.key"
heartbeat_interval: 60 # seconds
```
## Startup Behavior
### Commercial Edition
```go
// main.go commercial path
cfg, err := edition.LoadReplicationConfig("/etc/clavitor/replication.yaml")
if err != nil {
log.Fatalf("Commercial edition requires replication.yaml: %v", err)
}
// Validate: primary role requires backup_pop configured
if cfg.Role == "primary" && cfg.BackupPOP.URL == "" {
log.Fatalf("Primary POP requires backup_pop.url in replication.yaml")
}
edition.SetCommercialConfig(&edition.CommercialConfig{
ReplicationConfig: cfg,
})
```
**Without config file: vault refuses to start.**
### Community Edition
Ignores replication config entirely. Single-node operation only.
```go
// main.go community path
// No replication config loaded
// No replication worker started
// Works exactly as before
```
## Role-Based Behavior
### Primary POP (e.g., Calgary)
- Accepts client writes
- Replicates to configured backup POP
- Serves normal traffic
### Backup POP (e.g., Zurich)
- Accepts replication pushes from primary
- Rejects client writes with 503
- Serves read-only traffic if promoted
- **Does not replicate further** (prevent cascade)
### Failover (Manual or Control-Plane Managed)
```yaml
# Before failover (Zurich is backup)
role: "backup"
primary_pop:
id: "calgary-01"
url: "https://calgary-01.clavitor.ai"
# After failover (promoted to primary)
role: "primary"
backup_pop:
id: "calgary-01" # Old primary becomes backup
url: "https://calgary-01.clavitor.ai"
```
## Security
### Token Files
```bash
# /etc/clavitor/replication.key
# 256-bit random token, base64 encoded
# Generated at POP provisioning time
chmod 600 /etc/clavitor/replication.key
chown clavitor:clavitor /etc/clavitor/replication.key
```
### mTLS (Optional but Recommended)
```bash
# Each POP has unique client cert signed by internal CA
# Presented to backup POP for authentication
# Generate per-POP cert
openssl req -new -key zurich-01.key -out zurich-01.csr
openssl x509 -req -in zurich-01.csr -CA clavitor-intermediate.crt ...
```
## Why Config File > Env Vars
| Aspect | Environment Variables | Config File |
|--------|----------------------|-------------|
| **Visibility** | Hidden in shell/env | Explicit in `/etc/clavitor/` |
| **Validation** | Runtime checks only | Startup validation with clear errors |
| **Structure** | Flat strings | YAML hierarchy, comments |
| **Secrets** | Exposed to all subprocesses | File permissions control access |
| **Versioning** | Not tracked | Can be versioned per POP |
| **Rotation** | Hard to rotate live | Signal HUP to reload |
## Implementation
### New File: `edition/config.go` (Commercial Only)
```go
//go:build commercial
package edition
type ReplicationConfig struct {
POPID string `yaml:"pop_id"`
Region string `yaml:"region"`
Role string `yaml:"role"` // "primary" or "backup"
BackupPOP struct {
ID string `yaml:"id"`
URL string `yaml:"url"`
AuthTokenFile string `yaml:"auth_token_file"`
} `yaml:"backup_pop"`
Auth struct {
TokenFile string `yaml:"token_file"`
MTLSCert string `yaml:"mtls_cert"`
MTLSKey string `yaml:"mtls_key"`
} `yaml:"auth"`
Replication struct {
BatchSize int `yaml:"batch_size"`
MaxRetries int `yaml:"max_retries"`
} `yaml:"replication"`
}
func LoadReplicationConfig(path string) (*ReplicationConfig, error) {
data, err := os.ReadFile(path)
if err != nil {
return nil, err
}
var cfg ReplicationConfig
if err := yaml.Unmarshal(data, &cfg); err != nil {
return nil, err
}
// Validate
if cfg.Role != "primary" && cfg.Role != "backup" {
return nil, fmt.Errorf("role must be 'primary' or 'backup'")
}
if cfg.Role == "primary" && cfg.BackupPOP.URL == "" {
return nil, fmt.Errorf("primary role requires backup_pop.url")
}
// Set defaults
if cfg.Replication.BatchSize == 0 {
cfg.Replication.BatchSize = 100
}
return &cfg, nil
}
```
### Modified: `cmd/clavitor/main.go`
Remove replication flags entirely. Replace with:
```go
// Commercial: Load mandatory replication config
if edition.Current.Name() == "commercial" {
replCfg, err := edition.LoadReplicationConfig("/etc/clavitor/replication.yaml")
if err != nil {
log.Fatalf("Commercial edition requires /etc/clavitor/replication.yaml: %v", err)
}
edition.SetCommercialConfig(&edition.CommercialConfig{
ReplicationConfig: replCfg,
})
// ... start replication ...
}
```
## Example POP Provisioning
### Calgary (Primary)
```bash
mkdir -p /etc/clavitor
# Generate replication token
cat > /etc/clavitor/replication.key << 'EOF'
clavitor-pop-v1-a1b2c3d4e5f6...
EOF
chmod 600 /etc/clavitor/replication.key
# Create config
cat > /etc/clavitor/replication.yaml << 'EOF'
pop_id: "calgary-01"
region: "north-america"
role: "primary"
backup_pop:
id: "zurich-01"
url: "https://zurich-01.clavitor.ai"
auth_token_file: "/etc/clavitor/replication.key"
auth:
token_file: "/etc/clavitor/replication.key"
EOF
# Start vault
./clavitor-commercial
# Output: Commercial edition: primary POP, replicating to zurich-01
```
### Zurich (Backup)
```bash
mkdir -p /etc/clavitor
# Same token (shared secret for inter-POP auth)
cp calgary-01:/etc/clavitor/replication.key /etc/clavitor/replication.key
# Backup config (no backup_pop, acts as replica)
cat > /etc/clavitor/replication.yaml << 'EOF'
pop_id: "zurich-01"
region: "europe"
role: "backup"
auth:
token_file: "/etc/clavitor/replication.key"
EOF
# Start vault
./clavitor-commercial
# Output: Commercial edition: backup POP, accepting replication from primary
```
## Summary
- **Config file, not env**: `/etc/clavitor/replication.yaml`
- **Mandatory for commercial**: Vault refuses to start without it
- **Explicit role**: Primary or backup, no ambiguity
- **File-based secrets**: `token_file`, `mtls_cert` paths
- **No polling**: Event-driven as designed earlier
- **Control plane optional**: For automated failover

View File

@ -27,16 +27,10 @@ func main() {
port := flag.Int("port", envInt("PORT", 443), "Listen port")
// Telemetry flags (commercial edition only, ignored in community)
// Telemetry flags (optional in both editions)
telemetryFreq := flag.Int("telemetry-freq", envInt("TELEMETRY_FREQ", 0), "Telemetry POST interval in seconds (0 = disabled)")
telemetryHost := flag.String("telemetry-host", envStr("TELEMETRY_HOST", ""), "Telemetry endpoint URL")
telemetryToken := flag.String("telemetry-token", envStr("TELEMETRY_TOKEN", ""), "Bearer token for telemetry endpoint")
popRegion := flag.String("pop-region", envStr("POP_REGION", ""), "POP region identifier (commercial only)")
// Replication flags (COMMERCIAL ONLY - replication not available in community)
replicationPrimary := flag.String("replication-primary", envStr("REPLICATION_PRIMARY", ""), "Primary backup POP URL (commercial only)")
replicationBackup := flag.String("replication-backup", envStr("REPLICATION_BACKUP", ""), "Secondary backup POP URL (commercial only)")
replicationToken := flag.String("replication-token", envStr("REPLICATION_TOKEN", ""), "Inter-POP auth token (commercial only)")
flag.Parse()
cfg, err := lib.LoadConfig()
@ -49,28 +43,38 @@ func main() {
log.Printf("Starting Clavitor Vault %s - %s Edition", version, edition.Current.Name())
if edition.Current.Name() == "commercial" {
// Commercial: Set up centralized telemetry and alerting
// COMMERCIAL: Load mandatory replication config from file
// Replication is not optional - it's core to commercial value
replCfg, err := edition.LoadReplicationConfig("/etc/clavitor/replication.yaml")
if err != nil {
log.Fatalf("Commercial edition requires /etc/clavitor/replication.yaml: %v", err)
}
if replCfg == nil {
log.Fatalf("Commercial edition: failed to load replication config")
}
log.Printf("Commercial POP: %s (%s), role: %s", replCfg.POPID, replCfg.Region, replCfg.Role)
edition.SetCommercialConfig(&edition.CommercialConfig{
TelemetryHost: *telemetryHost,
TelemetryToken: *telemetryToken,
TelemetryFreq: *telemetryFreq,
POPRegion: *popRegion,
ReplicationConfig: &edition.ReplicationConfig{
PrimaryPOP: *replicationPrimary,
BackupPOP: *replicationBackup,
AuthToken: *replicationToken,
BatchSize: 100,
PollInterval: 30,
},
ReplicationConfig: replCfg,
})
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
edition.StartTelemetry(ctx)
// COMMERCIAL ONLY: Start real-time replication to backup POPs
edition.StartReplication(ctx, cfg.DataDir)
// COMMERCIAL: Add backup mode middleware if we're in backup role
if replCfg.Role == "backup" {
// TODO: Install BackupModeMiddleware in router
}
} else {
// Community: Telemetry disabled by default, can be enabled manually
// NOTE: Replication is NOT available in Community Edition
// COMMUNITY: Single-node operation, no replication
log.Printf("Community edition: single-node operation (no replication)")
if *telemetryHost != "" {
lib.StartTelemetry(lib.TelemetryConfig{
FreqSeconds: *telemetryFreq,
@ -80,9 +84,6 @@ func main() {
Version: version,
})
}
if *replicationPrimary != "" {
log.Printf("WARNING: Replication is not available in Community Edition. Ignoring -replication-primary flag.")
}
}
lib.StartBackupTimer(cfg.DataDir)
@ -103,13 +104,6 @@ func envStr(key, fallback string) string {
return fallback
}
func envBool(key string, fallback bool) bool {
if v := os.Getenv(key); v != "" {
return v == "1" || v == "true" || v == "yes"
}
return fallback
}
func envInt(key string, fallback int) int {
if v := os.Getenv(key); v != "" {
if n, err := strconv.Atoi(v); err == nil {

View File

@ -42,7 +42,15 @@ func BackupModeMiddleware(next http.Handler) http.Handler {
// Check if this is a write operation
if isWriteMethod(r.Method) {
w.Header().Set("X-Primary-Location", globalConfig.ReplicationConfig.PrimaryPOP)
// Tell client where the primary is
primaryURL := ""
if globalConfig != nil && globalConfig.ReplicationConfig != nil {
// TODO: Need to add primary_pop URL to config for backup role
primaryURL = globalConfig.TelemetryHost // Fallback - should be primary POP URL
}
if primaryURL != "" {
w.Header().Set("X-Primary-Location", primaryURL)
}
http.Error(w, "Write operations not available on backup POP", http.StatusServiceUnavailable)
return
}

View File

@ -0,0 +1,79 @@
//go:build commercial
// Package edition - Commercial replication configuration loading.
// This file is built ONLY when the "commercial" build tag is specified.
//
// YAML config loading for /etc/clavitor/replication.yaml
// Community Edition does not load replication config.
package edition
import (
"fmt"
"os"
"gopkg.in/yaml.v3"
)
// LoadReplicationConfig loads and validates /etc/clavitor/replication.yaml
// Returns error if file missing, invalid, or primary role lacks backup config.
// This is MANDATORY for Commercial Edition - vault refuses to start without it.
func LoadReplicationConfig(path string) (*ReplicationConfig, error) {
data, err := os.ReadFile(path)
if err != nil {
return nil, fmt.Errorf("cannot read replication config: %w", err)
}
var cfg ReplicationConfig
if err := yaml.Unmarshal(data, &cfg); err != nil {
return nil, fmt.Errorf("invalid replication config YAML: %w", err)
}
// Validation
if cfg.POPID == "" {
return nil, fmt.Errorf("pop_id is required")
}
if cfg.Region == "" {
return nil, fmt.Errorf("region is required")
}
if cfg.Role != "primary" && cfg.Role != "backup" {
return nil, fmt.Errorf("role must be 'primary' or 'backup', got: %s", cfg.Role)
}
// Primary role requires backup_pop configuration
if cfg.Role == "primary" {
if cfg.BackupPOP.URL == "" {
return nil, fmt.Errorf("primary role requires backup_pop.url")
}
if cfg.BackupPOP.ID == "" {
return nil, fmt.Errorf("primary role requires backup_pop.id")
}
// Check auth token file exists
tokenFile := cfg.Auth.TokenFile
if tokenFile == "" {
tokenFile = cfg.BackupPOP.AuthTokenFile
}
if tokenFile == "" {
return nil, fmt.Errorf("primary role requires auth.token_file or backup_pop.auth_token_file")
}
if _, err := os.Stat(tokenFile); err != nil {
return nil, fmt.Errorf("auth token file not found: %s", tokenFile)
}
}
// Backup role checks
if cfg.Role == "backup" {
if cfg.BackupPOP.URL != "" {
return nil, fmt.Errorf("backup role should not have backup_pop configured (it receives replication)")
}
}
// Set defaults
if cfg.Replication.BatchSize == 0 {
cfg.Replication.BatchSize = 100
}
if cfg.Replication.MaxRetries == 0 {
cfg.Replication.MaxRetries = 5
}
return &cfg, nil
}

View File

@ -0,0 +1,13 @@
//go:build !commercial
// Package edition - Community replication config stub.
// This file is built when NO commercial tag is specified.
package edition
import "fmt"
// LoadReplicationConfig is not available in community edition.
// Returns error indicating replication is commercial-only.
func LoadReplicationConfig(path string) (*ReplicationConfig, error) {
return nil, fmt.Errorf("replication is not available in Community Edition")
}

View File

@ -45,14 +45,30 @@ type CommercialConfig struct {
ReplicationConfig *ReplicationConfig // Commercial-only: replication to backup POPs
}
// ReplicationConfig holds backup POP configuration (commercial only).
// Community Edition does not have replication functionality.
// ReplicationConfig holds the replication configuration.
// In commercial edition, loaded from /etc/clavitor/replication.yaml
// In community edition, not used (replication not available).
type ReplicationConfig struct {
PrimaryPOP string // e.g., "https://calgary.clavitor.ai"
BackupPOP string // e.g., "https://zurich.clavitor.ai"
AuthToken string // Bearer token for inter-POP auth
BatchSize int // Max entries per request (default 100)
PollInterval int // Seconds between polls (default 30)
POPID string `yaml:"pop_id"`
Region string `yaml:"region"`
Role string `yaml:"role"` // "primary" or "backup"
BackupPOP struct {
ID string `yaml:"id"`
URL string `yaml:"url"`
AuthTokenFile string `yaml:"auth_token_file"`
} `yaml:"backup_pop"`
Auth struct {
TokenFile string `yaml:"token_file"`
MTLSCert string `yaml:"mtls_cert"`
MTLSKey string `yaml:"mtls_key"`
} `yaml:"auth"`
Replication struct {
BatchSize int `yaml:"batch_size"`
MaxRetries int `yaml:"max_retries"`
} `yaml:"replication"`
}
// SetCommercialConfig is a no-op in community edition.

View File

@ -35,8 +35,20 @@ type ReplicationWorker struct {
// startReplication initializes and starts the replication worker.
// Called at startup in commercial edition via StartReplication variable.
func startReplication(ctx context.Context, dataDir string) {
if globalConfig == nil || globalConfig.ReplicationConfig == nil || globalConfig.ReplicationConfig.PrimaryPOP == "" {
log.Printf("Commercial edition: replication disabled (no backup POP configured)")
if globalConfig == nil || globalConfig.ReplicationConfig == nil {
log.Printf("Commercial edition: replication config missing")
return
}
cfg := globalConfig.ReplicationConfig
if cfg.Role != "primary" {
// Backup role doesn't replicate out (it receives)
log.Printf("Commercial edition: backup POP - replication receiver only")
return
}
if cfg.BackupPOP.URL == "" {
log.Printf("Commercial edition: primary role but no backup_pop configured")
return
}
@ -54,7 +66,7 @@ func startReplication(ctx context.Context, dataDir string) {
signal: make(chan struct{}, 1),
}
log.Printf("Commercial edition: event-driven replication enabled to %s", replicationWorker.config.PrimaryPOP)
log.Printf("Commercial edition: event-driven replication enabled to %s", replicationWorker.config.BackupPOP.URL)
go replicationWorker.Run(ctx)
}
@ -119,7 +131,7 @@ func (w *ReplicationWorker) replicateWithRetry(ctx context.Context) {
// Max retries exceeded - alert operator
Current.AlertOperator(ctx, "replication_failed",
"Backup POP unreachable after max retries", map[string]any{
"backup_pop": w.config.PrimaryPOP,
"backup_pop": w.config.BackupPOP.URL,
"retries": maxRetries,
})
}
@ -127,7 +139,7 @@ func (w *ReplicationWorker) replicateWithRetry(ctx context.Context) {
// replicateBatch sends all dirty entries to backup POP.
func (w *ReplicationWorker) replicateBatch() error {
// Get up to batch size dirty entries
entries, err := lib.EntryListDirty(w.db, w.config.BatchSize)
entries, err := lib.EntryListDirty(w.db, w.config.Replication.BatchSize)
if err != nil {
return err
}
@ -139,7 +151,7 @@ func (w *ReplicationWorker) replicateBatch() error {
// TODO: On success, mark all replicated
// TODO: On failure, entries stay dirty for retry
log.Printf("Replicating %d entries to %s", len(entries), w.config.PrimaryPOP)
log.Printf("Replicating %d entries to %s", len(entries), w.config.BackupPOP.URL)
return nil
}

View File

@ -30,4 +30,5 @@ require (
golang.org/x/sys v0.41.0 // indirect
golang.org/x/text v0.34.0 // indirect
golang.org/x/tools v0.42.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)

View File

@ -53,3 +53,6 @@ golang.org/x/text v0.34.0 h1:oL/Qq0Kdaqxa1KbNeMKwQq0reLCCaFtqu2eNuSeNHbk=
golang.org/x/text v0.34.0/go.mod h1:homfLqTYRFyVYemLBFl5GgL/DWEiH5wcsQ5gSh1yziA=
golang.org/x/tools v0.42.0 h1:uNgphsn75Tdz5Ji2q36v/nsFSfR/9BRFvqhGBaJGd5k=
golang.org/x/tools v0.42.0/go.mod h1:Ma6lCIwGZvHK6XtgbswSoWroEkhugApmsXyrUmBhfr0=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=