# Replication Configuration — Mandatory in Commercial Edition
## Principle: Config File, Not Flags
The Commercial Edition requires an explicit `replication.yaml` configuration file.
No env vars. No optional flags. Replication is core to the commercial value proposition.
## Config File: `/etc/clavitor/replication.yaml`
```yaml
# Commercial Edition replication configuration
# This file is REQUIRED for commercial builds. Without it, the vault refuses to start.

# This POP's identity
pop_id: "calgary-01"
region: "north-america"
role: "primary"  # or "backup"

# Backup POP to replicate to (required for the primary role)
backup_pop:
  id: "zurich-01"
  url: "https://zurich-01.clavitor.ai"
  auth_token_file: "/etc/clavitor/replication.key"

# Inter-POP authentication
# The token is read from a file (never inline, for security)
auth:
  token_file: "/etc/clavitor/replication.key"
  mtls_cert: "/etc/clavitor/replication.crt"
  mtls_key: "/etc/clavitor/replication-mtls.key"  # TLS private key; must not be the token file

# Replication behavior (optional, defaults shown)
replication:
  batch_size: 100
  max_retries: 5
  # No interval: event-driven only

# Control plane for failover coordination (optional but recommended)
control_plane:
  url: "https://control.clavitor.ai"
  token_file: "/etc/clavitor/control-plane.key"
  heartbeat_interval: 60  # seconds
```
## Startup Behavior
### Commercial Edition
```go
// main.go commercial path
cfg, err := edition.LoadReplicationConfig("/etc/clavitor/replication.yaml")
if err != nil {
	log.Fatalf("Commercial edition requires replication.yaml: %v", err)
}
// Validate: primary role requires backup_pop configured
if cfg.Role == "primary" && cfg.BackupPOP.URL == "" {
	log.Fatalf("Primary POP requires backup_pop.url in replication.yaml")
}
edition.SetCommercialConfig(&edition.CommercialConfig{
	ReplicationConfig: cfg,
})
```
**Without config file: vault refuses to start.**
### Community Edition
Ignores replication config entirely. Single-node operation only.
```go
// main.go community path
// No replication config loaded
// No replication worker started
// Works exactly as before
```
## Role-Based Behavior
### Primary POP (e.g., Calgary)
- Accepts client writes
- Replicates to configured backup POP
- Serves normal traffic
### Backup POP (e.g., Zurich)
- Accepts replication pushes from primary
- Rejects client writes with 503
- Serves read-only traffic if promoted
- **Does not replicate further** (prevent cascade)
### Failover (Manual or Control-Plane Managed)
```yaml
# Before failover (Zurich is the backup)
role: "backup"
primary_pop:
  id: "calgary-01"
  url: "https://calgary-01.clavitor.ai"
---
# After failover (Zurich promoted to primary)
role: "primary"
backup_pop:
  id: "calgary-01"  # Old primary becomes the backup
  url: "https://calgary-01.clavitor.ai"
```
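The failover rewrite above is mechanical enough to express as a function. A minimal Go sketch, assuming a pared-down `RoleConfig` with hypothetical type and field names (the full struct appears under Implementation):

```go
package main

import "errors"

// POPRef is a hypothetical view of a backup_pop / primary_pop entry.
type POPRef struct {
	ID  string
	URL string
}

// RoleConfig holds only the fields that change during failover (hypothetical).
type RoleConfig struct {
	Role       string
	PrimaryPOP POPRef
	BackupPOP  POPRef
}

// promote rewrites a backup's config the way the failover example shows:
// role flips to "primary", and the old primary becomes backup_pop.
func promote(c RoleConfig) (RoleConfig, error) {
	if c.Role != "backup" {
		return c, errors.New("only a backup POP can be promoted")
	}
	if c.PrimaryPOP.URL == "" {
		return c, errors.New("backup config has no primary_pop to demote")
	}
	return RoleConfig{Role: "primary", BackupPOP: c.PrimaryPOP}, nil
}
```

The old primary presumably needs the mirror-image change (`role: "backup"`) before it rejoins; otherwise both POPs would accept client writes.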
## Security
### Token Files
```bash
# /etc/clavitor/replication.key
# 256-bit random token, base64 encoded
# Generated at POP provisioning time
chmod 600 /etc/clavitor/replication.key
chown clavitor:clavitor /etc/clavitor/replication.key
```
### mTLS (Optional but Recommended)
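```bash
# Each POP has a unique client cert signed by the internal CA,
# presented to the backup POP for authentication.
# Generate the per-POP private key (EC P-256 shown as one option), then the CSR
openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:P-256 -out zurich-01.key
openssl req -new -key zurich-01.key -out zurich-01.csr
openssl x509 -req -in zurich-01.csr -CA clavitor-intermediate.crt ...
```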
## Why Config File > Env Vars
| Aspect | Environment Variables | Config File |
|--------|----------------------|-------------|
| **Visibility** | Hidden in shell/env | Explicit in `/etc/clavitor/` |
| **Validation** | Runtime checks only | Startup validation with clear errors |
| **Structure** | Flat strings | YAML hierarchy, comments |
| **Secrets** | Exposed to all subprocesses | File permissions control access |
| **Versioning** | Not tracked | Can be versioned per POP |
| **Rotation** | Hard to rotate live | Signal HUP to reload |
## Implementation
### New File: `edition/config.go` (Commercial Only)
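```go
//go:build commercial

package edition

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// ReplicationConfig mirrors /etc/clavitor/replication.yaml.
type ReplicationConfig struct {
	POPID  string `yaml:"pop_id"`
	Region string `yaml:"region"`
	Role   string `yaml:"role"` // "primary" or "backup"

	// Replication target; required when Role == "primary".
	BackupPOP struct {
		ID            string `yaml:"id"`
		URL           string `yaml:"url"`
		AuthTokenFile string `yaml:"auth_token_file"`
	} `yaml:"backup_pop"`

	// Replication source, as written on a backup (see Failover).
	PrimaryPOP struct {
		ID  string `yaml:"id"`
		URL string `yaml:"url"`
	} `yaml:"primary_pop"`

	Auth struct {
		TokenFile string `yaml:"token_file"`
		MTLSCert  string `yaml:"mtls_cert"`
		MTLSKey   string `yaml:"mtls_key"`
	} `yaml:"auth"`

	Replication struct {
		BatchSize  int `yaml:"batch_size"`
		MaxRetries int `yaml:"max_retries"`
	} `yaml:"replication"`

	// Optional; used for automated failover coordination.
	ControlPlane struct {
		URL               string `yaml:"url"`
		TokenFile         string `yaml:"token_file"`
		HeartbeatInterval int    `yaml:"heartbeat_interval"` // seconds
	} `yaml:"control_plane"`
}

// LoadReplicationConfig reads and validates the file, then applies defaults.
func LoadReplicationConfig(path string) (*ReplicationConfig, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("read %s: %w", path, err)
	}
	var cfg ReplicationConfig
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("parse %s: %w", path, err)
	}

	// Validate
	if cfg.Role != "primary" && cfg.Role != "backup" {
		return nil, fmt.Errorf("role must be 'primary' or 'backup', got %q", cfg.Role)
	}
	if cfg.Role == "primary" && cfg.BackupPOP.URL == "" {
		return nil, fmt.Errorf("primary role requires backup_pop.url")
	}

	// Apply the defaults documented in replication.yaml
	if cfg.Replication.BatchSize == 0 {
		cfg.Replication.BatchSize = 100
	}
	if cfg.Replication.MaxRetries == 0 {
		cfg.Replication.MaxRetries = 5
	}
	return &cfg, nil
}
```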
### Modified: `cmd/clavitor/main.go`
Remove replication flags entirely. Replace with:
```go
// Commercial: Load mandatory replication config
if edition.Current.Name() == "commercial" {
	replCfg, err := edition.LoadReplicationConfig("/etc/clavitor/replication.yaml")
	if err != nil {
		log.Fatalf("Commercial edition requires /etc/clavitor/replication.yaml: %v", err)
	}
	edition.SetCommercialConfig(&edition.CommercialConfig{
		ReplicationConfig: replCfg,
	})
	// ... start replication ...
}
```
## Example POP Provisioning
### Calgary (Primary)
```bash
mkdir -p /etc/clavitor
# Write the replication token (placeholder shown; use a freshly generated 256-bit random value)
cat > /etc/clavitor/replication.key << 'EOF'
clavitor-pop-v1-a1b2c3d4e5f6...
EOF
chmod 600 /etc/clavitor/replication.key
# Create config
cat > /etc/clavitor/replication.yaml << 'EOF'
pop_id: "calgary-01"
region: "north-america"
role: "primary"
backup_pop:
id: "zurich-01"
url: "https://zurich-01.clavitor.ai"
auth_token_file: "/etc/clavitor/replication.key"
auth:
token_file: "/etc/clavitor/replication.key"
EOF
# Start vault
./clavitor-commercial
# Output: Commercial edition: primary POP, replicating to zurich-01
```
### Zurich (Backup)
```bash
mkdir -p /etc/clavitor
# Same token (shared secret for inter-POP auth)
scp calgary-01:/etc/clavitor/replication.key /etc/clavitor/replication.key
chmod 600 /etc/clavitor/replication.key
# Backup config (no backup_pop, acts as replica)
cat > /etc/clavitor/replication.yaml << 'EOF'
pop_id: "zurich-01"
region: "europe"
role: "backup"
auth:
token_file: "/etc/clavitor/replication.key"
EOF
# Start vault
./clavitor-commercial
# Output: Commercial edition: backup POP, accepting replication from primary
```
## Summary
- **Config file, not env**: `/etc/clavitor/replication.yaml`
- **Mandatory for commercial**: Vault refuses to start without it
- **Explicit role**: Primary or backup, no ambiguity
- **File-based secrets**: `token_file`, `mtls_cert` paths
- **No polling**: Event-driven as designed earlier
- **Control plane optional**: For automated failover