clavitor/clavis/clavis-vault/SPEC-replication-config.md

Replication Configuration — Mandatory in Commercial Edition

Principle: Config File, Not Flags

Commercial Edition requires an explicit replication.yaml configuration file. No env vars. No optional flags. Replication is core to the commercial value proposition.

Config File: /etc/clavitor/replication.yaml

# Commercial Edition replication configuration
# This file is REQUIRED for commercial builds. Without it, vault refuses to start.

# This POP's identity
pop_id: "calgary-01"
region: "north-america"
role: "primary"  # or "backup"

# Backup POP to replicate to (required for primary role)
backup_pop:
  id: "zurich-01"
  url: "https://zurich-01.clavitor.ai"
  auth_token_file: "/etc/clavitor/replication.key"
  
# Inter-POP authentication
# Token is read from file (not inline for security)
auth:
  token_file: "/etc/clavitor/replication.key"
  mtls_cert: "/etc/clavitor/replication.crt"
  mtls_key: "/etc/clavitor/replication-mtls.key"  # private key for mtls_cert; must be a different file from token_file

# Replication behavior (optional, defaults shown)
replication:
  batch_size: 100
  max_retries: 5
  # No interval - event-driven only
  
# Control plane for failover coordination (optional but recommended)
control_plane:
  url: "https://control.clavitor.ai"
  token_file: "/etc/clavitor/control-plane.key"
  heartbeat_interval: 60  # seconds

Startup Behavior

Commercial Edition

// main.go commercial path
cfg, err := edition.LoadReplicationConfig("/etc/clavitor/replication.yaml")
if err != nil {
    log.Fatalf("Commercial edition requires replication.yaml: %v", err)
}

// Validate: primary role requires backup_pop configured
if cfg.Role == "primary" && cfg.BackupPOP.URL == "" {
    log.Fatalf("Primary POP requires backup_pop.url in replication.yaml")
}

edition.SetCommercialConfig(&edition.CommercialConfig{
    ReplicationConfig: cfg,
})

Without config file: vault refuses to start.

Community Edition

Ignores replication config entirely. Single-node operation only.

// main.go community path
// No replication config loaded
// No replication worker started
// Works exactly as before

Role-Based Behavior

Primary POP (e.g., Calgary)

  • Accepts client writes
  • Replicates to configured backup POP
  • Serves normal traffic

Backup POP (e.g., Zurich)

  • Accepts replication pushes from primary
  • Rejects client writes with 503
  • Serves read-only traffic if promoted
  • Does not replicate further (prevent cascade)

Failover (Manual or Control-Plane Managed)

# Before failover (Zurich is backup)
role: "backup"
primary_pop:
  id: "calgary-01"
  url: "https://calgary-01.clavitor.ai"

# After failover (promoted to primary)
role: "primary"
backup_pop:
  id: "calgary-01"  # Old primary becomes backup
  url: "https://calgary-01.clavitor.ai"
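
The before/after YAML above amounts to a small transformation: the role flips to primary and the old primary_pop entry becomes the new backup_pop. A minimal sketch of that swap follows; popConfig, peerPOP, and promote are hypothetical names used only for illustration, and whether the control plane or an operator performs this step is left open by the spec.

```go
package main

import "errors"

// peerPOP mirrors the id/url pair used by primary_pop and backup_pop.
type peerPOP struct {
	ID  string
	URL string
}

// popConfig holds only the fields that change during failover.
type popConfig struct {
	Role       string
	PrimaryPOP peerPOP
	BackupPOP  peerPOP
}

// promote returns the post-failover config for a backup POP: it becomes
// primary and starts replicating to the POP that used to be its primary.
func promote(cfg popConfig) (popConfig, error) {
	if cfg.Role != "backup" {
		return cfg, errors.New("only a backup POP can be promoted")
	}
	if cfg.PrimaryPOP.URL == "" {
		return cfg, errors.New("primary_pop.url is required to derive the new backup_pop")
	}
	return popConfig{
		Role:      "primary",
		BackupPOP: cfg.PrimaryPOP, // old primary becomes backup
	}, nil
}
```

For the Zurich example, promoting a config with primary_pop calgary-01 yields role "primary" with backup_pop calgary-01, matching the "after failover" YAML.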

Security

Token Files

# /etc/clavitor/replication.key
# 256-bit random token, base64 encoded
# Generated at POP provisioning time

chmod 600 /etc/clavitor/replication.key
chown clavitor:clavitor /etc/clavitor/replication.key

mTLS Client Certificates

# Each POP has a unique client cert signed by the internal CA
# Presented to the backup POP for authentication

# Generate per-POP cert
openssl req -new -key zurich-01.key -out zurich-01.csr
openssl x509 -req -in zurich-01.csr -CA clavitor-intermediate.crt ...

Why Config File > Env Vars

| Aspect     | Environment Variables       | Config File                          |
|------------|-----------------------------|--------------------------------------|
| Visibility | Hidden in shell/env         | Explicit in /etc/clavitor/           |
| Validation | Runtime checks only         | Startup validation with clear errors |
| Structure  | Flat strings                | YAML hierarchy, comments             |
| Secrets    | Exposed to all subprocesses | File permissions control access      |
| Versioning | Not tracked                 | Can be versioned per POP             |
| Rotation   | Hard to rotate live         | Signal HUP to reload                 |

Implementation

New File: edition/config.go (Commercial Only)

//go:build commercial

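package edition

import (
    "fmt"
    "os"

    "gopkg.in/yaml.v3" // assumed YAML library; any package exposing yaml.Unmarshal works
)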

type ReplicationConfig struct {
    POPID     string `yaml:"pop_id"`
    Region    string `yaml:"region"`
    Role      string `yaml:"role"` // "primary" or "backup"
    
    BackupPOP struct {
        ID            string `yaml:"id"`
        URL           string `yaml:"url"`
        AuthTokenFile string `yaml:"auth_token_file"`
    } `yaml:"backup_pop"`
    
    Auth struct {
        TokenFile string `yaml:"token_file"`
        MTLSCert  string `yaml:"mtls_cert"`
        MTLSKey   string `yaml:"mtls_key"`
    } `yaml:"auth"`
    
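    Replication struct {
        BatchSize  int `yaml:"batch_size"`
        MaxRetries int `yaml:"max_retries"`
    } `yaml:"replication"`

    // PrimaryPOP is set on backup POPs (see the failover example).
    PrimaryPOP struct {
        ID  string `yaml:"id"`
        URL string `yaml:"url"`
    } `yaml:"primary_pop"`

    // ControlPlane is optional; leave empty to run without failover coordination.
    ControlPlane struct {
        URL               string `yaml:"url"`
        TokenFile         string `yaml:"token_file"`
        HeartbeatInterval int    `yaml:"heartbeat_interval"` // seconds
    } `yaml:"control_plane"`
}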

func LoadReplicationConfig(path string) (*ReplicationConfig, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return nil, err
    }
    var cfg ReplicationConfig
    if err := yaml.Unmarshal(data, &cfg); err != nil {
        return nil, err
    }
    // Validate
    if cfg.Role != "primary" && cfg.Role != "backup" {
        return nil, fmt.Errorf("role must be 'primary' or 'backup'")
    }
    if cfg.Role == "primary" && cfg.BackupPOP.URL == "" {
        return nil, fmt.Errorf("primary role requires backup_pop.url")
    }
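    // Set defaults (the "defaults shown" values in replication.yaml).
    // An explicit 0 is indistinguishable from unset and also gets the default.
    if cfg.Replication.BatchSize == 0 {
        cfg.Replication.BatchSize = 100
    }
    if cfg.Replication.MaxRetries == 0 {
        cfg.Replication.MaxRetries = 5
    }
    return &cfg, nil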
}

Modified: cmd/clavitor/main.go

Remove replication flags entirely. Replace with:

// Commercial: Load mandatory replication config
if edition.Current.Name() == "commercial" {
    replCfg, err := edition.LoadReplicationConfig("/etc/clavitor/replication.yaml")
    if err != nil {
        log.Fatalf("Commercial edition requires /etc/clavitor/replication.yaml: %v", err)
    }
    edition.SetCommercialConfig(&edition.CommercialConfig{
        ReplicationConfig: replCfg,
    })
    // ... start replication ...
}

Example POP Provisioning

Calgary (Primary)

mkdir -p /etc/clavitor

# Generate replication token
cat > /etc/clavitor/replication.key << 'EOF'
clavitor-pop-v1-a1b2c3d4e5f6...
EOF
chmod 600 /etc/clavitor/replication.key

# Create config
cat > /etc/clavitor/replication.yaml << 'EOF'
pop_id: "calgary-01"
region: "north-america"
role: "primary"
backup_pop:
  id: "zurich-01"
  url: "https://zurich-01.clavitor.ai"
  auth_token_file: "/etc/clavitor/replication.key"
auth:
  token_file: "/etc/clavitor/replication.key"
EOF

# Start vault
./clavitor-commercial
# Output: Commercial edition: primary POP, replicating to zurich-01

Zurich (Backup)

mkdir -p /etc/clavitor

# Same token (shared secret for inter-POP auth)
scp calgary-01:/etc/clavitor/replication.key /etc/clavitor/replication.key
chmod 600 /etc/clavitor/replication.key

# Backup config (no backup_pop, acts as replica)
cat > /etc/clavitor/replication.yaml << 'EOF'
pop_id: "zurich-01"
region: "europe"
role: "backup"
auth:
  token_file: "/etc/clavitor/replication.key"
EOF

# Start vault
./clavitor-commercial
# Output: Commercial edition: backup POP, accepting replication from primary

Summary

  • Config file, not env: /etc/clavitor/replication.yaml
  • Mandatory for commercial: Vault refuses to start without it
  • Explicit role: Primary or backup, no ambiguity
  • File-based secrets: token_file, mtls_cert paths
  • No polling: Event-driven as designed earlier
  • Control plane optional: For automated failover