# Clavitor — Global Infrastructure Plan
*Created 2026-03-01 · Updated 2026-03-02*
---
## Strategy: Ontmoedigende Voorsprong (a Discouraging Head Start)
The AI agent market is exploding. OpenClaw is commonplace; Claude saw 500K App Store downloads in a single day. Every developer running agents has a credential problem, and nobody is solving it with field-level AI visibility and two-tier encryption.
The goal is to present Clavitor as the global standard from day one. When a competitor evaluates the space, they should see infrastructure everywhere and think "we can't improve on that anymore." Like Google in search. The map matters more than the capacity — 16 nodes across 6 continents signals infrastructure, not a side project.
Each node runs a single Go binary + SQLite, so resource requirements are minimal. Nodes are independent: no replication between regions, and a user's vault lives on exactly one instance.
Budget: $100/month. Deploy mini-nodes everywhere first; upgrade an individual node only when its demand justifies it.
---
## Deployment Map
### Existing Infrastructure (Hostkey)
| City | Provider | Role | Cost |
|------|----------|------|------|
| Amsterdam | Hostkey | EU West, Benelux, Nordics | existing |
| Zurich | Hostkey | SOC hub, Switzerland, DACH backup | existing |
| Dubai | Hostkey | Gulf states, Middle East | ~$5-8/mo |
### New Infrastructure (Vultr)
All Vultr nodes: VX1 tier — 1 vCPU, 512 MB RAM, 10 GB SSD, 0.5 TB bandwidth @ $2.50/mo.
| # | City | Region | Covers |
|---|------|--------|--------|
| 1 | New Jersey | US East | East coast, finance, enterprise |
| 2 | Silicon Valley | US West | Startups, AI companies |
| 3 | Dallas | US Central | Middle US, gaming corridor |
| 4 | London | UK | UK dev market |
| 5 | Frankfurt | EU Central | DACH, central Europe |
| 6 | Warsaw | EU East | Eastern Europe, Balkans, Turkey corridor |
| 7 | Tokyo | Asia East | Japan, China-facing (southern) |
| 8 | Seoul | Asia East | Korea, China-facing (northern) |
| 9 | Mumbai | South Asia | India (1.4B people) |
| 10 | São Paulo | LATAM | South America |
| 11 | Sydney | Oceania | Australia, New Zealand |
| 12 | Johannesburg | Africa | Africa (nobody else is there) |
| 13 | Tel Aviv | Middle East | Eastern Mediterranean, Israel |
**Note:** Hostkey covers Netherlands, Germany, Finland, Iceland, UK, Turkey, USA, Dubai, Israel — significant overlap with Vultr. Consider consolidating to Hostkey where they offer competitive mini-VPS pricing (existing relationship, single invoice). Vultr only for gaps: Tokyo, Seoul, Mumbai, São Paulo, Sydney, Johannesburg, Warsaw, Dallas, Silicon Valley.
---
## Coverage Summary
- **16 nodes** across **6 continents**
- Every major economic zone covered
- No major population center more than ~100 ms from a Clavitor node
### Cost
| Provider | Nodes | Monthly |
|----------|-------|---------|
| Hostkey (existing) | 2 | $0 (already paid) |
| Hostkey (Dubai) | 1 | ~$5-8 |
| Vultr | 13 | $32.50 |
| **Total** | **16** | **~$40/mo** |
Remaining ~$60/mo reserved for upgrading nodes that see traction.
### Upgrade Path
When a node outgrows $2.50 tier:
1. **$6/mo** — 1 vCPU, 1 GB RAM, 25 GB SSD, 1 TB bandwidth
2. **$12/mo** — 2 vCPU, 2 GB RAM, 50 GB SSD, 2 TB bandwidth
3. Beyond: evaluate dedicated or move to Hostkey dedicated in that region
---
## Node Stack
### OS: NixOS
No Ubuntu. No Alpine. NixOS makes every node a deterministic clone of a single config file in the repo.
- **Declarative**: One `configuration.nix` defines the entire node — OS, packages, services, firewall, users, TLS. Checked into the clavitor repo. Every node identical by definition.
- **Reproducible**: No drift. The system IS the config.
- **Rollback**: Atomic upgrades. `nixos-rebuild switch --rollback` instantly restores previous state.
- **Agent-friendly**: OC pushes config, runs one command. Node converges or doesn't. No imperative state tracking.
- **Hostile to attackers**: Read-only filesystem, no stray tooling, no package manager that behaves the way an attacker expects. Break in and find a single Go binary, an encrypted SQLite file, and nothing else. L2 fields cannot be decrypted — the key doesn't exist on the server.
Footprint: ~500 MB disk, ~60 MB RAM idle. On 10 GB / 512 MB box, plenty of room.
Deploy via `nixos-infect` (converts fresh Debian on Vultr to NixOS in-place).
Nix store maintenance: keep 2 generations max, periodic `nix-collect-garbage`. Each rebuild barely adds to the store — it's one Go binary + minimal system packages. Non-issue on these nodes.
### No Caddy. No Cloudflare Proxy.
Clavitor is a password vault. Routing all traffic through a third-party proxy (Cloudflare) defeats the trust model. Cloudflare DNS only, no proxying.
Caddy was considered for TLS termination, but Go's built-in `autocert` (`golang.org/x/crypto/acme/autocert`) handles Let's Encrypt natively — about 10 lines of code. This eliminates Caddy entirely (~40 MB binary, ~30-40 MB RAM). The Go binary terminates TLS itself.
Why this works (and won't fail like at Kaseya): 16 domains all under `*.clavitor.com` — we control DNS. No customer domains, no proxy chains, no Windows cert stores. Let's Encrypt rate limit is 50 certs/week — we need 16. Renewal is automatic, in-process, at 30 days before expiry.
### Stack Per Node
```
[Vultr/Hostkey VPS — NixOS]
        |
  sshd (WireGuard only — no public port 22)
        |
  clavitor binary (Go, ~15 MB, :80 + :443)
        |
  clavitor.db (SQLite + WAL)
```
Two processes. Nothing else installed, nothing else running.
---
## Network & Access
### WireGuard Hub-and-Spoke
All management access via WireGuard mesh. No public SSH port on any node.
- **Hub**: Zurich (SOC) — `10.84.0.1/24`, listens on UDP 51820
- **Spokes**: All 15 other nodes — unique `10.84.0.x` address, initiate connection to hub
- SSH binds to WireGuard interface only (`10.84.0.x:22`)
- Public internet sees only ports 80 and 443
Spoke NixOS config (example for Tokyo):
```nix
networking.wireguard.interfaces.wg0 = {
  ips = [ "10.84.0.2/24" ];
  privateKeyFile = "/etc/wireguard/private.key";
  peers = [{
    publicKey = "zurich-pub-key...";
    allowedIPs = [ "10.84.0.0/24" ];
    endpoint = "zurich.clavitor.com:51820";
    persistentKeepalive = 25;
  }];
};

services.openssh = {
  enable = true;
  listenAddresses = [{ addr = "10.84.0.2"; port = 22; }];
};
```
Overhead: WireGuard runs as a kernel module — no userspace processes, negligible RAM.
Key management: 16 key pairs (one per node), generated by `wg genkey`. Add/remove node = update Zurich peer list + rebuild. Five minutes.
### Break-Glass SSH
If Zurich is completely down, SOC loses WireGuard access to all nodes. Nodes keep serving customers (public 443 still works), but no management access.
Break-glass: Emergency SSH key with access restricted to `jongsma.me` IP. Disabled in normal operation — enable via Vultr/Hostkey console if Zurich is unrecoverable.
---
## Telemetry & Monitoring
### Push-to-Kuma Model
Nodes push status to Kuma (running in Zurich/SOC). No inbound metrics ports. No scraping. No Prometheus. No node_exporter.
```
clavitor (every 30s) ──POST──> https://kuma.zurich/api/push/xxxxx
                                      |
                        missing 2 posts → SEV2
                        missing ~5 min  → SEV1
```
The Clavitor binary reads its own `/proc/meminfo`, `/proc/loadavg`, and disk stats — trivial in Go (`runtime.MemStats` plus a few file reads) — and pushes JSON to Kuma. No extra software on the node.
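A minimal sketch of that push loop. The `load1` and `pushHeartbeat` helpers are illustrative, and the payload here is a subset of the real one (field names assumed); the push URL placeholder is taken from the diagram above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strconv"
	"strings"
	"time"
)

// load1 extracts the 1-minute load average from /proc/loadavg content.
func load1(s string) float64 {
	f, _ := strconv.ParseFloat(strings.Fields(s)[0], 64)
	return f
}

// pushHeartbeat reads host stats and POSTs a JSON heartbeat to Kuma.
func pushHeartbeat(client *http.Client, url, node string) error {
	raw, err := os.ReadFile("/proc/loadavg")
	if err != nil {
		return err
	}
	payload, _ := json.Marshal(map[string]any{
		"node":     node,
		"ts":       time.Now().Unix(),
		"cpu_load": load1(string(raw)), // field name illustrative
	})
	resp, err := client.Post(url, "application/json", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	resp.Body.Close()
	return nil
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	// In the real binary this runs on a 30-second ticker; one-shot here.
	if err := pushHeartbeat(client, "https://kuma.zurich/api/push/xxxxx", "tokyo"); err != nil {
		fmt.Println("push failed:", err)
	}
}
```

Because the node only ever POSTs outbound, no inbound metrics port needs to exist in the firewall.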
### Metrics Payload
```json
{
  "node": "tokyo",
  "ts": 1709312400,
  "ram_mb": 142,
  "ram_pct": 27.7,
  "disk_mb": 3200,
  "disk_pct": 31.2,
  "cpu_pct": 2.1,
  "db_size_mb": 12,
  "db_integrity": "ok",
  "active_sessions": 3,
  "req_1h": 847,
  "err_1h": 2,
  "cert_days_remaining": 62,
  "nix_gen": 2,
  "uptime_s": 864000
}
```
Key metric: `cert_days_remaining`. If autocert silently fails renewal, this trends toward zero — visible before expiry.
---
## SOC Operations (Zurich)
Zurich is the SOC hub. Kuma runs here. WireGuard hub here. All management flows through Zurich.
### Routine (automated/scheduled)
- **Kuma monitoring**: Push monitors for all 16 nodes, SEV2/SEV1 escalation on missed heartbeats
- **NixOS updates**: Weekly `nixos-rebuild switch` across all nodes via WireGuard SSH
- **Nix garbage collection**: Weekly, keep 2 generations max
- **SQLite integrity**: Periodic `PRAGMA integrity_check` on vault DBs
- **Cert monitoring**: Watch `cert_days_remaining` in Kuma payload
### Reactive (on alert)
- **Node down**: Check Kuma, SSH via WireGuard, diagnose. If unrecoverable, reprovision from `configuration.nix`.
- **Disk pressure**: Nix garbage collection, check DB growth, upgrade tier if needed
- **Anomaly detection**: Unusual API patterns (credential stuffing, brute force) visible in `err_1h` metric
- **Binary deploy**: Push new Clavitor binary across nodes, rolling deploy, verify health after each
### Deployment
- Node config is single `configuration.nix` per node (templated), checked into clavitor repo
- Go binary cross-compiled with `CGO_ENABLED=0` (requires a pure-Go SQLite driver) or a static musl target, so it runs on NixOS without host library dependencies
- Deploy: SCP binary + push config via WireGuard SSH, `nixos-rebuild switch` — atomic, rollback on failure
- DNS-level failover: route away from unhealthy nodes
---
## Gaps and Future Considerations
- **Istanbul/Dubai**: Vultr has no Turkey or Gulf presence. Warsaw (~30ms to Istanbul) and Tel Aviv (~40ms) cover the gap. Hostkey Dubai covers the Gulf. Hostkey also has Istanbul directly.
- **China mainland**: Requires ICP license + Chinese entity. Tokyo and Seoul serve as Phase 1 proxies. Evaluate Alibaba Cloud for Phase 2 if Chinese demand materializes.
- **Canada**: Toronto available on Vultr if needed. Currently served by New Jersey + Silicon Valley.
- **Mexico/Central America**: Mexico City available on Vultr. Currently served by Dallas + São Paulo.
- **Provider consolidation**: Hostkey covers NL, DE, FI, IS, UK, TR, US, UAE, IL — check mini-VPS pricing with account manager. Could reduce Vultr dependency to ~9 nodes (Asia, LATAM, Africa, Oceania, US interior).
---
## Node Configuration Template
Minimal `configuration.nix` for a Clavitor node:
```nix
{ config, pkgs, ... }:
{
  # WireGuard — management network
  networking.wireguard.interfaces.wg0 = {
    ips = [ "10.84.0.NODE_ID/24" ];
    privateKeyFile = "/etc/wireguard/private.key";
    peers = [{
      publicKey = "ZURICH_PUB_KEY";
      allowedIPs = [ "10.84.0.0/24" ];
      endpoint = "zurich.clavitor.com:51820";
      persistentKeepalive = 25;
    }];
  };

  # SSH — WireGuard only
  services.openssh = {
    enable = true;
    listenAddresses = [{ addr = "10.84.0.NODE_ID"; port = 22; }];
    settings.PasswordAuthentication = false;
  };

  # Clavitor
  systemd.services.clavitor = {
    description = "Clavitor";
    wants = [ "network-online.target" ]; # after= alone does not pull the target in
    after = [ "network-online.target" ];
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      ExecStart = "/opt/clavitor/clavitor";
      Restart = "always";
      RestartSec = 5;
      EnvironmentFile = "/etc/clavitor/env";
    };
  };

  # Firewall — public: 80+443 only. WireGuard: 51820 from Zurich only.
  networking.firewall = {
    enable = true;
    allowedTCPPorts = [ 80 443 ];
    # SSH not in allowedTCPPorts — only reachable via WireGuard
  };
}
```
---
*Clavitor — the vault that knows who it's talking to.*