
Vault1984 — Global NOC Deployment Plan

Owner: James
Target: Live Friday March 6, 2026
Budget: ~$64-67/mo (20 AWS regions + Hostkey HQ)
HQ: Hans NOC Node — Hostkey Zürich (185.218.204.47, noc.vault1984.com)


Overview

Deploy vault1984 across 20 AWS regions (t4g.nano / ARM Graviton, ~$3/mo each), managed by an OpenClaw NOC agent running on the Hostkey HQ node (Hans, 185.218.204.47). Each AWS node runs NixOS + the vault1984 Go binary. All management traffic flows over WireGuard. Monitoring via Uptime Kuma push heartbeats from each node.

Platform decision: AWS EC2 t4g.nano (ARM/Graviton2). One binary per region. No multi-tenant clustering — each node is fully independent.

Deployment method: TBD, likely Terraform or the manual AWS Console for the initial rollout. The tooling is built to accommodate either approach.


Region Selection (21 nodes total: 20 AWS + 1 Hostkey HQ)

| # | Name | City | AWS Region | Provider |
|---|---|---|---|---|
| HQ | zurich | Zürich, CH | | Hostkey (Hans, 185.218.204.47) |
| 1 | virginia | N. Virginia, US | us-east-1 | AWS t4g.nano |
| 2 | ncalifornia | N. California, US | us-west-1 | AWS t4g.nano |
| 3 | montreal | Montreal, CA | ca-central-1 | AWS t4g.nano |
| 4 | mexicocity | Mexico City, MX | mx-central-1 | AWS t4g.nano |
| 5 | saopaulo | São Paulo, BR | sa-east-1 | AWS t4g.nano |
| 6 | london | London, UK | eu-west-2 | AWS t4g.nano |
| 7 | paris | Paris, FR | eu-west-3 | AWS t4g.nano |
| 8 | frankfurt | Frankfurt, DE | eu-central-1 | AWS t4g.nano |
| 9 | spain | Spain, ES | eu-south-2 | AWS t4g.nano |
| 10 | stockholm | Stockholm, SE | eu-north-1 | AWS t4g.nano |
| 11 | uae | UAE | me-central-1 | AWS t4g.nano |
| 12 | telaviv | Tel Aviv, IL | il-central-1 | AWS t4g.nano |
| 13 | capetown | Cape Town, ZA | af-south-1 | AWS t4g.nano |
| 14 | mumbai | Mumbai, IN | ap-south-1 | AWS t4g.nano |
| 15 | singapore | Singapore, SG | ap-southeast-1 | AWS t4g.nano |
| 16 | jakarta | Jakarta, ID | ap-southeast-3 | AWS t4g.nano |
| 17 | malaysia | Kuala Lumpur, MY | ap-southeast-5 | AWS t4g.nano |
| 18 | sydney | Sydney, AU | ap-southeast-2 | AWS t4g.nano |
| 19 | seoul | Seoul, KR | ap-northeast-2 | AWS t4g.nano |
| 20 | hongkong | Hong Kong | ap-east-1 | AWS t4g.nano |

Johan-approved on 2026-03-02.


Milestones at a Glance

| # | Milestone | Owner | Deadline |
|---|---|---|---|
| M1 | Hans HQ ready (WireGuard hub + OC NOC + Kuma) | James | Mon Mar 2, EOD |
| M2 | NixOS config + deploy tooling in repo | James | Tue Mar 3, EOD |
| M3 | Pilot: 3 nodes live (Virginia + 2 others) | James | Wed Mar 4, noon |
| M4 | Go/No-Go review | Johan | Wed Mar 4, EOD |
| M5 | Full 20-region AWS fleet live | James | Thu Mar 5, EOD |
| M6 | DNS, TLS, health checks verified | James | Thu Mar 5, EOD |
| M7 | Go-live: vault1984.com routing to fleet | Johan + James | Fri Mar 6, noon |

Day-by-Day Plan


Sunday Mar 1 — Planning & Prerequisites

  • Read INFRASTRUCTURE.md, write this plan
  • Johan: Set up AWS account + credentials (IAM user or root — access keys needed)
  • Johan: Decide deployment method: Terraform vs manual AWS Console
  • Johan: Approve plan → James starts Monday

Monday Mar 2 — Hans HQ Setup (M1)

M1.1 — WireGuard Hub (on Hans, 185.218.204.47)

  • Generate Hans hub keypair
  • Configure wg0: 10.84.0.1/24, UDP 51820
  • UFW: allow 51820 inbound
  • Save hans.pub to repo
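
For reference, the hub keypair (and the per-node ones from keygen.sh) are ordinary X25519 keys in WireGuard's base64 format, normally produced with `wg genkey` and `wg pubkey`. A minimal Go sketch of the same operation, assuming Go 1.20+ for the standard library's `crypto/ecdh`; `genKeypair` and `pubFromPriv` are hypothetical helper names, not part of any existing tooling:

```go
package main

import (
	"crypto/ecdh"
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"log"
)

// genKeypair returns a WireGuard-style keypair: 32-byte X25519 keys,
// standard base64 encoded (the same format `wg genkey` / `wg pubkey` print).
func genKeypair() (priv, pub string, err error) {
	k, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return "", "", err
	}
	enc := base64.StdEncoding
	return enc.EncodeToString(k.Bytes()), enc.EncodeToString(k.PublicKey().Bytes()), nil
}

// pubFromPriv re-derives the public key from a base64 private key,
// mirroring what `wg pubkey` does.
func pubFromPriv(priv string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(priv)
	if err != nil {
		return "", err
	}
	k, err := ecdh.X25519().NewPrivateKey(raw)
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(k.PublicKey().Bytes()), nil
}

func main() {
	priv, pub, err := genKeypair()
	if err != nil {
		log.Fatal(err)
	}
	_ = priv // in real use: write to the host only (mode 0600), never to the repo
	fmt.Println("public key (hans.pub):", pub)
}
```

Only the public half goes into the repo as hans.pub; the private key stays on the host, in line with the "no private keys ever" rule for peers.conf.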

M1.2 — OpenClaw NOC Agent

  • OpenClaw v2026.3.1 installed on Hans
  • Model: Fireworks MiniMax M2.5 (no Anthropic tokens on Hans)
  • Telegram/Discord routing configured for deploy commands

M1.3 — Uptime Kuma fleet monitors

  • New ntfy topic: vault1984-alerts
  • 20 push monitors in Kuma, one per AWS region
  • SEV2: 2 missed pushes; SEV1: 5+ min down
  • All monitors pending (nodes not yet live)
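
The two thresholds are easiest to reason about against the 30s push interval planned in M2.4: two missed pushes means roughly 60-90s of silence, while SEV1 needs five minutes. A minimal sketch of that classification, assuming "missed" counts whole push intervals since the last heartbeat; in practice Kuma's own heartbeat and retry settings do this, so `classify` is an illustrative helper only:

```go
package main

import (
	"fmt"
	"time"
)

// Severity levels as used in M1.3.
type Severity string

const (
	OK   Severity = "OK"
	SEV2 Severity = "SEV2"
	SEV1 Severity = "SEV1"
)

const (
	pushInterval = 30 * time.Second // telemetry push interval (M2.4)
	sev2Missed   = 2                // SEV2: 2 missed pushes
	sev1Down     = 5 * time.Minute  // SEV1: 5+ minutes without a push
)

// classify maps the time since the last received heartbeat to a severity.
func classify(sinceLastPush time.Duration) Severity {
	missed := int(sinceLastPush / pushInterval) // whole intervals with no push
	switch {
	case sinceLastPush >= sev1Down:
		return SEV1
	case missed >= sev2Missed:
		return SEV2
	default:
		return OK
	}
}

func main() {
	for _, d := range []time.Duration{20 * time.Second, 65 * time.Second, 6 * time.Minute} {
		fmt.Printf("%v since last push -> %s\n", d, classify(d))
	}
}
```

With these numbers SEV2 fires after about a minute, which is consistent with the "<2 min" alert target in the M3.4 kill test.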

M1.4 — SOC domain

  • soc.vault1984.com → 185.218.204.47 (Cloudflare DNS-only)
  • Kuma accessible at soc.vault1984.com

M1 Done: WireGuard hub up on Hans, NOC agent running, Kuma fleet monitors configured, SOC domain live.


Tuesday Mar 3 — NixOS Config & Tooling (M2)

M2.1 — Repo structure

vault1984/infra/
  nixos/
    base.nix              # shared: WireGuard spoke, SSH, vault1984 service, firewall
    nodes/
      virginia.nix        # per-node vars: wg_ip, hostname, aws_region, kuma_push_token, subdomain
      frankfurt.nix
      ... (20 total)
  scripts/
    keygen.sh             # generate WireGuard keypair for a new node
    provision.sh          # provision AWS EC2 + full NixOS config push
    deploy.sh             # push binary + nixos-rebuild [node|all], rolling
    healthcheck.sh        # verify: WG ping, HTTPS 200, Kuma heartbeat received
  wireguard/
    hans.pub              # hub public key (Hans HQ)
    peers.conf            # all node pubkeys + WG IPs (no private keys ever)

Provision approach (Terraform vs AWS Console) is TBD. Scripts above accommodate either — provision.sh takes an already-running EC2 IP and configures it from there.

M2.2 — base.nix

  • WireGuard spoke (parameterized per node), with the hub endpoint at 185.218.204.47:51820
  • SSH on WireGuard interface only — no public port 22
  • vault1984 systemd service
  • Firewall: public 80+443 only
  • Nix store: 2 generations max, weekly GC
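
A sketch of the WireGuard settings base.nix has to produce for each spoke, written as a Go `text/template` rendering of the equivalent wg-quick config rather than as Nix. It assumes the hub endpoint 185.218.204.47:51820 and the 10.84.0.0/24 tunnel from M1.1, plus persistentKeepalive=25 from the risk register; `SpokeVars`, `renderSpoke`, the node address and the key placeholders are hypothetical:

```go
package main

import (
	"log"
	"os"
	"strings"
	"text/template"
)

// SpokeVars holds the per-node values (illustrative names, not base.nix options).
type SpokeVars struct {
	WGIP       string // this node's tunnel address, e.g. 10.84.0.2
	PrivateKey string // node private key, pushed by provision.sh, never committed
	HubPubKey  string // contents of wireguard/hans.pub
}

const spokeTmpl = `[Interface]
Address = {{.WGIP}}/24
PrivateKey = {{.PrivateKey}}

[Peer]
# Hans HQ hub
PublicKey = {{.HubPubKey}}
Endpoint = 185.218.204.47:51820
AllowedIPs = 10.84.0.1/32
PersistentKeepalive = 25
`

var spoke = template.Must(template.New("spoke").Parse(spokeTmpl))

// renderSpoke returns the wg-quick style config for one node.
func renderSpoke(v SpokeVars) (string, error) {
	var b strings.Builder
	if err := spoke.Execute(&b, v); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	cfg, err := renderSpoke(SpokeVars{
		WGIP:       "10.84.0.2",
		PrivateKey: "<node-private-key>",
		HubPubKey:  "<contents of hans.pub>",
	})
	if err != nil {
		log.Fatal(err)
	}
	os.Stdout.WriteString(cfg)
}
```

AllowedIPs is limited to the hub's /32 so spokes only route tunnel traffic to Hans, matching the hub-and-spoke design; it would need widening to 10.84.0.0/24 only if node-to-node traffic were ever wanted. No ListenPort is set on the spoke because it always dials out to the hub.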

M2.3 — 20 AWS node var files

One .nix per node: wg_ip, hostname, aws_region, kuma_push_token, subdomain
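
With 20 hand-written var files, the likely mistakes are a duplicated tunnel address, an address outside the hub's subnet, a copy-pasted hostname, or a missing push token. A small pre-deploy check could catch these; the sketch below assumes the vars have already been parsed into a Go struct (`NodeVars` and `validate` are hypothetical names) and uses the standard library's `net/netip`:

```go
package main

import (
	"fmt"
	"log"
	"net/netip"
)

// NodeVars mirrors the fields listed in M2.3 (hypothetical Go representation).
type NodeVars struct {
	Name          string
	Hostname      string
	AWSRegion     string
	KumaPushToken string
	Subdomain     string
	WGIP          string
}

var (
	tunnel = netip.MustParsePrefix("10.84.0.0/24") // hub subnet from M1.1
	hubIP  = netip.MustParseAddr("10.84.0.1")      // reserved for Hans
)

// validate returns one error per problem found across the whole fleet.
func validate(nodes []NodeVars) []error {
	var errs []error
	seenIP := map[netip.Addr]string{}
	seenHost := map[string]string{}
	for _, n := range nodes {
		ip, err := netip.ParseAddr(n.WGIP)
		if err != nil {
			errs = append(errs, fmt.Errorf("%s: bad wg_ip %q: %v", n.Name, n.WGIP, err))
		} else if !tunnel.Contains(ip) {
			errs = append(errs, fmt.Errorf("%s: wg_ip %s is outside %s", n.Name, ip, tunnel))
		} else if ip == hubIP || ip == tunnel.Addr() {
			errs = append(errs, fmt.Errorf("%s: wg_ip %s is reserved", n.Name, ip))
		} else if prev, dup := seenIP[ip]; dup {
			errs = append(errs, fmt.Errorf("%s: wg_ip %s already used by %s", n.Name, ip, prev))
		} else {
			seenIP[ip] = n.Name
		}
		if prev, dup := seenHost[n.Hostname]; dup {
			errs = append(errs, fmt.Errorf("%s: hostname %q already used by %s", n.Name, n.Hostname, prev))
		} else {
			seenHost[n.Hostname] = n.Name
		}
		if n.KumaPushToken == "" {
			errs = append(errs, fmt.Errorf("%s: kuma_push_token is empty", n.Name))
		}
	}
	return errs
}

func main() {
	// Hypothetical values; frankfurt deliberately reuses virginia's wg_ip.
	nodes := []NodeVars{
		{Name: "virginia", Hostname: "virginia", AWSRegion: "us-east-1", KumaPushToken: "tok-a", WGIP: "10.84.0.2"},
		{Name: "frankfurt", Hostname: "frankfurt", AWSRegion: "eu-central-1", KumaPushToken: "tok-b", WGIP: "10.84.0.2"},
	}
	errs := validate(nodes)
	for _, err := range errs {
		log.Println(err)
	}
	if len(errs) > 0 {
		log.Fatalf("%d problem(s) in node vars", len(errs))
	}
}
```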

M2.4 — vault1984 binary: telemetry push

New background goroutine (30s interval):

  • Reads: runtime.MemStats, /proc/loadavg, disk, DB size + integrity check
  • POSTs JSON to KUMA_PUSH_URL env var
  • Fields: ram_mb, disk_pct, cpu_pct, db_size_mb, db_integrity, active_sessions, req_1h, err_1h, cert_days_remaining, nix_gen, uptime_s
  • Build: CGO_ENABLED=1, cross-compiled to linux/arm64 (t4g.nano is Graviton2/ARM)
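
A minimal sketch of the push loop, assuming the URL arrives via the KUMA_PUSH_URL environment variable as planned. Only `ram_mb` (from `runtime.MemStats`) and `uptime_s` are filled in; the disk, DB, session, cert and Nix-generation collectors are left as zero values, and `Telemetry`, `collect`, `push` and `runTelemetry` are hypothetical names:

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
	"runtime"
	"time"
)

// Telemetry carries the fields listed in M2.4. Only RAMMB and UptimeS are
// collected in this sketch; the rest stay at their zero values.
type Telemetry struct {
	RAMMB             float64 `json:"ram_mb"`
	DiskPct           float64 `json:"disk_pct"`
	CPUPct            float64 `json:"cpu_pct"`
	DBSizeMB          float64 `json:"db_size_mb"`
	DBIntegrity       string  `json:"db_integrity"`
	ActiveSessions    int     `json:"active_sessions"`
	Req1h             int     `json:"req_1h"`
	Err1h             int     `json:"err_1h"`
	CertDaysRemaining int     `json:"cert_days_remaining"`
	NixGen            int     `json:"nix_gen"`
	UptimeS           int64   `json:"uptime_s"`
}

var startedAt = time.Now()

// collect snapshots what can be read without extra dependencies.
func collect() Telemetry {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return Telemetry{
		RAMMB:   float64(m.Sys) / (1 << 20), // bytes obtained from the OS, in MiB
		UptimeS: int64(time.Since(startedAt).Seconds()),
	}
}

// push POSTs one telemetry sample as JSON to the push URL.
func push(ctx context.Context, c *http.Client, url string, t Telemetry) error {
	body, err := json.Marshal(t)
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("push: unexpected status %s", resp.Status)
	}
	return nil
}

// runTelemetry pushes immediately, then every interval, until ctx is
// cancelled. Failures are logged and never stop the loop.
func runTelemetry(ctx context.Context, url string, interval time.Duration) {
	client := &http.Client{Timeout: 10 * time.Second}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if err := push(ctx, client, url, collect()); err != nil {
			log.Printf("telemetry: %v", err)
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

func main() {
	url := os.Getenv("KUMA_PUSH_URL")
	if url == "" {
		log.Fatal("KUMA_PUSH_URL is not set")
	}
	go runTelemetry(context.Background(), url, 30*time.Second)
	select {} // stand-in for the real HTTP server loop
}
```

One thing worth verifying against the Kuma version in use: stock Uptime Kuma push monitors take `status`, `msg` and `ping` from the push URL's query string, so the JSON body may simply be ignored, and the richer fields could need their own collector or a summary packed into `msg`.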

M2.5 — provision.sh

provision.sh <ip> <node-name>
  1. SSH to fresh EC2 instance (Amazon Linux or NixOS AMI)
  2. Run nixos-infect if needed → wait for reboot (~3 min)
  3. Push base.nix + node vars + WireGuard private key
  4. nixos-rebuild switch
  5. Push vault1984 binary (linux/arm64) + .env
  6. Run healthcheck.sh → confirm WG up, HTTPS 200, Kuma green
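
Step 2 is the fragile part: after nixos-infect the instance disappears for a few minutes, and provision.sh has to wait for SSH (still on the public interface at this stage, before base.nix moves it onto WireGuard) before step 3. A sketch of that wait in Go, polling a probe until a deadline; the 5s poll interval and 6 min timeout are assumptions (roughly double the ~3 min the plan expects), and `tcpProbe`/`waitFor` are hypothetical names:

```go
package main

import (
	"fmt"
	"log"
	"net"
	"os"
	"time"
)

// tcpProbe reports whether a TCP connection to addr succeeds within timeout.
func tcpProbe(addr string, timeout time.Duration) func() bool {
	return func() bool {
		conn, err := net.DialTimeout("tcp", addr, timeout)
		if err != nil {
			return false
		}
		conn.Close()
		return true
	}
}

// waitFor calls probe every interval until it succeeds or timeout elapses.
func waitFor(probe func() bool, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if probe() {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("not ready after %v", timeout)
		}
		time.Sleep(interval)
	}
}

func main() {
	if len(os.Args) != 2 {
		log.Fatal("usage: waitssh <ip>")
	}
	addr := net.JoinHostPort(os.Args[1], "22")
	// The nixos-infect reboot is expected to take ~3 min; allow twice that.
	if err := waitFor(tcpProbe(addr, 3*time.Second), 5*time.Second, 6*time.Minute); err != nil {
		log.Fatalf("%s: %v", addr, err)
	}
	fmt.Println(addr, "is accepting connections")
}
```

An open port only means sshd is listening again; step 3 should still treat a failed SSH login as a provisioning failure rather than retrying indefinitely.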

M2.6 — deploy.sh

  • Rolling: deploy one node → verify health → next
  • Abort on first failure
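
The rolling rule (deploy, verify, move on, stop at the first failure) is small enough to sketch directly. In this hypothetical Go version, `deploy` and `healthy` stand in for the binary push + nixos-rebuild and for healthcheck.sh; the function returns the nodes already completed, so an abort report can say exactly where the rollout stopped:

```go
package main

import (
	"fmt"
	"log"
)

// rollingDeploy updates nodes one at a time and stops at the first node
// whose deploy step or post-deploy health check fails. Nodes after the
// failing one are never touched and keep running the previous version.
func rollingDeploy(nodes []string, deploy, healthy func(node string) error) ([]string, error) {
	var done []string
	for _, n := range nodes {
		if err := deploy(n); err != nil {
			return done, fmt.Errorf("deploy %s: %w", n, err)
		}
		if err := healthy(n); err != nil {
			return done, fmt.Errorf("healthcheck %s: %w", n, err)
		}
		done = append(done, n)
	}
	return done, nil
}

// failAt returns a fake health check that fails for exactly one node,
// useful for dry-running the abort path.
func failAt(bad string) func(string) error {
	return func(n string) error {
		if n == bad {
			return fmt.Errorf("%s: HTTPS check failed (simulated)", n)
		}
		return nil
	}
}

func main() {
	nodes := []string{"virginia", "frankfurt", "singapore"}
	deploy := func(n string) error {
		log.Printf("deploying %s", n) // stand-in for binary push + nixos-rebuild
		return nil
	}
	done, err := rollingDeploy(nodes, deploy, failAt("frankfurt"))
	fmt.Println("completed:", done)
	if err != nil {
		log.Fatalf("rollout aborted: %v", err)
	}
}
```

Stopping on the first unhealthy node caps the blast radius of a bad binary at one region, at the cost of a slower fleet-wide rollout.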

M2 Done: Any node provisionable in <20 min. Fleet-wide binary deploy in <10 min.


Wednesday Mar 4 — Pilot: 3 Nodes (M3 + M4)

M3.1 — Virginia as first AWS node (Wed AM)

  • Launch t4g.nano in us-east-1
  • provision.sh → DNS → healthcheck → Kuma green
  • https://virginia.vault1984.com

M3.2 — Frankfurt (Wed AM)

  • t4g.nano in eu-central-1 (nearest fleet region to Hans HQ; expect roughly 10ms or less)
  • provision.sh → DNS → healthcheck → Kuma green

M3.3 — Singapore (Wed AM)

  • t4g.nano in ap-southeast-1
  • provision.sh → DNS → healthcheck → Kuma green

M3.4 — Validation (Wed noon)

  • deploy.sh all: rolling update across the 3 nodes
  • Kill vault1984 on Frankfurt → Kuma alert fires to ntfy in <2 min → restart → green
  • nmap each node: confirm port 22 not public
  • TLS cert valid on all 3
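
The public-facing checks above (no public port 22, HTTPS 200, valid TLS) map naturally onto healthcheck.sh. A hedged Go sketch of that part, assuming each node serves `https://<name>.vault1984.com/` as in M3.1 and that a plain TCP connect is an adequate stand-in for the nmap check; the WireGuard ping and Kuma heartbeat checks are left out, and the function names and 5s timeouts are illustrative:

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"os"
	"strconv"
	"time"
)

// portOpen reports whether a TCP connect to host:port succeeds.
func portOpen(host string, port int, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", net.JoinHostPort(host, strconv.Itoa(port)), timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// daysLeft converts the time remaining on a certificate into whole days.
func daysLeft(remaining time.Duration) int {
	return int(remaining.Hours() / 24)
}

// checkHTTPS fetches url, requires a 200, and returns the leaf certificate's
// remaining validity in days. Go's default client already verifies the chain
// and hostname, so a nil error implies the certificate was accepted.
func checkHTTPS(url string) (int, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("status %s", resp.Status)
	}
	if resp.TLS == nil || len(resp.TLS.PeerCertificates) == 0 {
		return 0, fmt.Errorf("no TLS connection state")
	}
	return daysLeft(time.Until(resp.TLS.PeerCertificates[0].NotAfter)), nil
}

func main() {
	if len(os.Args) != 2 {
		log.Fatal("usage: healthcheck <node>   (e.g. frankfurt)")
	}
	host := os.Args[1] + ".vault1984.com"
	failed := false
	if portOpen(host, 22, 5*time.Second) {
		log.Printf("FAIL %s: port 22 reachable from the public internet", host)
		failed = true
	}
	days, err := checkHTTPS("https://" + host + "/")
	if err != nil {
		log.Printf("FAIL %s: %v", host, err)
		failed = true
	} else {
		log.Printf("ok   %s: HTTPS 200, cert valid for %d more days", host, days)
	}
	if failed {
		os.Exit(1)
	}
}
```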

M4 — Go/No-Go (Wed EOD)

  • Johan reviews 3 pilot nodes
  • Blockers fixed same day
  • Green light → full fleet Thursday

Thursday Mar 5 — Full Fleet (M5 + M6)

M5 — Provision remaining 17 AWS nodes

| Batch | Regions | Time |
|---|---|---|
| 1 | N.California, Montreal, Mexico City, São Paulo | Thu 9 AM |
| 2 | London, Paris, Spain, Stockholm | Thu 11 AM |
| 3 | UAE, Tel Aviv, Cape Town, Mumbai | Thu 1 PM |
| 4 | Jakarta, Malaysia, Sydney, Seoul, Hong Kong | Thu 3 PM |

Each node: launch t4g.nano in region → provision.sh → DNS → healthcheck → Kuma green

M6 — Fleet verification (Thu EOD)

  • Kuma: all 20 monitors green
  • deploy.sh all — rolling deploy across full fleet
  • Latency check: all nodes reachable from Hans HQ over WireGuard
  • No public SSH on any node (nmap spot check)
  • TLS valid on all 20
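
For the latency check, the whole fleet can be probed concurrently over the tunnel from Hans. A sketch, assuming each node's wg_ip is known and that a TCP connect to port 22 (SSH listens on the WireGuard interface per M2.2) is a fair proxy for one round trip; `probeFleet` and the addresses in `main` are hypothetical:

```go
package main

import (
	"fmt"
	"net"
	"sort"
	"sync"
	"time"
)

// Result is one node's reachability over the tunnel.
type Result struct {
	Node string
	RTT  time.Duration // TCP connect time; zero when unreachable
	Err  error
}

// probeFleet dials every node's address concurrently and returns the
// results sorted by node name.
func probeFleet(wgIPs map[string]string, port string, timeout time.Duration) []Result {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results []Result
	)
	for node, ip := range wgIPs {
		wg.Add(1)
		go func(node, ip string) {
			defer wg.Done()
			start := time.Now()
			conn, err := net.DialTimeout("tcp", net.JoinHostPort(ip, port), timeout)
			r := Result{Node: node, Err: err}
			if err == nil {
				r.RTT = time.Since(start) // SYN/SYN-ACK, roughly one round trip
				conn.Close()
			}
			mu.Lock()
			results = append(results, r)
			mu.Unlock()
		}(node, ip)
	}
	wg.Wait()
	sort.Slice(results, func(i, j int) bool { return results[i].Node < results[j].Node })
	return results
}

func main() {
	// Hypothetical tunnel addresses; the real ones live in the per-node var files.
	fleet := map[string]string{"virginia": "10.84.0.2", "frankfurt": "10.84.0.3", "singapore": "10.84.0.4"}
	for _, r := range probeFleet(fleet, "22", 3*time.Second) {
		if r.Err != nil {
			fmt.Printf("%-10s UNREACHABLE (%v)\n", r.Node, r.Err)
			continue
		}
		fmt.Printf("%-10s %v\n", r.Node, r.RTT.Round(time.Millisecond))
	}
}
```

Probing in parallel keeps the check at roughly one timeout in the worst case instead of twenty, which matters when several far-away regions are slow or down.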

M5+M6 Done: 20 AWS nodes live, all green, WireGuard hub-and-spoke tunnels to Hans stable.


Friday Mar 6 — Go Live (M7)

M7.1 — Final review (Fri AM)

  • Johan spot-checks 3-4 random nodes
  • Kuma dashboard review
  • Any last fixes

M7.2 — vault1984.com routing (Fri noon)

  • Primary: vault1984.com → Virginia (largest US East market)
  • Optional: Cloudflare Load Balancer for GeoDNS ($5/mo — Johan decides)

M7.3 — Go-live

  • Dashboard briefing: fleet live
  • https://soc.vault1984.com status page

🚀 LIVE: Friday March 6, 2026 — noon ET


Prerequisites from Johan

| Item | Needed By | Status |
|---|---|---|
| AWS account + credentials (access keys) | Mon Mar 2 AM | 🔴 Outstanding; blocks everything |
| AWS deployment method decision (Terraform vs manual) | Tue Mar 3 AM | 🟡 TBD |
| Plan approval | Sun Mar 1 | Approved |

No longer needed: Vultr API key — Vultr removed from architecture entirely.

Everything else James handles autonomously once AWS credentials are available.


Risk Register

| Risk | Mitigation |
|---|---|
| AWS account setup delay | Critical path: affects M3 and all downstream milestones |
| nixos-infect fails on AWS AMI | Fallback: use the official NixOS arm64 AMI directly |
| Let's Encrypt rate limit | ~20 certs in one week, well under the limit of 50 certificates per registered domain per week; stagger if needed |
| vault1984 CGO/SQLite on NixOS arm64 | Cross-compile with zig; fallback: modernc.org/sqlite (pure Go) |
| WireGuard NAT on EC2 | persistentKeepalive=25; EC2 public IPs use a single 1:1 NAT, no double-NAT |
| t4g.nano RAM (0.5GB) | vault1984 binary is ~15MB + SQLite; should be fine at low volume |

Cost Summary

| Component | Count | Unit | Monthly |
|---|---|---|---|
| Hans HQ (Hostkey, Zürich) | 1 | €3.90/mo | ~$4 |
| AWS EC2 t4g.nano | 20 | ~$3/mo | ~$60 |
| Total | 21 | | ~$64-67/mo |

Budget ceiling: $100/mo → ~$33-36/mo reserve for upgrades.
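
The cost figures reduce to the arithmetic below; the ~$67 upper figure is carried over from the plan as a margin and is not derived here.

$$
\begin{aligned}
\text{AWS fleet} &= 20 \times \$3 = \$60/\text{mo} \\
\text{Total} &\approx \$60 + \$4 = \$64/\text{mo} \quad (\text{plan's upper figure: } \approx \$67/\text{mo}) \\
\text{Reserve} &= \$100 - \text{Total} \approx \$33 \text{ to } \$36/\text{mo}
\end{aligned}
$$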


Post-Launch (not blocking Friday)

  • GeoDNS / Cloudflare Load Balancer for latency-based routing
  • Automated weekly NixOS updates via NOC cron
  • China mainland Phase 2 (requires ICP license + separate AWS China account)
  • Terraform for reproducible fleet management (once initial rollout proven)
  • vault1984-web multi-tenant backend with node assignment

Written: 2026-03-01 · Updated: 2026-03-03 · James