# Vault1984 — Global NOC Deployment Plan

**Owner:** James ⚡
**Target:** Live Friday March 6, 2026
**Budget:** ~$64–67/mo (20 AWS regions + Hostkey HQ)
**HQ:** Hans NOC Node — Hostkey Zürich (185.218.204.47, noc.vault1984.com)

---
## Overview

Deploy vault1984 across 20 AWS regions (t4g.nano / ARM Graviton, ~$3/mo each), managed by an OpenClaw NOC agent running on the Hostkey HQ node (Hans, 185.218.204.47). Each AWS node runs NixOS + the vault1984 Go binary. All management traffic flows over WireGuard. Monitoring via Uptime Kuma push heartbeats from each node.

**Platform decision:** AWS EC2 t4g.nano (ARM/Graviton2). One binary per region. No multi-tenant clustering — each node is fully independent.

**Deployment method:** TBD — likely Terraform or manual AWS Console for the initial rollout. Not yet decided; tooling is built to accommodate either approach.

---
## Region Selection (21 nodes total: 20 AWS + 1 Hostkey HQ)

| # | Name | City | AWS Region | Provider |
|---|------|------|------------|----------|
| HQ | zurich | Zürich, CH | — | Hostkey (Hans, 185.218.204.47) |
| 1 | virginia | N. Virginia, US | us-east-1 | AWS t4g.nano |
| 2 | ncalifornia | N. California, US | us-west-1 | AWS t4g.nano |
| 3 | montreal | Montreal, CA | ca-central-1 | AWS t4g.nano |
| 4 | mexicocity | Mexico City, MX | mx-central-1 | AWS t4g.nano |
| 5 | saopaulo | São Paulo, BR | sa-east-1 | AWS t4g.nano |
| 6 | london | London, UK | eu-west-2 | AWS t4g.nano |
| 7 | paris | Paris, FR | eu-west-3 | AWS t4g.nano |
| 8 | frankfurt | Frankfurt, DE | eu-central-1 | AWS t4g.nano |
| 9 | spain | Spain, ES | eu-south-2 | AWS t4g.nano |
| 10 | stockholm | Stockholm, SE | eu-north-1 | AWS t4g.nano |
| 11 | uae | UAE | me-central-1 | AWS t4g.nano |
| 12 | telaviv | Tel Aviv, IL | il-central-1 | AWS t4g.nano |
| 13 | capetown | Cape Town, ZA | af-south-1 | AWS t4g.nano |
| 14 | mumbai | Mumbai, IN | ap-south-1 | AWS t4g.nano |
| 15 | singapore | Singapore, SG | ap-southeast-1 | AWS t4g.nano |
| 16 | jakarta | Jakarta, ID | ap-southeast-3 | AWS t4g.nano |
| 17 | malaysia | Kuala Lumpur, MY | ap-southeast-5 | AWS t4g.nano |
| 18 | sydney | Sydney, AU | ap-southeast-2 | AWS t4g.nano |
| 19 | seoul | Seoul, KR | ap-northeast-2 | AWS t4g.nano |
| 20 | hongkong | Hong Kong | ap-east-1 | AWS t4g.nano |

*Johan-approved on 2026-03-02.*

---
## Milestones at a Glance

| # | Milestone | Owner | Deadline |
|---|-----------|-------|----------|
| M1 | Hans HQ ready (WireGuard hub + OC NOC + Kuma) | James | Mon Mar 2, EOD |
| M2 | NixOS config + deploy tooling in repo | James | Tue Mar 3, EOD |
| M3 | Pilot: 3 nodes live (Virginia + 2 others) | James | Wed Mar 4, noon |
| M4 | Go/No-Go review | Johan | Wed Mar 4, EOD |
| M5 | Full 20-region AWS fleet live | James | Thu Mar 5, EOD |
| M6 | DNS, TLS, health checks verified | James | Thu Mar 5, EOD |
| M7 | Go-live: vault1984.com routing to fleet | Johan + James | Fri Mar 6, noon |

---
## Day-by-Day Plan

---
### Sunday Mar 1 — Planning & Prerequisites

- [x] Read INFRASTRUCTURE.md, write this plan
- [ ] **Johan:** Set up AWS account + credentials (IAM user or root — access keys needed)
- [ ] **Johan:** Decide deployment method: Terraform vs manual AWS Console
- [ ] **Johan:** Approve plan → James starts Monday

---
### Monday Mar 2 — Hans HQ Setup (M1)

**M1.1 — WireGuard Hub (on Hans, 185.218.204.47)**
- Generate Hans hub keypair
- Configure wg0: `10.84.0.1/24`, UDP 51820
- UFW: allow 51820 inbound
- Save `hans.pub` to repo
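
The hub side of M1.1 boils down to a small wg0.conf. A sketch of generating it, with placeholder key material: the real keypair comes from `wg genkey`, and the spoke addressing (10.84.0.2+) is an assumption, not something this plan has fixed yet.

```shell
#!/usr/bin/env sh
# Sketch: write the Hans hub wg0.conf described in M1.1.
# <hans-private-key> is a placeholder. Generate the real pair with:
#   wg genkey | tee hans.key | wg pubkey > hans.pub
conf="$(mktemp)"
cat > "$conf" <<'EOF'
[Interface]
Address    = 10.84.0.1/24
ListenPort = 51820
PrivateKey = <hans-private-key>

# One [Peer] block per AWS node, appended from wireguard/peers.conf,
# e.g. (assumed spoke addressing):
# [Peer]
# PublicKey  = <virginia-pubkey>
# AllowedIPs = 10.84.0.2/32
EOF
echo "wrote $conf"
```

Pair this with `ufw allow 51820/udp` on Hans so the spokes can reach the hub.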

**M1.2 — OpenClaw NOC Agent**
- ✅ OpenClaw v2026.3.1 installed on Hans
- Model: Fireworks MiniMax M2.5 (no Anthropic tokens on Hans)
- Telegram/Discord routing configured for deploy commands

**M1.3 — Uptime Kuma fleet monitors**
- New ntfy topic: `vault1984-alerts`
- 20 push monitors in Kuma, one per AWS region
- SEV2: 2 missed pushes; SEV1: 5+ min down
- All monitors pending (nodes not yet live)
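
Each push monitor exposes a tokenized URL that the node hits on an interval; a minimal sketch of building that URL, where the token is a placeholder for the per-monitor value Kuma generates:

```shell
#!/usr/bin/env sh
# Build a node's Uptime Kuma push URL (M1.3).
# Kuma push monitors accept: /api/push/<token>?status=up&msg=<text>&ping=<ms>
KUMA_BASE="https://soc.vault1984.com"
TOKEN="PLACEHOLDER_TOKEN"   # per-monitor token from the Kuma UI

push_url() {
  # $1 = status (up|down), $2 = short message
  printf '%s/api/push/%s?status=%s&msg=%s&ping=' "$KUMA_BASE" "$TOKEN" "$1" "$2"
}

url="$(push_url up OK)"
echo "$url"
# A node heartbeat is then just: curl -fsS "$url" >/dev/null
```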

**M1.4 — SOC domain**
- `soc.vault1984.com` → 185.218.204.47 (Cloudflare DNS-only)
- Kuma accessible at soc.vault1984.com

**✅ M1 Done:** WireGuard hub up on Hans, NOC agent running, Kuma fleet monitors configured, SOC domain live.

---
### Tuesday Mar 3 — NixOS Config & Tooling (M2)

**M2.1 — Repo structure**

```
vault1984/infra/
  nixos/
    base.nix         # shared: WireGuard spoke, SSH, vault1984 service, firewall
    nodes/
      virginia.nix   # per-node vars: wg_ip, hostname, kuma_token, aws_region
      frankfurt.nix
      ... (20 total)
  scripts/
    keygen.sh        # generate WireGuard keypair for a new node
    provision.sh     # provision AWS EC2 + full NixOS config push
    deploy.sh        # push binary + nixos-rebuild [node|all], rolling
    healthcheck.sh   # verify: WG ping, HTTPS 200, Kuma heartbeat received
  wireguard/
    hans.pub         # hub public key (Hans HQ)
    peers.conf       # all node pubkeys + WG IPs (no private keys ever)
```

> Provision approach (Terraform vs AWS Console) is TBD. The scripts above accommodate either — provision.sh takes an already-running EC2 IP and configures it from there.

**M2.2 — base.nix**
- WireGuard spoke (parameterized), hub endpoint 185.218.204.47
- SSH on the WireGuard interface only — no public port 22
- vault1984 systemd service
- Firewall: public 80 + 443 only
- Nix store: 2 generations max, weekly GC

**M2.3 — 20 AWS node var files**

One `.nix` per node: wg_ip, hostname, aws_region, kuma_push_token, subdomain

**M2.4 — vault1984 binary: telemetry push**

New background goroutine (30s interval):
- Reads: `runtime.MemStats`, `/proc/loadavg`, disk, DB size + integrity check
- POSTs JSON to the URL in the `KUMA_PUSH_URL` env var
- Fields: ram_mb, disk_pct, cpu_pct, db_size_mb, db_integrity, active_sessions, req_1h, err_1h, cert_days_remaining, nix_gen, uptime_s
- Build: `CGO_ENABLED=1`, cross-compiled to `linux/arm64` (t4g.nano is Graviton2/ARM)
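
For a quick smoke test of the field set above, the payload can be mocked in shell. The values here are dummies; the real goroutine fills them from runtime.MemStats, /proc, df, and SQLite.

```shell
#!/usr/bin/env sh
# Mock one M2.4 telemetry payload with dummy values.
payload=$(printf '{"ram_mb":%s,"disk_pct":%s,"cpu_pct":%s,"db_size_mb":%s,"db_integrity":"%s","active_sessions":%s,"req_1h":%s,"err_1h":%s,"cert_days_remaining":%s,"nix_gen":%s,"uptime_s":%s}' \
  118 31 4 12 ok 3 420 0 58 7 86400)
echo "$payload"
# The goroutine POSTs this every 30 s:
#   curl -fsS -X POST -H 'Content-Type: application/json' \
#        -d "$payload" "$KUMA_PUSH_URL"
```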

**M2.5 — provision.sh**

```
provision.sh <ip> <node-name>
```

1. SSH to fresh EC2 instance (Amazon Linux or NixOS AMI)
2. Run `nixos-infect` if needed → wait for reboot (~3 min)
3. Push base.nix + node vars + WireGuard private key
4. `nixos-rebuild switch`
5. Push vault1984 binary (`linux/arm64`) + .env
6. Run healthcheck.sh → confirm WG up, HTTPS 200, Kuma green
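
The six steps map onto a thin script skeleton. This dry-run sketch only prints the plan; the names and defaults are illustrative, and the real script executes each step over SSH.

```shell
#!/usr/bin/env sh
# provision.sh sketch (M2.5), dry run: prints the remote steps instead of
# executing them. Usage: provision.sh <ip> <node-name>
set -eu
IP="${1:-203.0.113.10}"   # placeholder address (TEST-NET-3)
NODE="${2:-virginia}"

provision_plan() {
  echo "1 ssh root@$IP  (fresh Amazon Linux or NixOS AMI)"
  echo "2 nixos-infect if needed, wait ~3 min for reboot"
  echo "3 push nixos/base.nix + nixos/nodes/$NODE.nix + WG private key"
  echo "4 nixos-rebuild switch"
  echo "5 push vault1984 binary (linux/arm64) + .env"
  echo "6 healthcheck.sh: WG ping, HTTPS 200, Kuma heartbeat"
}

plan="$(provision_plan)"
echo "$plan"
```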

**M2.6 — deploy.sh**
- Rolling: deploy one node → verify health → next
- Abort on first failure
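
The rolling/abort behavior can be sketched with stubbed deploy and health functions. The stubs stand in for the real binary push and healthcheck.sh, and Frankfurt is made to fail on purpose to show the abort path.

```shell
#!/usr/bin/env sh
# deploy.sh rolling-update sketch (M2.6) with stubbed node operations.
NODES="virginia frankfurt singapore"

deploy_one() { echo "deploying $1"; }       # stub: real one pushes over WG
health_ok()  { [ "$1" != "frankfurt" ]; }   # stub: frankfurt fails on purpose

done_nodes=""
for n in $NODES; do
  deploy_one "$n"
  if health_ok "$n"; then
    done_nodes="$done_nodes $n"
  else
    echo "ABORT: $n failed healthcheck, halting rollout"
    break
  fi
done
echo "deployed:$done_nodes"
```

Singapore is never touched: the loop stops at the first unhealthy node, which is exactly the abort-on-first-failure rule above.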

**✅ M2 Done:** Any node provisionable in <20 min. Fleet-wide binary deploy in <10 min.

---
### Wednesday Mar 4 — Pilot: 3 Nodes (M3 + M4)

**M3.1 — Virginia as first AWS node (Wed AM)**
- Launch t4g.nano in us-east-1
- `provision.sh` → DNS → healthcheck → Kuma green
- `https://virginia.vault1984.com`

**M3.2 — Frankfurt (Wed AM)**
- t4g.nano in eu-central-1 (~100ms from Hans HQ)
- `provision.sh` → DNS → healthcheck → Kuma green

**M3.3 — Singapore (Wed AM)**
- t4g.nano in ap-southeast-1
- `provision.sh` → DNS → healthcheck → Kuma green

**M3.4 — Validation (Wed noon)**
- `deploy.sh all` rolling update across 3 nodes
- Kill vault1984 on Frankfurt → Kuma alert fires to ntfy in <2 min → restart → green
- `nmap` each node: confirm port 22 not public
- TLS cert valid on all 3
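
The TLS line can lean on openssl's enddate output. Here is a sketch of the date math, fed a fixed sample string instead of a live connection (GNU date assumed):

```shell
#!/usr/bin/env sh
# Days-remaining helper for the M3.4 TLS check.
# Live input: openssl s_client -connect virginia.vault1984.com:443 </dev/null \
#               2>/dev/null | openssl x509 -noout -enddate
days_left() {
  # $1 = "notAfter=<date>" line from openssl x509 -noout -enddate
  end=$(date -d "${1#notAfter=}" +%s)   # GNU date
  now=$(date +%s)
  echo $(( (end - now) / 86400 ))
}

d="$(days_left 'notAfter=Dec 31 23:59:59 2099 GMT')"
echo "$d days remaining"
```

The same helper also feeds the `cert_days_remaining` idea from M2.4, though the binary computes that in Go rather than shelling out.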

**M4 — Go/No-Go (Wed EOD)**
- Johan reviews 3 pilot nodes
- Blockers fixed same day
- Green light → full fleet Thursday

---
### Thursday Mar 5 — Full Fleet (M5 + M6)

**M5 — Provision remaining 17 AWS nodes**

| Batch | Regions | Time |
|-------|---------|------|
| 1 | N. California, Montreal, Mexico City, São Paulo | Thu 9 AM |
| 2 | London, Paris, Spain, Stockholm | Thu 11 AM |
| 3 | UAE, Tel Aviv, Cape Town, Mumbai | Thu 1 PM |
| 4 | Jakarta, Malaysia, Sydney, Seoul, Hong Kong | Thu 3 PM |

Each node: launch t4g.nano in region → `provision.sh` → DNS → healthcheck → Kuma green

**M6 — Fleet verification (Thu EOD)**
- Kuma: all 20 monitors green
- `deploy.sh all` — rolling deploy across full fleet
- Latency check: all nodes reachable from Hans HQ over WireGuard
- No public SSH on any node (nmap spot check)
- TLS valid on all 20

**✅ M5+M6 Done:** 20 AWS nodes live, all green, WireGuard hub-and-spoke overlay stable.

---
### Friday Mar 6 — Go Live (M7)

**M7.1 — Final review (Fri AM)**
- Johan spot-checks 3-4 random nodes
- Kuma dashboard review
- Any last fixes

**M7.2 — vault1984.com routing (Fri noon)**
- Primary: `vault1984.com` → Virginia (largest US East market)
- Optional: Cloudflare Load Balancer for GeoDNS ($5/mo — Johan decides)

**M7.3 — Go-live**
- Dashboard briefing: fleet live
- `https://soc.vault1984.com` status page

**🚀 LIVE: Friday March 6, 2026 — noon ET**

---
## Prerequisites from Johan

| Item | Needed By | Status |
|------|-----------|--------|
| AWS account + credentials (access keys) | Mon Mar 2 AM | 🔴 Outstanding — blocks everything |
| AWS deployment method decision (Terraform vs manual) | Tue Mar 3 AM | 🟡 TBD |
| Plan approval | Sun Mar 1 | ✅ Approved |

> **No longer needed:** ~~Vultr API key~~ — Vultr removed from the architecture entirely.

Everything else James handles autonomously once AWS credentials are available.

---
## Risk Register

| Risk | Mitigation / Impact |
|------|---------------------|
| AWS account setup delay | Critical path — affects M3 and all downstream milestones |
| nixos-infect fails on AWS AMI | Fallback: use the official NixOS arm64 AMI directly |
| Let's Encrypt rate limit | 20 certs/week — well under the 50-per-week limit; stagger if needed |
| vault1984 CGO/SQLite on NixOS arm64 | Cross-compile with zig; fallback: modernc.org/sqlite (pure Go) |
| WireGuard NAT on EC2 | persistentKeepalive=25; EC2 public IPs are plain 1:1 NAT, no double-NAT |
| t4g.nano RAM (0.5 GB) | vault1984 binary is ~15 MB + SQLite; should be fine at low volume |

---
## Cost Summary

| Component | Count | Unit | Monthly |
|-----------|-------|------|---------|
| Hans HQ (Hostkey, Zürich) | 1 | €3.90/mo | ~$4 |
| AWS EC2 t4g.nano | 20 | ~$3/mo | ~$60 |
| **Total** | **21** | | **~$64–67/mo** |

Budget ceiling: $100/mo → **~$33–36/mo reserve** for upgrades.

---
## Post-Launch (not blocking Friday)

- GeoDNS / Cloudflare Load Balancer for latency-based routing
- Automated weekly NixOS updates via NOC cron
- China mainland Phase 2 (requires ICP license + separate AWS China account)
- Terraform for reproducible fleet management (once initial rollout proven)
- vault1984-web multi-tenant backend with node assignment

---

*Written: 2026-03-01 · Updated: 2026-03-03 · James ⚡*