# vault1984 — Infrastructure Overview
*Last updated: 2026-03-03 · James ⚡*
*Go-live target: Friday March 6, 2026 — noon ET*
---
## 1. HQ — Hans NOC Node (185.218.204.47)
| Field | Value |
|-------|-------|
| **Provider** | Hostkey (Switzerland, Zürich) |
| **IP** | 185.218.204.47 |
| **DNS** | noc.vault1984.com |
| **Specs** | 4 vCPU / 6 GB RAM / 120 GB SSD (vm.mini) |
| **Cost** | €3.90/mo |
| **WireGuard role** | Hub — 10.84.0.1/24, UDP 51820 |
This is the **vault1984 control plane and NOC node** — dedicated to vault1984 infrastructure only. It is NOT an AWS instance.
### Services Running on HQ
| Service | Port / Address | Purpose |
|---------|---------------|---------|
| **WireGuard hub** | UDP 51820 / 10.84.0.1 | Fleet management network |
| **OpenClaw NOC agent** | internal | Receives deploy commands, executes, reports back |
| **Uptime Kuma** | localhost:3001 → `soc.vault1984.com` | Fleet monitoring dashboard |
| **ntfy** | push alerts | `vault1984-alerts` topic |
> **Note:** Stalwart mail, Git server, and inou.com infrastructure run on the separate Zurich inou.com server (82.22.36.202). Hans is vault1984-only.
> **SSH on spoke nodes:** Spoke nodes have SSH on WireGuard only — port 22 is NOT reachable from the public internet.
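
As a rough illustration of how an alert could reach the `vault1984-alerts` topic, the sketch below builds (but does not send) an ntfy publish request. The server URL `https://ntfy.example.invalid` and the helper name `build_alert` are placeholders, not part of the actual setup. The sketch relies only on ntfy's documented publish interface: a plain POST to `/<topic>` with optional `Title` and `Priority` headers.

```python
import urllib.request

NTFY_BASE = "https://ntfy.example.invalid"  # hypothetical; the real server URL is not stated here
TOPIC = "vault1984-alerts"                  # topic name from the services table


def build_alert(node: str, message: str, priority: str = "high") -> urllib.request.Request:
    """Build an ntfy publish request (POST body = message text). Nothing is sent."""
    return urllib.request.Request(
        url=f"{NTFY_BASE}/{TOPIC}",
        data=message.encode("utf-8"),
        headers={"Title": f"vault1984 node {node}", "Priority": priority},
        method="POST",
    )


if __name__ == "__main__":
    req = build_alert("singapore", "cert_days_remaining below threshold")
    print(req.get_method(), req.full_url)
    # Actually sending would be: urllib.request.urlopen(req, timeout=10)
```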
---
## 2. Spoke Nodes — 21-Region Global Fleet
### Platform: AWS EC2 t4g.nano ✅ Approved
**~$3/mo** — ARM/Graviton, 2 vCPU, 0.5 GB RAM
One binary per region. No database sync, no replication — each node is independent.
> **Deployment method:** TBD — likely Terraform or manual AWS Console for initial rollout. Not yet decided; do not assume automation tooling exists.
### Full Node Table
| # | Node Name | Region / City | AWS Region | Provider | WG IP | Cost/mo | Status |
|---|-----------|---------------|------------|----------|-------|---------|--------|
| HQ | `zurich` | Zürich, CH | — (Hostkey) | Hostkey Hans | 10.84.0.1 | €3.90 | 🔄 NOC live, spoke TBD |
| 1 | `virginia` | N. Virginia, US | us-east-1 | AWS t4g.nano | 10.84.0.2 | ~$3 | ❌ Not provisioned |
| 2 | `ncalifornia` | N. California, US | us-west-1 | AWS t4g.nano | 10.84.0.3 | ~$3 | ❌ Not provisioned |
| 3 | `montreal` | Montreal, CA | ca-central-1 | AWS t4g.nano | 10.84.0.4 | ~$3 | ❌ Not provisioned |
| 4 | `mexicocity` | Mexico City, MX | mx-central-1 | AWS t4g.nano | 10.84.0.5 | ~$3 | ❌ Not provisioned |
| 5 | `saopaulo` | São Paulo, BR | sa-east-1 | AWS t4g.nano | 10.84.0.6 | ~$3 | ❌ Not provisioned |
| 6 | `london` | London, UK | eu-west-2 | AWS t4g.nano | 10.84.0.7 | ~$3 | ❌ Not provisioned |
| 7 | `paris` | Paris, FR | eu-west-3 | AWS t4g.nano | 10.84.0.8 | ~$3 | ❌ Not provisioned |
| 8 | `frankfurt` | Frankfurt, DE | eu-central-1 | AWS t4g.nano | 10.84.0.9 | ~$3 | ❌ Not provisioned |
| 9 | `spain` | Spain, ES | eu-south-2 | AWS t4g.nano | 10.84.0.10 | ~$3 | ❌ Not provisioned |
| 10 | `stockholm` | Stockholm, SE | eu-north-1 | AWS t4g.nano | 10.84.0.11 | ~$3 | ❌ Not provisioned |
| 11 | `uae` | UAE | me-central-1 | AWS t4g.nano | 10.84.0.12 | ~$3 | ❌ Not provisioned |
| 12 | `telaviv` | Tel Aviv, IL | il-central-1 | AWS t4g.nano | 10.84.0.13 | ~$3 | ❌ Not provisioned |
| 13 | `capetown` | Cape Town, ZA | af-south-1 | AWS t4g.nano | 10.84.0.14 | ~$3 | ❌ Not provisioned |
| 14 | `mumbai` | Mumbai, IN | ap-south-1 | AWS t4g.nano | 10.84.0.15 | ~$3 | ❌ Not provisioned |
| 15 | `singapore` | Singapore, SG | ap-southeast-1 | AWS t4g.nano | 10.84.0.16 | ~$3 | ❌ Not provisioned |
| 16 | `jakarta` | Jakarta, ID | ap-southeast-3 | AWS t4g.nano | 10.84.0.17 | ~$3 | ❌ Not provisioned |
| 17 | `malaysia` | Kuala Lumpur, MY | ap-southeast-5 | AWS t4g.nano | 10.84.0.18 | ~$3 | ❌ Not provisioned |
| 18 | `sydney` | Sydney, AU | ap-southeast-2 | AWS t4g.nano | 10.84.0.19 | ~$3 | ❌ Not provisioned |
| 19 | `seoul` | Seoul, KR | ap-northeast-2 | AWS t4g.nano | 10.84.0.20 | ~$3 | ❌ Not provisioned |
| 20 | `hongkong` | Hong Kong | ap-east-1 | AWS t4g.nano | 10.84.0.21 | ~$3 | ❌ Not provisioned |
> **Why Graviton/ARM?** AWS t4g.nano is ARM-based (Graviton2). Unique in the market at this price point — GCP doesn't offer ARM below t2a-standard-1 (1 vCPU, 4 GB RAM). vault1984 Go binary cross-compiles to `linux/arm64` cleanly.
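
The WG IP column follows a simple pattern: the hub holds `10.84.0.1`, and spoke #n gets `10.84.0.(n+1)`. A minimal sketch of that allocation rule, useful as a sanity check when generating per-node variables (the function names are illustrative, not taken from the repo):

```python
import ipaddress

WG_NET = ipaddress.ip_network("10.84.0.0/24")  # management network from the table
HUB_IP = WG_NET.network_address + 1            # 10.84.0.1 (Hans HQ)

# Spoke names in table order (#1..#20).
SPOKES = [
    "virginia", "ncalifornia", "montreal", "mexicocity", "saopaulo",
    "london", "paris", "frankfurt", "spain", "stockholm",
    "uae", "telaviv", "capetown", "mumbai", "singapore",
    "jakarta", "malaysia", "sydney", "seoul", "hongkong",
]


def wg_ip(node_number: int) -> ipaddress.IPv4Address:
    """Spoke #n gets host address n+1, because the hub occupies .1."""
    if not 1 <= node_number <= len(SPOKES):
        raise ValueError(f"unknown node number: {node_number}")
    return WG_NET.network_address + (node_number + 1)


def allocation() -> dict:
    """Map every spoke name to its WireGuard address."""
    return {name: wg_ip(i) for i, name in enumerate(SPOKES, start=1)}


if __name__ == "__main__":
    for name, ip in allocation().items():
        print(f"{name:12s} {ip}")
```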
---
## 3. What Runs on Each Spoke
Every spoke node runs the same minimal stack — deliberately so. No drift by design.
```
[AWS EC2 t4g.nano]
├── NixOS (declarative, reproducible, 2 generations max)
├── vault1984 binary (Go, ~15 MB, ports :80 + :443)
│   ├── Built-in autocert (Let's Encrypt via golang.org/x/crypto/acme/autocert)
│   ├── Kuma push heartbeat (every 30s to soc.vault1984.com)
│   └── vault1984.db (SQLite + WAL)
└── WireGuard spoke → hub (10.84.0.1:51820, Hans HQ)
    └── SSH binds to WireGuard IP only (10.84.0.x:22)
```
**Public ports:** 80, 443 only.
**NOT public:** Port 22 (SSH reachable only via WireGuard tunnel from Hans HQ).
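
To make the spoke-to-hub relationship concrete, here is a sketch that renders a `wg-quick`-style config and the matching `sshd_config` line for one spoke. On NixOS the same settings would be expressed in Nix rather than in these files; the plain formats are used only for illustration. The `AllowedIPs` scope and the `PersistentKeepalive` value are assumptions, and the keys are caller-supplied placeholders.

```python
HUB_ENDPOINT = "185.218.204.47:51820"  # Hans HQ public IP and WireGuard port
HUB_WG_IP = "10.84.0.1"


def spoke_config(wg_ip: str, private_key: str, hub_public_key: str) -> str:
    """Render a wg-quick style config for one spoke."""
    return "\n".join([
        "[Interface]",
        f"Address = {wg_ip}/24",
        f"PrivateKey = {private_key}",
        "",
        "[Peer]",
        f"PublicKey = {hub_public_key}",
        f"Endpoint = {HUB_ENDPOINT}",
        f"AllowedIPs = {HUB_WG_IP}/32",  # assumption: spokes only talk to the hub
        "PersistentKeepalive = 25",      # assumption: keeps NAT mappings alive
        "",
    ])


def sshd_listen(wg_ip: str) -> str:
    """sshd_config directive that binds SSH to the WireGuard address only."""
    return f"ListenAddress {wg_ip}:22"


if __name__ == "__main__":
    print(spoke_config("10.84.0.16", "<spoke-private-key>", "<hub-public-key>"))
    print(sshd_listen("10.84.0.16"))
```

Binding `sshd` to the tunnel address means SSH is unreachable if the tunnel is down, so keep a recovery path (for example the cloud provider's console) in mind.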
### Heartbeat Payload (every 30s, vault1984 → Kuma)
```json
{
  "node": "singapore",
  "ram_mb": 142, "disk_pct": 31.2, "cpu_pct": 2.1,
  "db_size_mb": 12, "db_integrity": true,
  "active_sessions": 3, "req_1h": 847, "err_1h": 2,
  "cert_days_remaining": 62, "nix_gen": 2, "uptime_s": 864000
}
```
**Key watchdog metric:** `cert_days_remaining` — visible in Kuma before any cert expires.
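
A sketch of how the NOC side might turn one heartbeat into alert conditions. All thresholds here (`CERT_WARN_DAYS`, `DISK_WARN_PCT`, `ERR_RATE_WARN`) are hypothetical values chosen for illustration; only the field names come from the payload above.

```python
import json

# Hypothetical thresholds; tune to the fleet's actual tolerance.
CERT_WARN_DAYS = 14
DISK_WARN_PCT = 80.0
ERR_RATE_WARN = 0.05  # errors / requests over the last hour

# Sample payload copied from the heartbeat example.
SAMPLE = """{
  "node": "singapore",
  "ram_mb": 142, "disk_pct": 31.2, "cpu_pct": 2.1,
  "db_size_mb": 12, "db_integrity": true,
  "active_sessions": 3, "req_1h": 847, "err_1h": 2,
  "cert_days_remaining": 62, "nix_gen": 2, "uptime_s": 864000
}"""


def check_heartbeat(raw: str) -> list:
    """Return a list of human-readable problems; empty means healthy."""
    hb = json.loads(raw)
    problems = []
    if hb["cert_days_remaining"] < CERT_WARN_DAYS:
        problems.append(f"{hb['node']}: cert expires in {hb['cert_days_remaining']} days")
    if not hb["db_integrity"]:
        problems.append(f"{hb['node']}: database integrity check failed")
    if hb["disk_pct"] >= DISK_WARN_PCT:
        problems.append(f"{hb['node']}: disk at {hb['disk_pct']}%")
    if hb["req_1h"] > 0 and hb["err_1h"] / hb["req_1h"] > ERR_RATE_WARN:
        problems.append(f"{hb['node']}: error rate above {ERR_RATE_WARN:.0%}")
    return problems


if __name__ == "__main__":
    print(check_heartbeat(SAMPLE) or "healthy")
```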
---
## 4. DNS Plan
### Per-Node Subdomains
Each node gets its own subdomain under `vault1984.com`:
| Node | FQDN | Type | Points to |
|------|------|------|-----------|
| zurich (HQ) | noc.vault1984.com | A | 185.218.204.47 |
| virginia | virginia.vault1984.com | A | (AWS IP, TBD) |
| ncalifornia | ncalifornia.vault1984.com | A | (AWS IP, TBD) |
| montreal | montreal.vault1984.com | A | (AWS IP, TBD) |
| … | … | A | (AWS IP, TBD) |
All DNS via **Cloudflare** (zone: `1c7614cd4ee5eabdc03905609024f93a`).
**DNS-only mode** — no Cloudflare proxying. vault1984 is a password vault; routing through third-party proxies defeats the trust model.
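
The per-node records can be expressed as Cloudflare DNS record bodies with `proxied` set to `False`, which is what DNS-only mode means at the API level. This sketch only builds the request bodies; it makes no API calls. The IP address in the usage example is a documentation placeholder from 203.0.113.0/24, since the real AWS IPs are still TBD.

```python
import ipaddress

DOMAIN = "vault1984.com"


def a_record(label: str, ipv4: str) -> dict:
    """Body for a DNS-only A record; raises ValueError on a malformed address."""
    ipaddress.IPv4Address(ipv4)  # validation only
    return {
        "type": "A",
        "name": f"{label}.{DOMAIN}",
        "content": ipv4,
        "ttl": 1,          # 1 = "automatic" in Cloudflare's API
        "proxied": False,  # DNS-only: no Cloudflare proxy in the request path
    }


if __name__ == "__main__":
    print(a_record("noc", "185.218.204.47"))
    print(a_record("virginia", "203.0.113.10"))  # placeholder IP, not a real node
```

Asserting `proxied is False` in whatever tooling creates these records is a cheap guard against a proxy being switched on by accident.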
### vault1984.com Root
- **vault1984.com** → **Virginia** node (primary; largest US East market)
- `www.vault1984.com` → same (or 301 → apex)
- **Option: Cloudflare Load Balancer GeoDNS** → $5/mo — latency-based routing across all nodes. Johan decides post-pilot.
### SOC Domain
- `soc.vault1984.com` → 185.218.204.47 (Hans HQ → Kuma:3001) — internal status dashboard
---
## 5. Current Status vs Plan
| # | Milestone | Deadline | Status | Notes |
|---|-----------|----------|--------|-------|
| **M1** | Hans HQ ready (WireGuard hub + OC NOC + `soc.vault1984.com`) | Mon Mar 2, EOD | 🔄 In progress | OpenClaw NOC live on Hans. WireGuard hub + Kuma fleet monitors need creation when nodes go live. |
| **M2** | NixOS config + deploy tooling in `vault1984/infra/` | Tue Mar 3, EOD | 🔄 In progress | **TODAY** — Hans executing. Includes base.nix, node vars, provision scripts, vault1984 telemetry push goroutine. Deployment method (Terraform vs manual AWS Console) TBD. |
| **M3** | Pilot: 3 nodes live (Virginia + 2 others) | Wed Mar 4, noon | ❌ Not started | Blocked on M2 completion + AWS account/credentials setup. |
| **M4** | Go/No-Go review | Wed Mar 4, EOD | ❌ Not started | Johan reviews pilot. |
| **M5** | Full 20-region AWS fleet live | Thu Mar 5, EOD | ❌ Not started | 4 batches. Blocked on M4 green light + AWS account/credentials. |
| **M6** | DNS, TLS, health checks verified across all nodes | Thu Mar 5, EOD | ❌ Not started | Follows M5. |
| **M7** | 🚀 Go-live — vault1984.com routes to fleet | **Fri Mar 6, noon** | ❌ Not started | Johan + James final sign-off. |
---
## 6. Cost Breakdown
### Monthly Infrastructure Cost
| Component | Nodes | Unit Cost | Monthly |
|-----------|-------|-----------|---------|
| Hans HQ (Hostkey Zürich) | 1 | €3.90/mo | **~$4** |
| AWS t4g.nano (20 regions) | 20 | ~$3/mo | **~$60** |
| **Total fleet** | **21** | — | **~$64/mo** |
> Approximate total: **~$64-67/mo** (EUR/USD fluctuation). Well under the $100/mo budget.
> The inou.com Zurich server (82.22.36.202) is separate infrastructure — not charged to vault1984 budget.
### Remaining Budget
- Budget ceiling: **$100/mo**
- Fleet spend: **~$64-67/mo**
- Reserve for upgrades: **~$33-36/mo** (use when individual nodes see demand)
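
The arithmetic behind these figures, as a small sketch. The unit costs are the approximate values from the table; the EUR/USD rate is an input, and the 1.10 used in the example is an assumption, not a quoted rate.

```python
AWS_NODES = 20
AWS_UNIT_USD = 3.00  # approximate t4g.nano monthly cost
HQ_EUR = 3.90        # Hans HQ (Hostkey) monthly cost
BUDGET_USD = 100.00


def fleet_cost_usd(eur_usd: float) -> float:
    """Monthly fleet cost in USD for a given EUR/USD exchange rate."""
    return AWS_NODES * AWS_UNIT_USD + HQ_EUR * eur_usd


def reserve_usd(eur_usd: float) -> float:
    """Budget left over after the base fleet is paid for."""
    return BUDGET_USD - fleet_cost_usd(eur_usd)


if __name__ == "__main__":
    rate = 1.10  # assumed exchange rate
    print(f"fleet:   ${fleet_cost_usd(rate):.2f}/mo")
    print(f"reserve: ${reserve_usd(rate):.2f}/mo")
```

At that assumed rate the fleet comes to roughly $64/mo, leaving about $36/mo of headroom. AWS data transfer and storage charges are not modelled here.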
### Node Upgrade Path (when needed)
| Tier | Specs | Cost |
|------|-------|------|
| t4g.nano (current) | 2 vCPU / 0.5 GB / ARM | ~$3/mo |
| t4g.micro | 2 vCPU / 1 GB / ARM | ~$6/mo |
| t4g.small | 2 vCPU / 2 GB / ARM | ~$12/mo |
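
To see how far the reserve stretches, this sketch computes how many nodes could move from t4g.nano to a larger tier. Prices are the approximate figures from the table; the reserve amount is passed in by the caller.

```python
TIER_USD = {"t4g.nano": 3, "t4g.micro": 6, "t4g.small": 12}  # approx. monthly cost


def upgrade_delta(to_tier: str, nodes: int = 1) -> int:
    """Extra monthly cost of moving `nodes` nodes from t4g.nano to `to_tier`."""
    return nodes * (TIER_USD[to_tier] - TIER_USD["t4g.nano"])


def max_upgrades(reserve_usd: float, to_tier: str) -> int:
    """How many nano nodes the reserve can move to `to_tier`."""
    return int(reserve_usd // upgrade_delta(to_tier))


if __name__ == "__main__":
    for tier in ("t4g.micro", "t4g.small"):
        print(tier, max_upgrades(33, tier))
```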
---
## 7. Blockers
| Blocker | Owner | Impact | Notes |
|---------|-------|--------|-------|
| **AWS account / credentials setup** | 🔴 Johan (pending) | Blocks M3, M5 — cannot provision any EC2 instances | No AWS account configured yet. Needed before any spoke can be provisioned. |
| **AWS deployment method** | 🟡 James/Hans | Blocks M2 tooling finalization | Likely Terraform or manual AWS Console. Not yet decided — do not build automation assuming either approach. |
> **No longer a blocker:** ~~Vultr API key~~ — Vultr removed from architecture entirely.
---
## Architecture Principles (for reference)
1. **No Caddy on spokes.** vault1984 binary handles TLS itself via `autocert` — eliminates a process and potential cert misconfig. Learned from Kaseya cert incidents.
2. **No Cloudflare proxying.** DNS-only. Password vault + third-party MITM = trust model broken.
3. **No public SSH.** Every spoke node: SSH on WireGuard interface only. Public internet sees 80+443, nothing else.
4. **NixOS everywhere.** Declarative = zero drift. One config file per node, checked into repo. Roll back any node in seconds.
5. **Nodes are independent.** No replication. User vault lives on one node. Scale up single nodes when demand warrants.
6. **ARM/Graviton only.** AWS t4g.nano — cheapest viable ARM compute in the market. vault1984 Go binary compiles to `linux/arm64` cleanly.
---
*vault1984 — "1984 had no secrets. You should."*