# vault1984 — Infrastructure Overview
*Last updated: 2026-03-03 · James ⚡*

*Go-live target: Friday, March 6, 2026, noon ET*

---
## 1. Hub — Zurich SOC (82.22.36.202)
| Field | Value |
|-------|-------|
| **Provider** | Hostkey (Switzerland, likely Equinix ZH) |
| **IP** | 82.22.36.202 |
| **DNS** | zurich.inou.com |
| **Specs** | 4 vCPU / 6 GB RAM / 120 GB SSD |
| **Cost** | Existing (already paid — inou.com infrastructure) |
| **WireGuard role** | Hub — 10.84.0.1/24, UDP 51820 |
### Services Running on Hub
| Service | Port / Address | Purpose |
|---------|---------------|---------|
| **WireGuard hub** | UDP 51820 / 10.84.0.1 | Fleet management network |
| **Caddy** | 443 (public) | Reverse proxy + auto-TLS |
| **Stalwart mail** | 25/465/587/143/993/995 | @jongsma.me, @inou.com, @vault1984.com |
| **Uptime Kuma** | localhost:3001 → `soc.vault1984.com` | Fleet monitoring dashboard |
| **ntfy** | localhost:2586 → `ntfy.inou.com` | Push alerts (`vault1984-alerts`) |
| **Git server** | SSH (git user) | vault1984.git, vault1984-web.git, others |
> **Note:** SSH on the hub is public (normal sshd). Spoke nodes have SSH on WireGuard only — port 22 is NOT reachable from the public internet.
---
## 2. Spoke Nodes — 16-Node Global Fleet
### Vultr Plan: VX1 ✅ Confirmed
**$2.50/mo** — 1 vCPU, 512 MB RAM, 10 GB SSD, 500 GB transfer
*(Source: INFRASTRUCTURE.md — "All Vultr nodes: VX1 tier — 1 vCPU, 512 MB RAM, 10 GB SSD, 0.5 TB bandwidth @ $2.50/mo")*
### Full Node Table
| # | Node Name | City | Provider | Plan | WG IP | Cost/mo | Status |
|---|-----------|------|----------|------|-------|---------|--------|
| 1 | `zurich` | Zürich, CH | Hostkey (existing) | 4vCPU/6GB/120GB | 10.84.0.2 | $0 (existing) | ⏸️ Spoke not yet deployed |
| 2 | `frankfurt` | Frankfurt, DE | Vultr | VX1 $2.50 | 10.84.0.3 | $2.50 | ❌ Not provisioned |
| 3 | `newjersey` | New Jersey, US | Vultr | VX1 $2.50 | 10.84.0.4 | $2.50 | ❌ Not provisioned |
| 4 | `siliconvalley` | Silicon Valley, US | Vultr | VX1 $2.50 | 10.84.0.5 | $2.50 | ❌ Not provisioned |
| 5 | `dallas` | Dallas, US | Vultr | VX1 $2.50 | 10.84.0.6 | $2.50 | ❌ Not provisioned |
| 6 | `london` | London, UK | Vultr | VX1 $2.50 | 10.84.0.7 | $2.50 | ❌ Not provisioned |
| 7 | `warsaw` | Warsaw, PL | Vultr | VX1 $2.50 | 10.84.0.8 | $2.50 | ❌ Not provisioned |
| 8 | `tokyo` | Tokyo, JP | Vultr | VX1 $2.50 | 10.84.0.9 | $2.50 | ❌ Not provisioned |
| 9 | `seoul` | Seoul, KR | Vultr | VX1 $2.50 | 10.84.0.10 | $2.50 | ❌ Not provisioned |
| 10 | `mumbai` | Mumbai, IN | Vultr | VX1 $2.50 | 10.84.0.11 | $2.50 | ❌ Not provisioned |
| 11 | `saopaulo` | São Paulo, BR | Vultr | VX1 $2.50 | 10.84.0.12 | $2.50 | ❌ Not provisioned |
| 12 | `sydney` | Sydney, AU | Vultr | VX1 $2.50 | 10.84.0.13 | $2.50 | ❌ Not provisioned |
| 13 | `johannesburg` | Johannesburg, ZA | Vultr | VX1 $2.50 | 10.84.0.14 | $2.50 | ❌ Not provisioned |
| 14 | `telaviv` | Tel Aviv, IL | Vultr | VX1 $2.50 | 10.84.0.15 | $2.50 | ❌ Not provisioned |
| 15 | `dubai` | Dubai, AE | Hostkey | ~$58/mo (vm.mini class) | 10.84.0.16 | ~$6.50 | ⏸️ Decision pending |
| 16 | `istanbul` | Istanbul, TR | TBD (Hostkey preferred; Vultr has no TR) | TBD | 10.84.0.17 | ~€3.90 est. (~$4.25) | ⏸️ Provider TBD |
> **Istanbul note:** Vultr has no Turkey presence. Hostkey does. Likely Hostkey vm.mini at ~€3.90/mo. Warsaw covers Istanbul at ~30ms if deferred.
> **Dubai note:** INFRASTRUCTURE.md lists Dubai as Hostkey at ~$58/mo. Order not yet placed — pending Johan's decision.
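
The WG IP column follows a fixed pattern: the hub holds 10.84.0.1 and node #n gets 10.84.0.(n+1). The Python sketch below reproduces that allocation so per-node variables can be sanity-checked against the table. The `NODES` list and the `wg_ip` helper are illustrative names, not taken from the repo.

```python
import ipaddress

# Fleet management network from the hub table.
WG_NETWORK = ipaddress.ip_network("10.84.0.0/24")
HUB_IP = WG_NETWORK.network_address + 1  # 10.84.0.1

# Node names in table order (#1 to #16).
NODES = [
    "zurich", "frankfurt", "newjersey", "siliconvalley", "dallas", "london",
    "warsaw", "tokyo", "seoul", "mumbai", "saopaulo", "sydney",
    "johannesburg", "telaviv", "dubai", "istanbul",
]


def wg_ip(node_name: str) -> ipaddress.IPv4Address:
    """Return the WireGuard address for a node: node #n maps to 10.84.0.(n+1)."""
    number = NODES.index(node_name) + 1  # raises ValueError for unknown names
    return WG_NETWORK.network_address + number + 1
```

For example, `wg_ip("tokyo")` yields `10.84.0.9`, matching row 8 of the table.
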
---
## 3. What Runs on Each Spoke
Every spoke node runs the same minimal stack — deliberately so. No drift by design.
```
[Vultr/Hostkey VPS]
├── NixOS (declarative, reproducible, 2 generations max)
├── vault1984 binary (Go, ~15 MB, ports :80 + :443)
│   ├── Built-in autocert (Let's Encrypt via golang.org/x/crypto/acme/autocert)
│   ├── Kuma push heartbeat (every 30s to soc.vault1984.com)
│   └── vault1984.db (SQLite + WAL)
└── WireGuard spoke → hub (10.84.0.1:51820)
    └── SSH binds to WireGuard IP only (10.84.0.x:22)
```
**Public ports:** 80, 443 only.
**NOT public:** Port 22 (SSH reachable only via WireGuard tunnel from Zurich hub).
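
One way to verify this exposure rule from outside the WireGuard network is a plain TCP connect probe. The sketch below is an assumption about how such a check could look; it is not the repo's `healthcheck.sh`, and `probe`, `exposure_violations`, and `check_node` are hypothetical helpers.

```python
import socket

EXPECTED_PUBLIC = {80, 443}  # must accept connections from the internet
MUST_BE_CLOSED = {22}        # SSH may only answer on the WireGuard IP


def probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, filtered (timeout), or unreachable
        return False


def exposure_violations(open_ports: set[int]) -> list[str]:
    """Compare observed open ports against the spoke exposure rule."""
    problems = []
    for port in sorted(EXPECTED_PUBLIC - open_ports):
        problems.append(f"port {port} should be public but is closed")
    for port in sorted(MUST_BE_CLOSED & open_ports):
        problems.append(f"port {port} must not be public but is open")
    return problems


def check_node(host: str) -> list[str]:
    """Probe a node's public address and report any violations."""
    candidates = EXPECTED_PUBLIC | MUST_BE_CLOSED
    open_ports = {port for port in candidates if probe(host, port)}
    return exposure_violations(open_ports)
```

Run `check_node("<public IP>")` from a machine that is not on the WireGuard network; an empty list means the node exposes exactly 80 and 443.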
### Heartbeat Payload (every 30s, vault1984 → Kuma)
```json
{
  "node": "tokyo",
  "ram_mb": 142, "disk_pct": 31.2, "cpu_pct": 2.1,
  "db_size_mb": 12, "db_integrity": true,
  "active_sessions": 3, "req_1h": 847, "err_1h": 2,
  "cert_days_remaining": 62, "nix_gen": 2, "uptime_s": 864000
}
```
**Key watchdog metric:** `cert_days_remaining` — visible in Kuma before any cert expires.
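
To illustrate how these fields could drive alerts, the sketch below checks one heartbeat against a few thresholds. The threshold values (14 days, 90% disk) and the `heartbeat_alerts` function are assumptions chosen for illustration; this document defines only the payload fields.

```python
import json

# Hypothetical thresholds, not taken from the vault1984 repo.
MIN_CERT_DAYS = 14
MAX_DISK_PCT = 90.0


def heartbeat_alerts(payload: dict) -> list[str]:
    """Return human-readable alerts for one heartbeat payload."""
    node = payload["node"]
    alerts = []
    if payload["cert_days_remaining"] < MIN_CERT_DAYS:
        alerts.append(
            f"{node}: certificate expires in {payload['cert_days_remaining']} days"
        )
    if not payload["db_integrity"]:
        alerts.append(f"{node}: SQLite integrity check failed")
    if payload["disk_pct"] > MAX_DISK_PCT:
        alerts.append(f"{node}: disk at {payload['disk_pct']}% full")
    return alerts


# The sample payload from this section.
SAMPLE = json.loads("""
{
  "node": "tokyo",
  "ram_mb": 142, "disk_pct": 31.2, "cpu_pct": 2.1,
  "db_size_mb": 12, "db_integrity": true,
  "active_sessions": 3, "req_1h": 847, "err_1h": 2,
  "cert_days_remaining": 62, "nix_gen": 2, "uptime_s": 864000
}
""")
```

With the sample payload, `heartbeat_alerts(SAMPLE)` returns an empty list; dropping `cert_days_remaining` below the threshold produces a single certificate alert.
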

---
## 4. DNS Plan
### Per-Node Subdomains
Each node gets its own subdomain under `vault1984.com`:
| Node | FQDN | Type | Points to |
|------|------|------|-----------|
| zurich | zurich.vault1984.com | A | 82.22.36.202 |
| frankfurt | frankfurt.vault1984.com | A | (Vultr IP, TBD) |
| newjersey | newjersey.vault1984.com | A | (Vultr IP, TBD) |
| … | … | A | (Vultr IP, TBD) |
| dubai | dubai.vault1984.com | A | (Hostkey IP, TBD) |
All DNS via **Cloudflare** (zone: `1c7614cd4ee5eabdc03905609024f93a`).
**DNS-only mode** — no Cloudflare proxying. vault1984 is a password vault; routing through third-party proxies defeats the trust model.
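
In the Cloudflare API, DNS-only mode corresponds to `"proxied": false` on each record. The sketch below builds, but does not send, the request for one node's A record. The helper name and the choice of automatic TTL are assumptions; the zone ID is the one quoted in this section.

```python
import json

ZONE_ID = "1c7614cd4ee5eabdc03905609024f93a"  # zone ID quoted in this document
API_BASE = "https://api.cloudflare.com/client/v4"


def a_record_request(node: str, ipv4: str) -> tuple[str, dict]:
    """Build the URL and JSON body for creating a DNS-only A record."""
    url = f"{API_BASE}/zones/{ZONE_ID}/dns_records"
    body = {
        "type": "A",
        "name": f"{node}.vault1984.com",
        "content": ipv4,
        "ttl": 1,          # 1 means "automatic" in the Cloudflare API
        "proxied": False,  # DNS-only: traffic never passes through Cloudflare
    }
    return url, body


if __name__ == "__main__":
    url, body = a_record_request("zurich", "82.22.36.202")
    print("POST", url)
    print(json.dumps(body, indent=2))
```

Sending the request would additionally need an API token with DNS edit permission, passed as an `Authorization: Bearer` header.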
### vault1984.com Root
- **vault1984.com** → **New Jersey** node (primary; largest US East market)
- `www.vault1984.com` → same (or 301 → apex)
- **Option: Cloudflare Load Balancer GeoDNS** → $5/mo — latency-based routing across all nodes. Johan decides post-pilot.
### SOC Domain
- `soc.vault1984.com` → 82.22.36.202 (Caddy → Kuma:3001) — internal status dashboard
---
## 5. Current Status vs Plan
| # | Milestone | Deadline | Status | Notes |
|---|-----------|----------|--------|-------|
| **M1** | Zurich SOC ready (WireGuard hub + Kuma + `soc.vault1984.com`) | Mon Mar 2, EOD | 🔄 In progress | WireGuard hub + Kuma configured on Zurich; fleet Kuma monitors need creation when nodes go live. Hans server (185.218.204.47) live as NOC node. |
| **M2** | NixOS config + deploy tooling in `vault1984/infra/` | Tue Mar 3, EOD | 🔄 In progress | **TODAY** — Hans executing. Includes base.nix, 16 node vars, provision.sh, deploy.sh, healthcheck.sh, vault1984 telemetry push goroutine. |
| **M3** | Pilot: 3 nodes live (Zurich, Frankfurt, NJ) | Wed Mar 4, noon | ❌ Not started | Blocked on M2 completion + Vultr API key. |
| **M4** | Go/No-Go review | Wed Mar 4, EOD | ❌ Not started | Johan reviews pilot. |
| **M5** | Full 16-node fleet live | Thu Mar 5, EOD | ❌ Not started | 4 batches of ~4 nodes. Blocked on M4 green light + Vultr API key. |
| **M6** | DNS, TLS, health checks verified across all 16 | Thu Mar 5, EOD | ❌ Not started | Follows M5. |
| **M7** | 🚀 Go-live — vault1984.com routes to fleet | **Fri Mar 6, noon** | ❌ Not started | Johan + James final sign-off. |
---
## 6. Cost Breakdown
### Monthly Infrastructure Cost
| Component | Nodes | Unit Cost | Monthly |
|-----------|-------|-----------|---------|
| Zurich hub (Hostkey) | 1 | Existing (inou.com infra) | $0 incremental |
| Vultr VX1 nodes | 13 | $2.50/mo | **$32.50** |
| Dubai (Hostkey, ~vm.mini) | 1 | ~$58/mo est. | **~$6.50** |
| Istanbul (Hostkey est.) | 1 | ~€3.90/mo est. | **~$4.25** |
| **Total fleet** | **16** | — | **~$43/mo** |
> Zurich hub cost is shared with inou.com, Stalwart mail, and other services — not charged to vault1984 budget.
### Remaining Budget
- Budget ceiling: **$100/mo**
- Fleet spend: **~$43/mo**
- Reserve for upgrades: **~$57/mo** (use when individual nodes see demand)
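
The totals above can be reproduced with a few lines of arithmetic. The sketch uses the figures from the Monthly column as given, so the Dubai line is counted at ~$6.50; substituting the ~$58/mo unit estimate instead would raise the total to about $94.75.

```python
from decimal import Decimal

BUDGET_CEILING = Decimal("100.00")

# (component, node count, monthly cost per node), taken from the Monthly column.
FLEET = [
    ("Zurich hub (Hostkey)", 1, Decimal("0.00")),
    ("Vultr VX1 nodes", 13, Decimal("2.50")),
    ("Dubai (Hostkey)", 1, Decimal("6.50")),
    ("Istanbul (Hostkey est.)", 1, Decimal("4.25")),
]


def fleet_total(fleet) -> Decimal:
    """Sum count * unit cost over all components."""
    return sum((count * cost for _, count, cost in fleet), Decimal("0"))


node_count = sum(count for _, count, _ in FLEET)  # 16
total = fleet_total(FLEET)                        # 43.25, i.e. ~$43/mo
reserve = BUDGET_CEILING - total                  # 56.75, i.e. ~$57/mo
```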
### Node Upgrade Path (when needed)
| Tier | Specs | Cost |
|------|-------|------|
| VX1 (current) | 1 vCPU / 512MB / 10GB | $2.50/mo |
| Next tier | 1 vCPU / 1GB / 25GB / 1TB | $6/mo |
| Mid tier | 2 vCPU / 2GB / 50GB / 2TB | $12/mo |
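
As a rough guide to how far the reserve stretches, the sketch below counts how many of the 13 VX1 nodes could move to a given tier at once. It assumes a reserve of $56.75 ($100 ceiling minus roughly $43.25 of fleet spend) and that each upgrade costs only the price difference over VX1; `max_upgrades` is a hypothetical helper.

```python
from decimal import Decimal

RESERVE = Decimal("56.75")   # assumed: $100 ceiling minus ~$43.25 fleet spend
VX1_COST = Decimal("2.50")
VX1_NODES = 13               # Vultr nodes currently planned on VX1
TIER_COST = {"next": Decimal("6.00"), "mid": Decimal("12.00")}


def max_upgrades(tier: str, reserve: Decimal = RESERVE) -> int:
    """Number of VX1 nodes that can be upgraded to `tier` within the reserve."""
    delta = TIER_COST[tier] - VX1_COST       # extra monthly cost per node
    affordable = int(reserve // delta)       # whole nodes the reserve covers
    return min(affordable, VX1_NODES)        # cannot upgrade more nodes than exist
```

Under these assumptions the reserve covers moving all 13 Vultr nodes to the next tier, or 5 nodes to the mid tier.
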
---
## 7. Blockers
| Blocker | Owner | Impact | Notes |
|---------|-------|--------|-------|
| **Vultr API key** | 🔴 Johan (pending) | Blocks M3, M5 — cannot provision any VPS | Was due Mon Mar 2 AM. Still outstanding as of Tue Mar 3. Hans cannot provision 13 nodes without it. |
| **Dubai decision** | 🟡 Johan | Blocks Dubai node (15th spoke) | Option A: Order Hostkey Dubai (~$58/mo). Option B: Cover Gulf region with Tel Aviv (~40ms). Option C: Defer to post-launch. If Istanbul is also deferred, Warsaw covers it at ~30ms. |
| **Istanbul provider** | 🟡 James/Hans | Blocks 16th spoke | Vultr has no Turkey presence. Hostkey does. Likely Hostkey vm.mini ~€3.90/mo. Low urgency — Warsaw covers at ~30ms. |
---
## Architecture Principles (for reference)
1. **No Caddy on spokes.** vault1984 binary handles TLS itself via `autocert` — eliminates a process and potential cert misconfig. Learned from Kaseya cert incidents.
2. **No Cloudflare proxying.** DNS-only. Password vault + third-party MITM = trust model broken.
3. **No public SSH.** Every spoke node: SSH on WireGuard interface only. Public internet sees 80+443, nothing else.
4. **NixOS everywhere.** Declarative = zero drift. One config file per node, checked into repo. Roll back any node in seconds.
5. **Nodes are independent.** No replication. User vault lives on one node. Scale up single nodes when demand warrants.
---
*vault1984 — "1984 had no secrets. You should."*