Azure Files Backup — Requirements Spec

Captured: 2025-01-28 | Domain: Personal | Priority: HIGH

Purpose

POC to prove a point: the right architecture can back up billions of files with minimal database overhead.

This is NOT a Kaseya project — it's Johan demonstrating his design philosophy.

Target

  • Azure Files API specifically
  • NOT Azure Blob Storage
  • NOT OneDrive/SharePoint

Scale Requirements

  • Billions of files
  • 64-bit node IDs required
  • DB must fit in RAM for fast queries (~50GB target)

Database Design (~50 bytes/file)

Field      Type     Size       Purpose
node_id    int64    8 bytes    Unique identifier (billions need 64-bit)
parent_id  int64    8 bytes    Tree structure link
name       varchar  ~20 bytes  Filename only, NOT full path
size       int64    8 bytes    File size in bytes
mtime      int64    8 bytes    Unix timestamp
hash       int64    8 bytes    xorhash (MSFT standard)

Total: ~50–60 bytes/file (40 bytes fixed + name) → ~50–60GB for 1 billion files → fits in RAM

Key Constraints

  • Node tree only — NO full path strings stored
  • Paths reconstructed by walking parent_id to root
  • Rename directory = update 1 row, not millions
  • DB is index + analytics only

Object Storage Design

Everything that doesn't fit in 50 bytes goes here:

  • Full metadata (ACLs, extended attributes, permissions)
  • File content (chunked, deduplicated)
  • Version history
  • FlatBuffer serialized

Bundling

  • TAR format (proven, standard)
  • Only when it saves ops (not for just 2 files)
  • Threshold TBD (likely <64KB or <1MB)

Hash Strategy

  • xorhash — MSFT standard, 64-bit, fast
  • NOT sha256 (overkill for change detection)
  • Used for: change detection, not cryptographic verification

Architecture

~/dev/azure-backup/
├── core/    — library (tree, hash, storage interface, flatbuffer)
├── worker/  — K8s-scalable backup worker (100s of workers)
├── api/     — REST API for GUI
└── web/     — Go templates + htmx

Worker Design

  • Stateless K8s pods
  • Horizontal scaling (add pods, auto-claim work)
  • Job types: scan, backup, restore, verify
  • Queue: Postgres SKIP LOCKED (works up to ~1000 workers)

Multi-Tenant

  • Isolated by tenant_id + share_id
  • Each tenant+share gets separate node tree
  • Object paths: {tenant_id}/{share_id}/{node_id}

GUI Requirements

  • Web UI: Go + htmx/templ
  • Multi-tenant view (not single-tenant)

Meta

  • Language: Go throughout (core library, workers, API, web)
  • Repo: ~/dev/azure-backup
  • License: Proprietary
  • Type: Personal POC (prove a point)

Open Questions (resolved)

  • 64-bit node IDs (billions of files)
  • xorhash not sha256
  • TAR bundling
  • Multi-tenant GUI
  • Proprietary license

Status

  • ✅ Requirements captured
  • ✅ Repo scaffolded
  • ✅ ARCHITECTURE.md written
  • ✅ FlatBuffer schema + Go code generated
  • ✅ Azure SDK integration (real client implementation)
  • ✅ Web UI (Go + htmx + Tailwind)
  • ✅ 4,400+ lines of Go code
  • 🔲 Azure free trial account (needs Johan)
  • 🔲 Database integration (Postgres)
  • 🔲 End-to-end test with real Azure Files