Azure Files Backup POC - Progress Notes
Date: 2025-01-29 Repo: ~/dev/azure-backup
Summary
Continued development of the Azure Files Backup POC. Implemented Azure Files API client, tree scanner, FlatBuffer schema, and tests.
Completed Tasks
1. ✅ Azure Files API Client (pkg/storage/azure.go)
Created a comprehensive Azure Files client with:
- AzureFilesClient interface - Main abstraction for Azure Files operations:
  - ListDirectory() - List files/directories
  - GetFile() - Download file content
  - GetFileProperties() - Get file metadata without content
  - GetFileRange() - Download specific byte range
  - WalkDirectory() - Recursive directory traversal
  - Close() - Cleanup
- AzureConfig - Configuration with validation:
  - Supports Shared Key auth (AccountName + AccountKey)
  - Supports SAS token auth
  - Supports connection string
- MockAzureFilesClient - Full mock implementation for testing:
  - AddFile(), AddDirectory(), SetFileContent(), SetError()
  - All interface methods implemented with mock behavior
  - Supports SkipDir for walk control
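To illustrate how a mock with SkipDir support can drive walk tests offline, here is a minimal sketch. It is not the code in pkg/storage/azure.go: the Entry type, the WalkFunc signature, the ErrSkipDir sentinel, the reduced DirectoryWalker interface, and the walkSkipping helper are all assumptions made for this example, and only two of the interface methods are modelled.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// ErrSkipDir is a hypothetical sentinel; the notes only say the mock supports SkipDir.
var ErrSkipDir = errors.New("skip this directory")

// Entry is an assumed shape for one listing result.
type Entry struct {
	Path  string
	IsDir bool
	Size  int64
}

// WalkFunc is called once for every entry found during a walk.
type WalkFunc func(e Entry) error

// DirectoryWalker is a reduced stand-in for the AzureFilesClient interface.
type DirectoryWalker interface {
	ListDirectory(path string) ([]Entry, error)
	WalkDirectory(root string, fn WalkFunc) error
}

// mockClient keeps entries in memory, keyed by full path.
type mockClient struct{ entries map[string]Entry }

func newMockClient() *mockClient { return &mockClient{entries: map[string]Entry{}} }

func (m *mockClient) AddFile(path string, size int64) {
	m.entries[path] = Entry{Path: path, Size: size}
}

func (m *mockClient) AddDirectory(path string) {
	m.entries[path] = Entry{Path: path, IsDir: true}
}

// ListDirectory returns the direct children of path, sorted by path.
func (m *mockClient) ListDirectory(path string) ([]Entry, error) {
	prefix := strings.TrimSuffix(path, "/") + "/"
	var out []Entry
	for p, e := range m.entries {
		if strings.HasPrefix(p, prefix) && !strings.Contains(p[len(prefix):], "/") {
			out = append(out, e)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out, nil
}

// WalkDirectory visits entries depth-first. Returning ErrSkipDir for a
// directory prevents descending into it; any other error aborts the walk.
func (m *mockClient) WalkDirectory(root string, fn WalkFunc) error {
	children, err := m.ListDirectory(root)
	if err != nil {
		return err
	}
	for _, e := range children {
		err := fn(e)
		if e.IsDir && errors.Is(err, ErrSkipDir) {
			continue
		}
		if err != nil {
			return err
		}
		if e.IsDir {
			if err := m.WalkDirectory(e.Path, fn); err != nil {
				return err
			}
		}
	}
	return nil
}

// walkSkipping records every visited path and skips the directory named skip.
func walkSkipping(c DirectoryWalker, skip string) ([]string, error) {
	var seen []string
	err := c.WalkDirectory("/", func(e Entry) error {
		seen = append(seen, e.Path)
		if e.IsDir && e.Path == skip {
			return ErrSkipDir
		}
		return nil
	})
	return seen, err
}

func main() {
	m := newMockClient()
	m.AddDirectory("/docs")
	m.AddFile("/docs/a.txt", 10)
	m.AddDirectory("/tmp")
	m.AddFile("/tmp/cache.bin", 99)
	seen, err := walkSkipping(m, "/tmp")
	fmt.Println(seen, err) // /tmp is visited but its contents are not
}
```

Because the walker depends only on the interface, the same callback could later run against a real client without changes.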
2. ✅ FlatBuffer Schema (schemas/metadata.fbs)
Designed comprehensive FlatBuffer schema with:
- FileMetadata - Full file metadata (stored in object storage):
  - Identity: node_id, parent_id, full_path, name
  - Timestamps: mtime, ctime, atime
  - POSIX: mode, owner, group
  - Extended: ACL entries, xattrs
  - Content: xorhash, content_hash, chunk references
  - Versions: version history references
  - Azure-specific: etag, content_md5, attributes
- DirectoryMetadata - Directory-specific metadata with child counts
- ScanResult - Scan operation results
- BackupManifest - Backup snapshot metadata
3. ✅ Tree Scanner (pkg/tree/scanner.go)
Implemented the scanner that syncs Azure state to local DB:
- Scanner - Main scanner type:
  - Takes AzureFilesClient, NodeStore, and Logger
  - Supports full and incremental scans
  - Batch processing for efficiency
  - Progress callbacks for monitoring
- ScanOptions - Configurable scan behavior:
  - RootPath, RootNodeID, FullScan
  - BatchSize, Concurrency
  - ProgressCallback, ProgressInterval
- ScanStats - Atomic counters for metrics:
  - TotalFiles, TotalDirs, TotalSize
  - AddedFiles, ModifiedFiles, DeletedFiles
  - Errors with messages
- XOR Hash utilities:
  - XORHash() - Compute 8-byte fingerprint from data
  - ComputeXORHashFromReader() - Streaming version
  - blockHash() - FNV-1a block hashing
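The XOR hash utilities can be pictured as follows. This is a sketch under stated assumptions, not the code in pkg/tree/scanner.go: the function names come from the notes, but the signatures, the 4096-byte block size, the hashBytesStreaming helper, and the choice to mix the block index into each FNV-1a hash (so that reordered or repeated blocks do not cancel out under XOR) are all assumptions.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"io"
)

// blockSize is a hypothetical value; the notes do not state the real one.
const blockSize = 4096

// blockHash returns the 64-bit FNV-1a hash of one block. Prefixing the
// block index is an assumption of this sketch: without it, XOR would give
// the same result for swapped blocks and zero for pairs of equal blocks.
func blockHash(index uint64, block []byte) uint64 {
	h := fnv.New64a()
	var idx [8]byte
	binary.LittleEndian.PutUint64(idx[:], index)
	h.Write(idx[:]) // Write on a hash.Hash never returns an error
	h.Write(block)
	return h.Sum64()
}

// XORHash computes an 8-byte fingerprint by XOR-ing the block hashes.
func XORHash(data []byte) uint64 {
	var acc, index uint64
	for i := 0; i < len(data); i += blockSize {
		end := i + blockSize
		if end > len(data) {
			end = len(data)
		}
		acc ^= blockHash(index, data[i:end])
		index++
	}
	return acc
}

// ComputeXORHashFromReader is the streaming variant. It reads full blocks
// so that it produces the same fingerprint as XORHash for the same bytes.
func ComputeXORHashFromReader(r io.Reader) (uint64, error) {
	buf := make([]byte, blockSize)
	var acc, index uint64
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			acc ^= blockHash(index, buf[:n])
			index++
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return acc, nil // end of input, possibly after a short final block
		}
		if err != nil {
			return 0, err
		}
	}
}

// hashBytesStreaming is a small helper for comparing both code paths.
func hashBytesStreaming(data []byte) (uint64, error) {
	return ComputeXORHashFromReader(bytes.NewReader(data))
}

func main() {
	data := bytes.Repeat([]byte("azure files "), 1000)
	inMemory := XORHash(data)
	streamed, err := hashBytesStreaming(data)
	fmt.Printf("%016x match=%v err=%v\n", inMemory, inMemory == streamed, err)
}
```

A fingerprint like this is cheap and suits change detection, but it is not a cryptographic hash, which is presumably why the schema also carries a separate content_hash.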
4. ✅ Scan Handler Integration (pkg/worker/scan.go)
Updated scan handler to use the new components:
- AzureClientFactory for dependency injection
- RegisterShare() for per-share Azure configs
- Integrated with tree.Scanner
- Progress logging during scans
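The dependency-injection arrangement might look roughly like the sketch below. AzureClientFactory, RegisterShare, and AzureConfig are named in the notes, but every signature, the validation rules, the ScanHandler type, the clientFor method, and the fakeClient are assumptions for illustration only.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// AzureConfig mirrors the three auth options listed in the notes.
type AzureConfig struct {
	AccountName      string
	AccountKey       string
	SASToken         string
	ConnectionString string
}

// Validate applies assumed rules: a connection string is enough on its own,
// otherwise an account name plus either a key or a SAS token is required.
func (c AzureConfig) Validate() error {
	switch {
	case c.ConnectionString != "":
		return nil
	case c.AccountName == "":
		return errors.New("account name is required")
	case c.AccountKey == "" && c.SASToken == "":
		return errors.New("either an account key or a SAS token is required")
	}
	return nil
}

// Client is a reduced stand-in for the AzureFilesClient interface.
type Client interface{ Close() error }

// AzureClientFactory builds a client from a config; tests inject a fake.
type AzureClientFactory func(cfg AzureConfig) (Client, error)

// ScanHandler is a hypothetical handler holding per-share configs.
type ScanHandler struct {
	mu      sync.RWMutex
	factory AzureClientFactory
	shares  map[string]AzureConfig
}

func NewScanHandler(f AzureClientFactory) *ScanHandler {
	return &ScanHandler{factory: f, shares: map[string]AzureConfig{}}
}

// RegisterShare validates and stores the config for one share.
func (h *ScanHandler) RegisterShare(share string, cfg AzureConfig) error {
	if err := cfg.Validate(); err != nil {
		return fmt.Errorf("share %q: %w", share, err)
	}
	h.mu.Lock()
	defer h.mu.Unlock()
	h.shares[share] = cfg
	return nil
}

// clientFor builds a client for a registered share through the factory.
func (h *ScanHandler) clientFor(share string) (Client, error) {
	h.mu.RLock()
	cfg, ok := h.shares[share]
	h.mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("share %q is not registered", share)
	}
	return h.factory(cfg)
}

// fakeClient is what a test would inject instead of the real SDK client.
type fakeClient struct{ account string }

func (fakeClient) Close() error { return nil }

func main() {
	h := NewScanHandler(func(cfg AzureConfig) (Client, error) {
		return fakeClient{account: cfg.AccountName}, nil
	})
	fmt.Println(h.RegisterShare("docs", AzureConfig{AccountName: "acct", AccountKey: "key"}))
	fmt.Println(h.RegisterShare("bad", AzureConfig{AccountName: "acct"}))
	c, err := h.clientFor("docs")
	fmt.Printf("%+v %v\n", c, err)
}
```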
5. ✅ Tests (pkg/storage/azure_test.go, pkg/tree/scanner_test.go)
Comprehensive test coverage:
- Mock Azure client tests (file operations, walking, errors, SkipDir)
- Config validation tests
- Path helper tests (basename, dirname)
- Scanner tests (empty share, new files, modifications)
- XOR hash tests (determinism, collision detection)
Blockers
🔲 Go Not Installed on Build System
The development environment doesn't have Go installed, so the code has been neither compiled nor tested. It is complete as written but unverified.
Action needed: Run tests on a system with Go 1.22+:
cd ~/dev/azure-backup
go mod tidy
go test ./...
🔲 Azure Credentials
Real Azure client implementation is stubbed - returns "Azure SDK not initialized" errors.
Action needed:
- Create Azure free trial account
- Create an Azure Storage account with Files service
- Create a file share for testing
- Add credentials to config
Architecture Notes
File Storage: ~60-Byte DB Row
| Field | Size | Purpose |
|---|---|---|
| node_id | 8B | Unique identifier |
| parent_id | 8B | Tree structure |
| name | ~20B | Filename only |
| size | 8B | File size |
| mtime | 8B | Modified timestamp |
| xorhash | 8B | Change detection |
Everything else (ACLs, xattrs, content chunks) goes to object storage as FlatBuffers.
Change Detection Flow
- Azure reports file via ListDirectory
- Compare mtime/size with DB → quick diff
- If different, compute XOR hash → deep diff
- If hash differs, queue for backup
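The flow above can be sketched as a two-stage check. The Row, Listing, and Decision types and the detectChange function are hypothetical names for this example; the point is that the content hash is only computed when the cheap mtime/size comparison fails.

```go
package main

import "fmt"

// Row is the assumed subset of the DB row used for change detection.
type Row struct {
	Size    int64
	MTime   int64
	XORHash uint64
}

// Listing is the assumed subset of what ListDirectory reports for a file.
type Listing struct {
	Size  int64
	MTime int64
}

// Decision is the outcome of comparing a listing against the DB row.
type Decision int

const (
	Unchanged    Decision = iota // quick diff matched
	MetadataOnly                 // mtime/size differ but content hash matches
	NeedsBackup                  // content hash differs, queue for backup
)

func (d Decision) String() string {
	return [...]string{"unchanged", "metadata-only", "needs-backup"}[d]
}

// detectChange runs the quick diff first and calls hashContent (which would
// read the file) only when the quick diff reports a difference.
func detectChange(row Row, seen Listing, hashContent func() (uint64, error)) (Decision, error) {
	if row.Size == seen.Size && row.MTime == seen.MTime {
		return Unchanged, nil
	}
	h, err := hashContent()
	if err != nil {
		return Unchanged, err
	}
	if h == row.XORHash {
		return MetadataOnly, nil
	}
	return NeedsBackup, nil
}

func main() {
	row := Row{Size: 100, MTime: 1700000000, XORHash: 0xABCD}
	sameContent := func() (uint64, error) { return 0xABCD, nil }
	newContent := func() (uint64, error) { return 0x1234, nil }

	d1, _ := detectChange(row, Listing{Size: 100, MTime: 1700000000}, newContent)
	d2, _ := detectChange(row, Listing{Size: 100, MTime: 1700000500}, sameContent)
	d3, _ := detectChange(row, Listing{Size: 120, MTime: 1700000500}, newContent)
	fmt.Println(d1, d2, d3) // unchanged metadata-only needs-backup
}
```

Files with no existing DB row would be handled before this check as additions. Note that a file rewritten with the same size and mtime passes the quick diff unnoticed, which is an inherent trade-off of this flow.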
Tree Structure Benefits
- Rename directory: Update 1 row, not millions
- Move subtree: Update 1 parent_id
- Find descendants: Recursive CTE, properly indexed
- Storage: ~20 bytes vs 100+ bytes for full paths
File Structure
azure-backup/
├── ARCHITECTURE.md              # Full design doc
├── schemas/
│   └── metadata.fbs             # FlatBuffer schema ✅ NEW
├── pkg/
│   ├── db/
│   │   ├── node.go              # Node type & interface
│   │   ├── postgres.go          # PostgreSQL implementation
│   │   └── tree.go              # Tree operations
│   ├── storage/
│   │   ├── azure.go             # Azure Files client ✅ NEW
│   │   ├── azure_test.go        # Tests ✅ NEW
│   │   ├── client.go            # Object storage abstraction
│   │   ├── flatbuf.go           # Metadata serialization
│   │   └── chunks.go            # Content chunking
│   ├── tree/
│   │   ├── scanner.go           # Azure scanner ✅ NEW
│   │   ├── scanner_test.go      # Tests ✅ NEW
│   │   ├── walk.go              # Tree walking
│   │   ├── diff.go              # Tree diffing
│   │   └── path.go              # Path reconstruction
│   └── worker/
│       ├── scan.go              # Scan handler ✅ UPDATED
│       ├── queue.go             # Job queue
│       ├── backup.go            # Backup handler
│       └── restore.go           # Restore handler
└── cmd/
    └── backup-worker/
        └── main.go              # K8s worker binary
Next Steps
- Set up Go environment and run tests
- Azure account setup - free trial, storage account, file share
- Generate FlatBuffer code - flatc --go -o pkg/storage/fb schemas/metadata.fbs
- Implement real Azure SDK integration - uncomment TODOs in azure.go
- Content chunking - implement deduplication in pkg/storage/chunks.go
- End-to-end test - full scan → backup → restore cycle
Dependencies Added
github.com/Azure/azure-sdk-for-go/sdk/storage/azfile v1.1.1
Code Quality
- All new code follows existing patterns
- Interface-first design for testability
- Mock implementations for offline testing
- Atomic counters for concurrent stats
- Error handling with context
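As an illustration of the atomic-counter approach used for scan statistics, the sketch below shows several goroutines updating shared counters without a lock, with a mutex reserved for the error messages. The field names follow ScanStats in the notes, but the method names, the simulateScan helper, and the use of atomic.Int64 (available since Go 1.19) are assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// ScanStats holds counters that many scan workers update concurrently.
type ScanStats struct {
	TotalFiles atomic.Int64
	TotalDirs  atomic.Int64
	TotalSize  atomic.Int64

	mu     sync.Mutex // guards errors; a slice cannot be updated atomically
	errors []string
}

// RecordFile counts one file and adds its size.
func (s *ScanStats) RecordFile(size int64) {
	s.TotalFiles.Add(1)
	s.TotalSize.Add(size)
}

// RecordError stores an error message for later reporting.
func (s *ScanStats) RecordError(msg string) {
	s.mu.Lock()
	s.errors = append(s.errors, msg)
	s.mu.Unlock()
}

// Errors returns a copy so callers cannot race with later appends.
func (s *ScanStats) Errors() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]string(nil), s.errors...)
}

// simulateScan runs the given number of workers, each recording
// filesPerWorker files of the given size and one error message.
func simulateScan(workers, filesPerWorker int, size int64) *ScanStats {
	stats := &ScanStats{}
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < filesPerWorker; i++ {
				stats.RecordFile(size)
			}
			stats.RecordError(fmt.Sprintf("worker %d: simulated error", id))
		}(w)
	}
	wg.Wait()
	return stats
}

func main() {
	stats := simulateScan(8, 100, 10)
	fmt.Println(stats.TotalFiles.Load(), stats.TotalSize.Load(), len(stats.Errors()))
	// 800 8000 8
}
```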