Azure Files Backup POC - Progress Notes

Date: 2025-01-29 Repo: ~/dev/azure-backup

Summary

Continued development of the Azure Files Backup POC. Implemented Azure Files API client, tree scanner, FlatBuffer schema, and tests.

Completed Tasks

1. Azure Files API Client (pkg/storage/azure.go)

Created a comprehensive Azure Files client with:

  • AzureFilesClient interface - Main abstraction for Azure Files operations:

    • ListDirectory() - List files/directories
    • GetFile() - Download file content
    • GetFileProperties() - Get file metadata without content
    • GetFileRange() - Download specific byte range
    • WalkDirectory() - Recursive directory traversal
    • Close() - Cleanup
  • AzureConfig - Configuration with validation:

    • Supports Shared Key auth (AccountName + AccountKey)
    • Supports SAS token auth
    • Supports connection string
  • MockAzureFilesClient - Full mock implementation for testing:

    • AddFile(), AddDirectory(), SetFileContent(), SetError()
    • All interface methods implemented with mock behavior
    • Supports SkipDir for walk control
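The interface above can be sketched roughly as follows. Only the method names are recorded in these notes, so the signatures, `DirEntry`, `FileProperties`, and `WalkFunc` are illustrative assumptions:

```go
package main

import (
	"context"
	"fmt"
	"io"
)

// DirEntry and FileProperties stand in for the real types.
type DirEntry struct {
	Name  string
	IsDir bool
	Size  int64
}

type FileProperties struct {
	Size       int64
	ETag       string
	ContentMD5 []byte
}

// WalkFunc is invoked per entry; returning a sentinel SkipDir error
// would prune the directory, mirroring filepath.WalkDir semantics.
type WalkFunc func(path string, entry DirEntry, err error) error

// AzureFilesClient abstracts the Azure Files operations the scanner
// depends on. Signatures are illustrative, not the real ones.
type AzureFilesClient interface {
	ListDirectory(ctx context.Context, path string) ([]DirEntry, error)
	GetFile(ctx context.Context, path string) (io.ReadCloser, error)
	GetFileProperties(ctx context.Context, path string) (*FileProperties, error)
	GetFileRange(ctx context.Context, path string, offset, length int64) (io.ReadCloser, error)
	WalkDirectory(ctx context.Context, root string, fn WalkFunc) error
	Close() error
}

func main() {
	e := DirEntry{Name: "report.txt", Size: 1024}
	fmt.Println(e.Name, e.IsDir) // report.txt false
}
```

Keeping the interface this narrow is what makes MockAzureFilesClient practical: any type with these six methods satisfies it.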

2. FlatBuffer Schema (schemas/metadata.fbs)

Designed comprehensive FlatBuffer schema with:

  • FileMetadata - Full file metadata (stored in object storage):

    • Identity: node_id, parent_id, full_path, name
    • Timestamps: mtime, ctime, atime
    • POSIX: mode, owner, group
    • Extended: ACL entries, xattrs
    • Content: xorhash, content_hash, chunk references
    • Versions: version history references
    • Azure-specific: etag, content_md5, attributes
  • DirectoryMetadata - Directory-specific metadata with child counts

  • ScanResult - Scan operation results

  • BackupManifest - Backup snapshot metadata
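A minimal sketch of the FileMetadata table in FlatBuffers IDL, built from the field groups listed above (the exact types and any fields not named in these notes are assumptions):

```
// schemas/metadata.fbs (sketch, not the actual schema)
table FileMetadata {
  // Identity
  node_id:ulong;
  parent_id:ulong;
  full_path:string;
  name:string;
  // Timestamps (epoch-based; unit assumed)
  mtime:long;
  ctime:long;
  atime:long;
  // POSIX
  mode:uint;
  owner:uint;
  group:uint;
  // Content
  xorhash:ulong;
  content_hash:[ubyte];
  // Azure-specific
  etag:string;
  content_md5:[ubyte];
}
```

Running `flatc --go` over the schema (as noted under Next Steps) generates the accessor code.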

3. Tree Scanner (pkg/tree/scanner.go)

Implemented the scanner that syncs Azure state to local DB:

  • Scanner - Main scanner type:

    • Takes AzureFilesClient, NodeStore, and Logger
    • Supports full and incremental scans
    • Batch processing for efficiency
    • Progress callbacks for monitoring
  • ScanOptions - Configurable scan behavior:

    • RootPath, RootNodeID, FullScan
    • BatchSize, Concurrency
    • ProgressCallback, ProgressInterval
  • ScanStats - Atomic counters for metrics:

    • TotalFiles, TotalDirs, TotalSize
    • AddedFiles, ModifiedFiles, DeletedFiles
    • Errors with messages
  • XOR Hash utilities:

    • XORHash() - Compute 8-byte fingerprint from data
    • ComputeXORHashFromReader() - Streaming version
    • blockHash() - FNV-1a block hashing
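The XOR-fold idea behind these utilities can be sketched as follows: FNV-1a hash each fixed-size block, then XOR the block hashes together for an 8-byte fingerprint. The block size and exact fold are assumptions; note that a plain XOR fold ignores block order, so the real implementation may also mix in the block index:

```go
package main

import (
	"bytes"
	"fmt"
	"hash/fnv"
	"io"
)

const blockSize = 4096 // assumed block size

// blockHash computes the FNV-1a hash of one block.
func blockHash(block []byte) uint64 {
	h := fnv.New64a()
	h.Write(block)
	return h.Sum64()
}

// ComputeXORHashFromReader folds per-block FNV-1a hashes together with
// XOR while streaming, so whole files never need to be buffered.
func ComputeXORHashFromReader(r io.Reader) (uint64, error) {
	buf := make([]byte, blockSize)
	var sum uint64
	for {
		n, err := r.Read(buf)
		if n > 0 {
			sum ^= blockHash(buf[:n])
		}
		if err == io.EOF {
			return sum, nil
		}
		if err != nil {
			return 0, err
		}
	}
}

// XORHash is the in-memory variant.
func XORHash(data []byte) uint64 {
	sum, _ := ComputeXORHashFromReader(bytes.NewReader(data))
	return sum
}

func main() {
	a := XORHash([]byte("hello world"))
	b, _ := ComputeXORHashFromReader(bytes.NewReader([]byte("hello world")))
	fmt.Println(a == b) // true: both paths agree
}
```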

4. Scan Handler Integration (pkg/worker/scan.go)

Updated scan handler to use the new components:

  • AzureClientFactory for dependency injection
  • RegisterShare() for per-share Azure configs
  • Integrated with tree.Scanner
  • Progress logging during scans
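The factory-based injection might look like the sketch below. The handler struct, config fields, and method signatures are all assumptions; only `AzureClientFactory` and `RegisterShare()` are named in these notes:

```go
package main

import "fmt"

// AzureConfig and Client are simplified stand-ins for the real types.
type AzureConfig struct{ AccountName string }
type Client struct{ cfg AzureConfig }

// AzureClientFactory lets tests inject a mock instead of a real client.
type AzureClientFactory func(cfg AzureConfig) (*Client, error)

// ScanHandler holds the factory plus per-share configs registered up front.
type ScanHandler struct {
	factory AzureClientFactory
	shares  map[string]AzureConfig
}

// RegisterShare associates a share name with its Azure credentials.
func (h *ScanHandler) RegisterShare(name string, cfg AzureConfig) {
	h.shares[name] = cfg
}

// clientFor builds a client for a registered share via the factory.
func (h *ScanHandler) clientFor(share string) (*Client, error) {
	cfg, ok := h.shares[share]
	if !ok {
		return nil, fmt.Errorf("share %q not registered", share)
	}
	return h.factory(cfg)
}

func main() {
	h := &ScanHandler{
		factory: func(cfg AzureConfig) (*Client, error) { return &Client{cfg: cfg}, nil },
		shares:  map[string]AzureConfig{},
	}
	h.RegisterShare("backups", AzureConfig{AccountName: "demo"})
	c, err := h.clientFor("backups")
	fmt.Println(err == nil, c.cfg.AccountName) // true demo
}
```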

5. Tests (pkg/storage/azure_test.go, pkg/tree/scanner_test.go)

Comprehensive test coverage:

  • Mock Azure client tests (file operations, walking, errors, SkipDir)
  • Config validation tests
  • Path helper tests (basename, dirname)
  • Scanner tests (empty share, new files, modifications)
  • XOR hash tests (determinism, collision detection)

Blockers

🔲 Go Not Installed on Build System

The development environment doesn't have Go installed, so tests couldn't be run. The code is syntactically complete but not runtime-verified.

Action needed: Run tests on a system with Go 1.22+:

cd ~/dev/azure-backup
go mod tidy
go test ./...

🔲 Azure Credentials

The real Azure client is currently a stub; every call returns an "Azure SDK not initialized" error.

Action needed:

  1. Create Azure free trial account
  2. Create an Azure Storage account with Files service
  3. Create a file share for testing
  4. Add credentials to config

Architecture Notes

File Storage: ~60-Byte DB Row

Field      Size    Purpose
node_id    8 B     Unique identifier
parent_id  8 B     Tree structure
name       ~20 B   Filename only
size       8 B     File size
mtime      8 B     Modified timestamp
xorhash    8 B     Change detection
Everything else (ACLs, xattrs, content chunks) goes to object storage as FlatBuffers.
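In Go terms, the row above might look like the struct below (field types are assumed from the listed widths):

```go
package main

import "fmt"

// Node mirrors the compact DB row described above; everything heavier
// (ACLs, xattrs, chunk lists) lives in object storage as FlatBuffers.
type Node struct {
	NodeID   uint64 // unique identifier (8 B)
	ParentID uint64 // tree structure (8 B)
	Name     string // filename only, ~20 B on average
	Size     int64  // file size (8 B)
	Mtime    int64  // modified timestamp (8 B)
	XORHash  uint64 // change detection (8 B)
}

func main() {
	n := Node{NodeID: 1, Name: "report.txt", Size: 2048}
	fmt.Println(n.Name, n.Size) // report.txt 2048
}
```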

Change Detection Flow

  1. Azure reports file via ListDirectory
  2. Compare mtime/size with DB → quick diff
  3. If different, compute XOR hash → deep diff
  4. If hash differs, queue for backup
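The quick-then-deep diff above can be sketched as a single decision function (parameter names and the callback are illustrative; the real scanner works on DB rows):

```go
package main

import "fmt"

// needsBackup applies the two-stage diff: a cheap mtime/size comparison
// first, then an XOR-hash comparison only when the cheap check fails.
func needsBackup(dbSize, remoteSize, dbMtime, remoteMtime int64,
	dbHash uint64, computeHash func() uint64) bool {
	if dbSize == remoteSize && dbMtime == remoteMtime {
		return false // quick diff: metadata unchanged, skip the hash
	}
	return computeHash() != dbHash // deep diff: hash decides
}

func main() {
	// Metadata identical: the expensive hash is never computed.
	unchanged := needsBackup(10, 10, 100, 100, 42, func() uint64 { return 7 })
	fmt.Println(unchanged) // false

	// mtime changed but content hash still matches: no backup queued.
	touched := needsBackup(10, 10, 100, 200, 42, func() uint64 { return 42 })
	fmt.Println(touched) // false
}
```

This ordering is what keeps incremental scans cheap: the XOR hash is only computed for files whose metadata already looks different.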

Tree Structure Benefits

  • Rename directory: Update 1 row, not millions
  • Move subtree: Update 1 parent_id
  • Find descendants: Recursive CTE, properly indexed
  • Storage: ~20 bytes vs 100+ bytes for full paths

File Structure

azure-backup/
├── ARCHITECTURE.md          # Full design doc
├── schemas/
│   └── metadata.fbs         # FlatBuffer schema ✅ NEW
├── pkg/
│   ├── db/
│   │   ├── node.go          # Node type & interface
│   │   ├── postgres.go      # PostgreSQL implementation
│   │   └── tree.go          # Tree operations
│   ├── storage/
│   │   ├── azure.go         # Azure Files client ✅ NEW
│   │   ├── azure_test.go    # Tests ✅ NEW
│   │   ├── client.go        # Object storage abstraction
│   │   ├── flatbuf.go       # Metadata serialization
│   │   └── chunks.go        # Content chunking
│   ├── tree/
│   │   ├── scanner.go       # Azure scanner ✅ NEW
│   │   ├── scanner_test.go  # Tests ✅ NEW
│   │   ├── walk.go          # Tree walking
│   │   ├── diff.go          # Tree diffing
│   │   └── path.go          # Path reconstruction
│   └── worker/
│       ├── scan.go          # Scan handler ✅ UPDATED
│       ├── queue.go         # Job queue
│       ├── backup.go        # Backup handler
│       └── restore.go       # Restore handler
└── cmd/
    └── backup-worker/
        └── main.go          # K8s worker binary

Next Steps

  1. Set up Go environment and run tests
  2. Azure account setup - free trial, storage account, file share
  3. Generate FlatBuffer code - flatc --go -o pkg/storage/fb schemas/metadata.fbs
  4. Implement real Azure SDK integration - uncomment TODOs in azure.go
  5. Content chunking - implement deduplication in pkg/storage/chunks.go
  6. End-to-end test - full scan → backup → restore cycle

Dependencies Added

  • github.com/Azure/azure-sdk-for-go/sdk/storage/azfile v1.1.1

Code Quality

  • All new code follows existing patterns
  • Interface-first design for testability
  • Mock implementations for offline testing
  • Atomic counters for concurrent stats
  • Error handling with context