# Azure Files Backup POC - Progress Notes

**Date:** 2025-01-29
**Repo:** ~/dev/azure-backup

## Summary

Continued development of the Azure Files Backup POC: implemented the Azure Files API client, tree scanner, FlatBuffer schema, and tests. The tests have not been run yet and the real Azure client is still stubbed (see Blockers).

## Completed Tasks

### 1. ✅ Azure Files API Client (`pkg/storage/azure.go`)

Created a comprehensive Azure Files client with:

- **`AzureFilesClient` interface** - Main abstraction for Azure Files operations:
  - `ListDirectory()` - List files/directories
  - `GetFile()` - Download file content
  - `GetFileProperties()` - Get file metadata without content
  - `GetFileRange()` - Download specific byte range
  - `WalkDirectory()` - Recursive directory traversal
  - `Close()` - Cleanup

- **`AzureConfig`** - Configuration with validation:
  - Supports Shared Key auth (`AccountName` + `AccountKey`)
  - Supports SAS token auth
  - Supports connection string

- **`MockAzureFilesClient`** - Full mock implementation for testing:
  - `AddFile()`, `AddDirectory()`, `SetFileContent()`, `SetError()`
  - All interface methods implemented with mock behavior
  - Supports `SkipDir` for walk control

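To illustrate how a walk with `SkipDir` pruning can behave against an in-memory mock, here is a minimal sketch. It is not the code in `pkg/storage/azure.go`: the `Entry` type, the `mockShare` type, the `walkSummary` helper, the callback signature, and the exact `SkipDir` semantics (honoured only for directories) are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// SkipDir is a hypothetical sentinel error: a callback returns it for a
// directory to tell the walker not to descend into that directory.
var SkipDir = errors.New("skip this directory")

// Entry is an assumed shape for one listing result.
type Entry struct {
	Path  string // full path inside the share, no leading slash
	IsDir bool
}

// mockShare is a tiny in-memory stand-in for a file share.
type mockShare struct {
	entries map[string]Entry
}

func newMockShare() *mockShare { return &mockShare{entries: map[string]Entry{}} }

func (m *mockShare) AddDirectory(p string) { m.entries[p] = Entry{Path: p, IsDir: true} }
func (m *mockShare) AddFile(p string)      { m.entries[p] = Entry{Path: p} }

// ListDirectory returns the direct children of dir, sorted by path.
func (m *mockShare) ListDirectory(dir string) []Entry {
	prefix := ""
	if dir != "" {
		prefix = dir + "/"
	}
	var out []Entry
	for p, e := range m.entries {
		if !strings.HasPrefix(p, prefix) {
			continue
		}
		rest := p[len(prefix):]
		if rest != "" && !strings.Contains(rest, "/") {
			out = append(out, e)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out
}

// WalkDirectory visits every entry under dir depth-first, parents first.
func (m *mockShare) WalkDirectory(dir string, fn func(Entry) error) error {
	for _, e := range m.ListDirectory(dir) {
		err := fn(e)
		if e.IsDir && errors.Is(err, SkipDir) {
			continue // prune this subtree, keep walking siblings
		}
		if err != nil {
			return err
		}
		if e.IsDir {
			if err := m.WalkDirectory(e.Path, fn); err != nil {
				return err
			}
		}
	}
	return nil
}

// walkSummary walks the whole share, pruning the directory named skip,
// and returns the visited paths joined by commas.
func walkSummary(m *mockShare, skip string) string {
	var seen []string
	err := m.WalkDirectory("", func(e Entry) error {
		seen = append(seen, e.Path)
		if e.IsDir && e.Path == skip {
			return SkipDir
		}
		return nil
	})
	if err != nil {
		return "error: " + err.Error()
	}
	return strings.Join(seen, ",")
}

func main() {
	m := newMockShare()
	m.AddDirectory("docs")
	m.AddFile("docs/a.txt")
	m.AddDirectory("tmp")
	m.AddFile("tmp/scratch.bin")
	m.AddFile("readme.md")
	fmt.Println(walkSummary(m, "tmp")) // docs,docs/a.txt,readme.md,tmp
}
```

The skipped directory itself is still reported to the callback; only its children are omitted. Any other error returned by the callback aborts the walk.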
### 2. ✅ FlatBuffer Schema (`schemas/metadata.fbs`)

Designed comprehensive FlatBuffer schema with:

- **`FileMetadata`** - Full file metadata (stored in object storage):
  - Identity: node_id, parent_id, full_path, name
  - Timestamps: mtime, ctime, atime
  - POSIX: mode, owner, group
  - Extended: ACL entries, xattrs
  - Content: xorhash, content_hash, chunk references
  - Versions: version history references
  - Azure-specific: etag, content_md5, attributes

- **`DirectoryMetadata`** - Directory-specific metadata with child counts

- **`ScanResult`** - Scan operation results

- **`BackupManifest`** - Backup snapshot metadata

### 3. ✅ Tree Scanner (`pkg/tree/scanner.go`)

Implemented the scanner that syncs Azure state to local DB:

- **`Scanner`** - Main scanner type:
  - Takes `AzureFilesClient`, `NodeStore`, and `Logger`
  - Supports full and incremental scans
  - Batch processing for efficiency
  - Progress callbacks for monitoring

- **`ScanOptions`** - Configurable scan behavior:
  - `RootPath`, `RootNodeID`, `FullScan`
  - `BatchSize`, `Concurrency`
  - `ProgressCallback`, `ProgressInterval`

- **`ScanStats`** - Atomic counters for metrics:
  - TotalFiles, TotalDirs, TotalSize
  - AddedFiles, ModifiedFiles, DeletedFiles
  - Errors with messages

- **XOR Hash utilities**:
  - `XORHash()` - Compute 8-byte fingerprint from data
  - `ComputeXORHashFromReader()` - Streaming version
  - `blockHash()` - FNV-1a block hashing

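One way the XOR hash utilities could fit together is sketched below: data is split into fixed-size blocks, each block is hashed with FNV-1a, and the block hashes are XORed into a single 8-byte value. The 4096-byte block size, the `uint64` return type, and the decision to mix the block index into each block hash are assumptions for this sketch; the actual functions in `pkg/tree/scanner.go` may differ.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"io"
)

// blockSize is an assumed value; the real scanner may use another size.
const blockSize = 4096

// blockHash returns the 64-bit FNV-1a hash of one block. The block index is
// hashed first (an assumption of this sketch) so that identical blocks at
// different offsets do not cancel each other out when XORed.
func blockHash(index uint64, block []byte) uint64 {
	h := fnv.New64a()
	var idx [8]byte
	binary.LittleEndian.PutUint64(idx[:], index)
	h.Write(idx[:]) // hash.Hash writes never return an error
	h.Write(block)
	return h.Sum64()
}

// ComputeXORHashFromReader streams r block by block and XORs the block
// hashes together, so memory use stays at one block regardless of file size.
func ComputeXORHashFromReader(r io.Reader) (uint64, error) {
	buf := make([]byte, blockSize)
	var acc, index uint64
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			acc ^= blockHash(index, buf[:n])
			index++
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return acc, nil // end of input (possibly after a short final block)
		}
		if err != nil {
			return 0, err
		}
	}
}

// XORHash is the in-memory convenience wrapper.
func XORHash(data []byte) uint64 {
	h, _ := ComputeXORHashFromReader(bytes.NewReader(data)) // bytes.Reader cannot fail
	return h
}

func main() {
	a := []byte("hello, azure files")
	b := []byte("hello, azure filez")
	fmt.Printf("%016x\n%016x\n", XORHash(a), XORHash(b))
}
```

Without the index mixing, a plain XOR of block hashes would be insensitive to block order and would return zero for any file made of an even number of identical blocks, which matters for sparse or zero-filled files. A 64-bit non-cryptographic fingerprint like this is suitable for change detection, not for integrity guarantees.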
### 4. ✅ Scan Handler Integration (`pkg/worker/scan.go`)

Updated scan handler to use the new components:

- `AzureClientFactory` for dependency injection
- `RegisterShare()` for per-share Azure configs
- Integrated with `tree.Scanner`
- Progress logging during scans

### 5. ✅ Tests (`pkg/storage/azure_test.go`, `pkg/tree/scanner_test.go`)

Tests written (not yet executed, see Blockers) cover:

- Mock Azure client tests (file operations, walking, errors, SkipDir)
- Config validation tests
- Path helper tests (basename, dirname)
- Scanner tests (empty share, new files, modifications)
- XOR hash tests (determinism, collision detection)

## Blockers

### 🔲 Go Not Installed on Build System

The development environment doesn't have Go installed, so the code has been neither compiled nor tested. It is complete as written but not runtime-verified.

**Action needed:** Run tests on a system with Go 1.22+:

```bash
cd ~/dev/azure-backup
go mod tidy
go test ./...
```

### 🔲 Azure Credentials

The real Azure client implementation is still a stub: its methods return "Azure SDK not initialized" errors.

**Action needed:**

1. Create an Azure free trial account
2. Create an Azure Storage account with the Files service
3. Create a file share for testing
4. Add credentials to config

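Once credentials exist, the config has to pick one of the three auth modes listed for `AzureConfig` (shared key, SAS token, connection string). The following is a minimal sketch of such validation, not the actual type: the `SASToken`, `ConnectionString`, and `ShareName` field names, the `AuthMethod()` helper, and the precedence rules are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// AzureConfig mirrors the auth options named in these notes. Only
// AccountName and AccountKey are field names taken from the notes; the
// other fields are hypothetical.
type AzureConfig struct {
	AccountName      string
	AccountKey       string
	SASToken         string
	ConnectionString string
	ShareName        string
}

// AuthMethod reports which credential type would be used. In this sketch a
// connection string takes precedence, and key plus SAS together is rejected
// as ambiguous.
func (c AzureConfig) AuthMethod() (string, error) {
	switch {
	case c.ConnectionString != "":
		return "connection-string", nil
	case c.AccountName == "":
		return "", errors.New("account name is required without a connection string")
	case c.AccountKey != "" && c.SASToken != "":
		return "", errors.New("set either an account key or a SAS token, not both")
	case c.AccountKey != "":
		return "shared-key", nil
	case c.SASToken != "":
		return "sas", nil
	default:
		return "", errors.New("no credentials: set a connection string, account key, or SAS token")
	}
}

// Validate checks that the config names a share and has usable credentials.
func (c AzureConfig) Validate() error {
	if c.ShareName == "" {
		return errors.New("share name is required")
	}
	_, err := c.AuthMethod()
	return err
}

func main() {
	cfg := AzureConfig{AccountName: "pocaccount", AccountKey: "placeholder", ShareName: "testshare"}
	method, err := cfg.AuthMethod()
	fmt.Println(method, err, cfg.Validate())
}
```

Secrets such as the account key are better read from the environment or a secret store than committed to a config file in the repo.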
## Architecture Notes

### File Storage: 50-Byte DB Row

| Field     | Size | Purpose            |
|-----------|------|--------------------|
| node_id   | 8B   | Unique identifier  |
| parent_id | 8B   | Tree structure     |
| name      | ~20B | Filename only      |
| size      | 8B   | File size          |
| mtime     | 8B   | Modified timestamp |
| xorhash   | 8B   | Change detection   |

The five fixed-width fields total 40 bytes; with a ~20-byte name the row comes to roughly 60 bytes, so the 50-byte figure is approximate and depends on name length.

Everything else (ACLs, xattrs, content chunks) goes to object storage as FlatBuffers.

### Change Detection Flow

1. Azure reports file via `ListDirectory()`
2. Compare mtime/size with DB → quick diff
3. If different, compute XOR hash → deep diff
4. If hash differs, queue for backup

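The flow above can be sketched as a two-stage check in which the expensive hash is only computed when the cheap metadata comparison fails. The `Snapshot` and `Decision` types, the `Detect` function, and the `MetadataOnly` outcome (metadata changed but content hash identical) are assumptions for illustration, not names from the repo.

```go
package main

import "fmt"

// Snapshot is the state recorded in the DB for one file (assumed shape).
type Snapshot struct {
	Size    int64
	MTime   int64 // modification time as Unix seconds
	XORHash uint64
}

// Decision is the outcome of change detection.
type Decision int

const (
	Unchanged    Decision = iota // quick diff matched, no hashing needed
	MetadataOnly                 // mtime/size differ but content hash matches
	NeedsBackup                  // content hash differs, queue for backup
)

// Detect runs the quick diff first and calls hashRemote only when the
// metadata differs. hashRemote is passed as a function so that the file
// content is not read unless required.
func Detect(db Snapshot, remoteSize, remoteMTime int64, hashRemote func() (uint64, error)) (Decision, error) {
	if db.Size == remoteSize && db.MTime == remoteMTime {
		return Unchanged, nil
	}
	h, err := hashRemote()
	if err != nil {
		return Unchanged, err
	}
	if h == db.XORHash {
		return MetadataOnly, nil
	}
	return NeedsBackup, nil
}

func main() {
	db := Snapshot{Size: 1024, MTime: 1700000000, XORHash: 0xabc}
	remoteHash := func() (uint64, error) { return 0xdef, nil }

	d, err := Detect(db, 1024, 1700000000, remoteHash)
	fmt.Println(d == Unchanged, err)

	d, err = Detect(db, 2048, 1700000500, remoteHash)
	fmt.Println(d == NeedsBackup, err)
}
```

The quick diff trusts mtime and size, so an edit that preserves both would be missed until the next full scan; that trade-off is inherent to the flow as described.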
### Tree Structure Benefits

- **Rename directory:** Update 1 row, not millions
- **Move subtree:** Update 1 parent_id
- **Find descendants:** Recursive CTE over `parent_id`, which needs an index on that column
- **Storage:** ~20 bytes per name vs 100+ bytes per full path

## File Structure

```
azure-backup/
├── ARCHITECTURE.md              # Full design doc
├── schemas/
│   └── metadata.fbs             # FlatBuffer schema ✅ NEW
├── pkg/
│   ├── db/
│   │   ├── node.go              # Node type & interface
│   │   ├── postgres.go          # PostgreSQL implementation
│   │   └── tree.go              # Tree operations
│   ├── storage/
│   │   ├── azure.go             # Azure Files client ✅ NEW
│   │   ├── azure_test.go        # Tests ✅ NEW
│   │   ├── client.go            # Object storage abstraction
│   │   ├── flatbuf.go           # Metadata serialization
│   │   └── chunks.go            # Content chunking
│   ├── tree/
│   │   ├── scanner.go           # Azure scanner ✅ NEW
│   │   ├── scanner_test.go      # Tests ✅ NEW
│   │   ├── walk.go              # Tree walking
│   │   ├── diff.go              # Tree diffing
│   │   └── path.go              # Path reconstruction
│   └── worker/
│       ├── scan.go              # Scan handler ✅ UPDATED
│       ├── queue.go             # Job queue
│       ├── backup.go            # Backup handler
│       └── restore.go           # Restore handler
└── cmd/
    └── backup-worker/
        └── main.go              # K8s worker binary
```

## Next Steps

1. **Set up Go environment** and run tests
2. **Azure account setup** - free trial, storage account, file share
3. **Generate FlatBuffer code** - `flatc --go -o pkg/storage/fb schemas/metadata.fbs`
4. **Implement real Azure SDK integration** - uncomment TODOs in `azure.go`
5. **Content chunking** - implement deduplication in `pkg/storage/chunks.go`
6. **End-to-end test** - full scan → backup → restore cycle

## Dependencies Added

- `github.com/Azure/azure-sdk-for-go/sdk/storage/azfile v1.1.1`

## Code Quality

- All new code follows existing patterns
- Interface-first design for testability
- Mock implementations for offline testing
- Atomic counters for concurrent stats
- Error handling with context
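
As an illustration of the atomic-counter approach to concurrent stats, here is a minimal sketch. It is not the `ScanStats` type from `pkg/tree/scanner.go`: the method names, the mutex-guarded error list, and the `scanConcurrently` driver are assumptions, and it relies on the `atomic.Int64` type available since Go 1.19.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// ScanStats keeps numeric counters lock-free and guards the error
// messages, which are not fixed-size values, with a mutex.
type ScanStats struct {
	TotalFiles atomic.Int64
	TotalDirs  atomic.Int64
	TotalSize  atomic.Int64

	mu     sync.Mutex
	errors []string
}

// RecordFile counts one file and adds its size.
func (s *ScanStats) RecordFile(size int64) {
	s.TotalFiles.Add(1)
	s.TotalSize.Add(size)
}

// RecordError stores an error message for later reporting.
func (s *ScanStats) RecordError(msg string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.errors = append(s.errors, msg)
}

// ErrorCount returns how many errors were recorded.
func (s *ScanStats) ErrorCount() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.errors)
}

// scanConcurrently simulates several workers updating shared stats.
func scanConcurrently(workers, filesPerWorker int, size int64) *ScanStats {
	stats := &ScanStats{}
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < filesPerWorker; i++ {
				stats.RecordFile(size)
			}
		}()
	}
	wg.Wait()
	return stats
}

func main() {
	stats := scanConcurrently(8, 1000, 10)
	fmt.Println(stats.TotalFiles.Load(), stats.TotalSize.Load()) // 8000 80000
}
```

Each counter is individually consistent, but a reader that loads `TotalFiles` and `TotalSize` separately during a scan may see values from slightly different moments, which is usually acceptable for progress reporting.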