# Disaster Recovery Plan

**Version:** 1.0
**Effective:** January 2026
**Owner:** Johan Jongsma
**Review:** Annually
**Last DR Test:** Not yet performed

---

## 1. Purpose

This plan defines the procedures for recovering inou services and data after a disaster affecting production systems.

---

## 2. Scope

| System | Location | Criticality |
|--------|----------|-------------|
| Production server | 192.168.100.2 | Critical |
| Production database | /tank/inou/data/inou.db | Critical |
| Master encryption key | /tank/inou/master.key | Critical |
| Staging server | 192.168.1.253 | Medium |

---

## 3. Recovery Objectives

| Metric | Target |
|--------|--------|
| **RTO** (Recovery Time Objective) | 4 hours |
| **RPO** (Recovery Point Objective) | 24 hours |

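As a quick operational check against the 24-hour RPO, the age of the newest backup can be compared to the target. A minimal sketch; the `rpo_ok` helper and the hard-coded timestamps are illustrative, not part of the production tooling:

```bash
#!/bin/sh
# Check whether the newest backup falls within the RPO window.
#   $1 = backup timestamp (epoch seconds)
#   $2 = current time (epoch seconds)
#   $3 = RPO in hours
rpo_ok() {
    age=$(( $2 - $1 ))
    limit=$(( $3 * 3600 ))
    if [ "$age" -le "$limit" ]; then
        echo "RPO met: backup is $(( age / 3600 ))h old"
    else
        echo "RPO MISSED: backup is $(( age / 3600 ))h old"
    fi
}

# Example: a backup taken 6 hours ago, checked against the 24-hour RPO
now=1700000000
rpo_ok $(( now - 6 * 3600 )) "$now" 24   # prints "RPO met: backup is 6h old"
```

In practice the first two arguments would come from the backup file's mtime and the current clock (e.g. GNU `stat -c %Y` and `date +%s`).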
---

## 4. Backup Strategy

### Backup Inventory

| Data | Method | Frequency | Retention | Location |
|------|--------|-----------|-----------|----------|
| Database | ZFS snapshot | Daily | 30 days | Local (RAID-Z2) |
| Database | rclone sync | Daily | 90 days | Google Drive (encrypted) |
| Images | ZFS snapshot | Daily | 30 days | Local (RAID-Z2) |
| Images | rclone sync | Daily | 90 days | Google Drive (encrypted) |
| Master key | Manual copy | On change | Permanent | Proton Pass |
| Configuration | Git repository | Per change | Permanent | Local + remote |

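The daily jobs in the table above might be driven by cron entries along these lines. The paths and remote name are taken from this document; the schedule, the `daily-` snapshot naming, and the cron file location are assumptions (note that `%` must be escaped in crontab lines):

```bash
# /etc/cron.d/inou-backup (illustrative)
# 02:00 - daily ZFS snapshot of the inou dataset
0 2 * * * root zfs snapshot tank/inou@daily-$(date +\%Y\%m\%d)
# 03:00 - sync encrypted data off-site
0 3 * * * root rclone copy /tank/inou/data/inou.db gdrive:inou-backup/
30 3 * * * root rclone copy /tank/inou/data/images/ gdrive:inou-backup/images/
```
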
### Encryption

All data is encrypted before it leaves the server:

- Database fields: AES-256-GCM encryption
- Images: stored encrypted
- Off-site backups: already encrypted, so Google cannot read their contents
- Master key: stored separately in Proton Pass (E2E encrypted)

### ZFS Snapshot Management

```bash
# List available snapshots
zfs list -t snapshot tank/inou

# Create a manual snapshot before major changes
zfs snapshot tank/inou@pre-change-$(date +%Y%m%d-%H%M)

# Snapshots are also created automatically every day
```

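The 30-day local retention in the backup inventory implies periodic pruning of old snapshots. A minimal sketch of the selection logic, assuming a `daily-YYYYMMDD` naming convention (both the naming and any subsequent `zfs destroy` step are assumptions, not documented tooling):

```bash
#!/bin/sh
# Print snapshot names (form: tank/inou@daily-YYYYMMDD) older than a cutoff.
#   $1 = cutoff date as YYYYMMDD; snapshot names are read from stdin
snapshots_older_than() {
    cutoff=$1
    while read -r snap; do
        date_part=${snap##*daily-}
        if [ "$date_part" -lt "$cutoff" ]; then
            echo "$snap"          # candidate for: zfs destroy "$snap"
        fi
    done
}

# Example: against a 2026-02-01 cutoff, only the January snapshot matches
printf '%s\n' tank/inou@daily-20260115 tank/inou@daily-20260210 |
    snapshots_older_than 20260201   # prints tank/inou@daily-20260115
```

In real use the snapshot list would come from `zfs list -H -o name -t snapshot tank/inou`, and each printed name would be reviewed before destruction.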
---
## 5. Disaster Scenarios

### Scenario A: Hardware Failure (Single Component)

**Symptoms:** Server unresponsive, disk errors, network failure

**Recovery:**

1. Identify the failed component
2. Replace the hardware
3. Boot from the existing ZFS pool or restore from a snapshot
4. Verify services: `make test`

**Estimated time:** 2-4 hours

### Scenario B: Database Corruption

**Symptoms:** Application errors, SQLite integrity check failures

**Recovery:**

```bash
# 1. Stop services
ssh johan@192.168.100.2 "sudo systemctl stop inou-portal inou-api"

# 2. Preserve the corrupted DB for analysis
ssh johan@192.168.100.2 "cp /tank/inou/data/inou.db /tank/inou/data/inou.db.corrupted"

# 3. List available snapshots
ssh johan@192.168.100.2 "zfs list -t snapshot tank/inou"

# 4. Restore from a snapshot
ssh johan@192.168.100.2 "cp /tank/inou/.zfs/snapshot/<name>/data/inou.db /tank/inou/data/inou.db"

# 5. Restart services
ssh johan@192.168.100.2 "sudo systemctl start inou-portal inou-api"

# 6. Verify
make test
```

**Estimated time:** 1-2 hours

### Scenario C: Complete Server Loss

**Symptoms:** Server destroyed, stolen, or otherwise unrecoverable

**Recovery:**

```bash
# 1. Provision a new server with Ubuntu 24.04 LTS
# 2. Apply OS hardening (see security-policy.md)

# 3. Create the directory structure
mkdir -p /tank/inou/{bin,data,static,templates,lang}

# 4. Restore the master key from Proton Pass
# Copy the 32-byte key to /tank/inou/master.key
chmod 600 /tank/inou/master.key

# 5. Restore the database from Google Drive
rclone copy gdrive:inou-backup/inou.db /tank/inou/data/

# 6. Restore images from Google Drive
rclone copy gdrive:inou-backup/images/ /tank/inou/data/images/

# 7. Clone the application and build
cd ~/dev
git clone <repo> inou
cd inou
make build

# 8. Deploy
make deploy-prod

# 9. Update DNS if the IP changed

# 10. Verify
make test
```

**Estimated time:** 4-8 hours

### Scenario D: Ransomware/Compromise

**Symptoms:** Encrypted files, unauthorized access, signs of system tampering

**Recovery:**

1. **Do not reuse the compromised system** - assume attacker persistence
2. Provision a fresh server from scratch
3. Restore from a known-good backup taken before the compromise date
4. Rotate the master key and re-encrypt all data
5. Rotate all credentials
6. Apply additional hardening
7. Monitor closely for re-compromise

**Estimated time:** 8-24 hours

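The known-good restore point in step 3 is the newest snapshot taken strictly before the compromise date. A sketch of that selection, assuming the same illustrative `daily-YYYYMMDD` snapshot naming (not documented tooling):

```bash
#!/bin/sh
# Pick the newest snapshot taken strictly before the compromise date.
#   $1 = compromise date (YYYYMMDD); snapshot names on stdin
last_known_good() {
    compromised=$1
    best=""
    while read -r snap; do
        d=${snap##*daily-}
        # keep only pre-compromise snapshots, remembering the newest one
        if [ "$d" -lt "$compromised" ]; then
            if [ -z "$best" ] || [ "$d" -gt "${best##*daily-}" ]; then
                best=$snap
            fi
        fi
    done
    [ -n "$best" ] && echo "$best"
}

# Example: compromise detected on 2026-03-10
printf '%s\n' tank/inou@daily-20260301 tank/inou@daily-20260308 tank/inou@daily-20260311 |
    last_known_good 20260310   # prints tank/inou@daily-20260308
```

Name-based dates only approximate the compromise window; when forensics can narrow the intrusion time further, prefer the earlier snapshot.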
### Scenario E: Site Loss (Fire/Flood/Natural Disaster)

**Symptoms:** Physical location destroyed or inaccessible

**Recovery:**

1. Obtain replacement hardware
2. Restore data from the off-site backup (Google Drive)
3. Restore the master key from Proton Pass
4. Rebuild and deploy the application
5. Update DNS to the new IP

**Estimated time:** 24-48 hours

---

## 6. Key Management

### Master Key Recovery

The master key (`/tank/inou/master.key`) is **critical**. Without it, all encrypted data is permanently unrecoverable.

**Storage locations:**

1. Production server: `/tank/inou/master.key`
2. Secure backup: Proton Pass (E2E encrypted, separate from data backups)

**Recovery procedure:**

1. Log into Proton Pass
2. Retrieve the 32-byte master key
3. Create the file: `echo -n "<key>" > /tank/inou/master.key`
4. Set permissions: `chmod 600 /tank/inou/master.key`
5. Verify the length: `wc -c /tank/inou/master.key` (must be exactly 32 bytes)

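Steps 4-5 above can be wrapped in a small check. This is a sketch (the `verify_master_key` helper is illustrative); it confirms only the size and tightens permissions, not that the key actually decrypts the data:

```bash
#!/bin/sh
# Sanity-check a restored master key file: it must be exactly 32 bytes.
#   $1 = path to the key file
verify_master_key() {
    size=$(wc -c < "$1" | tr -d ' ')
    if [ "$size" -ne 32 ]; then
        echo "FAIL: $1 is $size bytes, expected 32"
        return 1
    fi
    chmod 600 "$1"
    echo "OK: $1 is 32 bytes"
}

# Example against a freshly generated key
head -c 32 /dev/urandom > /tmp/test.key
verify_master_key /tmp/test.key   # prints "OK: /tmp/test.key is 32 bytes"
rm /tmp/test.key
```

A common failure this catches is a trailing newline added while pasting the key (33 bytes instead of 32), which would silently break decryption.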
### Key Rotation (If Compromised)

If the master key may have been compromised:

```bash
# 1. Generate a new key
head -c 32 /dev/urandom > /tank/inou/master.key.new

# 2. Run the re-encryption migration (script to be created)
# This decrypts all data with the old key and re-encrypts it with the new key

# 3. Replace the key
mv /tank/inou/master.key.new /tank/inou/master.key

# 4. Update Proton Pass with the new key

# 5. Verify application functionality
make test
```

---

## 7. Recovery Procedures

### Pre-Recovery Checklist

- [ ] Incident documented and severity assessed
- [ ] Stakeholders notified
- [ ] Backup integrity verified
- [ ] Recovery environment prepared
- [ ] Master key accessible

### Database Restore from ZFS

```bash
# Stop services
sudo systemctl stop inou-portal inou-api

# List snapshots
zfs list -t snapshot tank/inou

# Restore from a snapshot
cp /tank/inou/.zfs/snapshot/<snapshot-name>/data/inou.db /tank/inou/data/inou.db

# Start services
sudo systemctl start inou-portal inou-api

# Verify
make test
```

### Database Restore from Off-site

```bash
# Stop services
sudo systemctl stop inou-portal inou-api

# Download from Google Drive
rclone copy gdrive:inou-backup/inou.db /tank/inou/data/

# Start services
sudo systemctl start inou-portal inou-api

# Verify
make test
```

---

## 8. Communication During Disaster

| Audience | Method | Message |
|----------|--------|---------|
| Users | Email + status page | "inou is experiencing technical difficulties. We expect to restore service by [time]." |
| Affected users | Direct email | Per the incident response plan if data is affected |

---

## 9. Testing Schedule

| Test Type | Frequency | Last Performed | Next Due |
|-----------|-----------|----------------|----------|
| Backup verification | Monthly | January 2026 | February 2026 |
| Database restore (local) | Quarterly | Not yet | Q1 2026 |
| Database restore (off-site) | Quarterly | Not yet | Q1 2026 |
| Full DR drill | Annually | Not yet | Q4 2026 |

### Backup Verification Procedure

```bash
# Monthly: verify that local snapshots exist and are readable
zfs list -t snapshot tank/inou
sqlite3 /tank/inou/.zfs/snapshot/<latest>/data/inou.db "SELECT COUNT(*) FROM dossiers"

# Monthly: verify that the off-site backup exists
rclone ls gdrive:inou-backup/
```

### Restore Test Procedure

```bash
# Quarterly: restore to staging and verify

# 1. Copy from off-site to staging
rclone copy gdrive:inou-backup/inou.db /tmp/restore-test/

# 2. Verify database integrity
sqlite3 /tmp/restore-test/inou.db "PRAGMA integrity_check"

# 3. Verify that the data is readable (requires the master key)
# Test decryption of sample records

# 4. Document the results

# 5. Clean up test files
rm -rf /tmp/restore-test/
```

---

## 10. Post-Recovery Checklist

After any recovery:

- [ ] All services operational (`make test` passes)
- [ ] Data integrity verified (spot-check records)
- [ ] Logs reviewed for errors
- [ ] Users notified if there was a visible outage
- [ ] Incident documented
- [ ] Post-mortem scheduled if the event was significant
- [ ] This plan updated if gaps were discovered

---

## 11. Quick Reference

### Critical Paths

| Item | Path |
|------|------|
| Database | /tank/inou/data/inou.db |
| Auth database | /tank/inou/data/auth.db |
| Master key | /tank/inou/master.key |
| Binaries | /tank/inou/bin/ |
| Logs | /tank/inou/*.log |

### Service Commands

```bash
# Status
sudo systemctl status inou-portal inou-api

# Stop
sudo systemctl stop inou-portal inou-api

# Start
sudo systemctl start inou-portal inou-api

# Logs
journalctl -u inou-portal -f
journalctl -u inou-api -f
```

### Off-site Backup Commands

```bash
# List remote backups
rclone ls gdrive:inou-backup/

# Download a specific file
rclone copy gdrive:inou-backup/inou.db /tank/inou/data/

# Upload a backup manually
rclone copy /tank/inou/data/inou.db gdrive:inou-backup/
```

---

*Document end*