# Disaster Recovery Plan
**Version:** 1.0
**Effective:** January 2026
**Owner:** Johan Jongsma
**Review:** Annually
**Last DR Test:** Not yet performed
---
## 1. Purpose
Define procedures to recover inou services and data following a disaster affecting production systems.
---
## 2. Scope
| System | Location | Criticality |
|--------|----------|-------------|
| Production server | 192.168.100.2 | Critical |
| Production database | /tank/inou/data/inou.db | Critical |
| Master encryption key | /tank/inou/master.key | Critical |
| Staging server | 192.168.1.253 | Medium |
---
## 3. Recovery Objectives
| Metric | Target |
|--------|--------|
| **RTO** (Recovery Time Objective) | 4 hours |
| **RPO** (Recovery Point Objective) | 24 hours |
---
## 4. Backup Strategy
### Backup Inventory
| Data | Method | Frequency | Retention | Location |
|------|--------|-----------|-----------|----------|
| Database | ZFS snapshot | Daily | 30 days | Local (RAID-Z2) |
| Database | rclone sync | Daily | 90 days | Google Drive (encrypted) |
| Images | ZFS snapshot | Daily | 30 days | Local (RAID-Z2) |
| Images | rclone sync | Daily | 90 days | Google Drive (encrypted) |
| Master key | Manual copy | On change | Permanent | Proton Pass |
| Configuration | Git repository | Per change | Permanent | Local + remote |
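The daily off-site jobs in the table could be driven by cron; a hypothetical crontab sketch (the 02:00 schedule and the log path are assumptions, not taken from this plan — the `gdrive` remote name matches the commands used later in this document):

```shell
# Hypothetical crontab entries for the daily off-site backup (schedule and log path assumed)
0 2 * * * rclone copy /tank/inou/data/inou.db gdrive:inou-backup/ --log-file=/var/log/rclone-inou.log
30 2 * * * rclone copy /tank/inou/data/images/ gdrive:inou-backup/images/ --log-file=/var/log/rclone-inou.log
```

`rclone copy` is used rather than `sync` so a bad local state never deletes older remote copies; retention then relies on the remote side keeping its 90 days.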
### Encryption
All data is encrypted before leaving the server:
- Database fields: AES-256-GCM encryption
- Images: Stored encrypted
- Off-site backups: Already encrypted; Google cannot read contents
- Master key: Stored separately in Proton Pass (E2E encrypted)
### ZFS Snapshot Management
```bash
# List available snapshots
zfs list -t snapshot tank/inou
# Create manual snapshot before major changes
zfs snapshot tank/inou@pre-change-$(date +%Y%m%d-%H%M)
# Snapshots are automatically created daily
```
---
## 5. Disaster Scenarios
### Scenario A: Hardware Failure (Single Component)
**Symptoms:** Server unresponsive, disk errors, network failure
**Recovery:**
1. Identify failed component
2. Replace hardware
3. Boot from existing ZFS pool or restore from snapshot
4. Verify services: `make test`
**Estimated time:** 2-4 hours
### Scenario B: Database Corruption
**Symptoms:** Application errors, SQLite integrity failures
**Recovery:**
```bash
# 1. Stop services
ssh johan@192.168.100.2 "sudo systemctl stop inou-portal inou-api"
# 2. Backup corrupted DB for analysis
ssh johan@192.168.100.2 "cp /tank/inou/data/inou.db /tank/inou/data/inou.db.corrupted"
# 3. List available snapshots
ssh johan@192.168.100.2 "zfs list -t snapshot tank/inou"
# 4. Restore from snapshot
ssh johan@192.168.100.2 "cp /tank/inou/.zfs/snapshot/<name>/data/inou.db /tank/inou/data/inou.db"
# 5. Restart services
ssh johan@192.168.100.2 "sudo systemctl start inou-portal inou-api"
# 6. Verify
make test
```
**Estimated time:** 1-2 hours
### Scenario C: Complete Server Loss
**Symptoms:** Server destroyed, stolen, or unrecoverable
**Recovery:**
```bash
# 1. Provision new server with Ubuntu 24.04 LTS
# 2. Apply OS hardening (see security-policy.md)
# 3. Create directory structure
mkdir -p /tank/inou/{bin,data,static,templates,lang}
# 4. Restore master key from Proton Pass
# Copy 32-byte key to /tank/inou/master.key
chmod 600 /tank/inou/master.key
# 5. Restore database from Google Drive
rclone copy gdrive:inou-backup/inou.db /tank/inou/data/
# 6. Restore images from Google Drive
rclone copy gdrive:inou-backup/images/ /tank/inou/data/images/
# 7. Clone application and build
cd ~/dev
git clone <repo> inou
cd inou
make build
# 8. Deploy
make deploy-prod
# 9. Update DNS if IP changed
# 10. Verify
make test
```
**Estimated time:** 4-8 hours
### Scenario D: Ransomware/Compromise
**Symptoms:** Encrypted files, unauthorized access, system tampering
**Recovery:**
1. **Do not use the compromised system** - assume attacker persistence
2. Provision fresh server from scratch
3. Restore from known-good backup (before compromise date)
4. Rotate master key and re-encrypt all data
5. Rotate all credentials
6. Apply additional hardening
7. Monitor closely for re-compromise
**Estimated time:** 8-24 hours
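Step 3 above (restore from a backup that predates the compromise) can be scripted; a sketch assuming daily snapshots named `daily-YYYYMMDD` (this naming scheme is an assumption — check the actual `zfs list -t snapshot` output before relying on it):

```shell
# Select the newest snapshot strictly before the compromise date.
# Snapshot naming daily-YYYYMMDD is assumed; adapt to the real scheme.
COMPROMISE_DATE=20260115
# In production the list would come from: zfs list -H -t snapshot -o name tank/inou
printf '%s\n' \
  tank/inou@daily-20260113 \
  tank/inou@daily-20260114 \
  tank/inou@daily-20260116 |
  awk -F'@daily-' -v cutoff="$COMPROMISE_DATE" '$2 < cutoff' | sort | tail -n 1
```

With the demo list above this prints `tank/inou@daily-20260114`, the last snapshot before the cutoff.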
### Scenario E: Site Loss (Fire/Flood/Natural Disaster)
**Symptoms:** Physical location destroyed or inaccessible
**Recovery:**
1. Obtain replacement hardware
2. Restore from off-site backup (Google Drive)
3. Restore master key from Proton Pass
4. Rebuild and deploy application
5. Update DNS to new IP
**Estimated time:** 24-48 hours
---
## 6. Key Management
### Master Key Recovery
The master key (`/tank/inou/master.key`) is **critical**. Without it, all encrypted data is permanently unrecoverable.
**Storage locations:**
1. Production server: `/tank/inou/master.key`
2. Secure backup: Proton Pass (E2E encrypted, separate from data backups)
**Recovery procedure:**
1. Log into Proton Pass
2. Retrieve the 32-byte master key
3. Create file: `echo -n "<key>" > /tank/inou/master.key`
4. Set permissions: `chmod 600 /tank/inou/master.key`
5. Verify length: `wc -c /tank/inou/master.key` (must be exactly 32 bytes)
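The length check in step 5 can be wrapped into a pass/fail guard; a small sketch, using a freshly generated demo key as a stand-in for the restored one:

```shell
# Demo: generate a 32-byte stand-in key and verify its length, as in step 5
umask 077
head -c 32 /dev/urandom > /tmp/demo-master.key
chmod 600 /tmp/demo-master.key
bytes=$(wc -c < /tmp/demo-master.key)
if [ "$bytes" -eq 32 ]; then
  echo "key OK: 32 bytes"
else
  echo "key INVALID: $bytes bytes (check for a trailing newline)"
fi
```

A 33-byte result usually means a trailing newline crept in during step 3; recreate the file with `echo -n` or `printf`.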
### Key Rotation (If Compromised)
If the master key may have been compromised:
```bash
# 1. Generate new key
head -c 32 /dev/urandom > /tank/inou/master.key.new
# 2. Run re-encryption migration (script to be created)
# This decrypts all data with old key and re-encrypts with new key
# 3. Replace key
mv /tank/inou/master.key.new /tank/inou/master.key
# 4. Update Proton Pass with new key
# 5. Verify application functionality
make test
```
---
## 7. Recovery Procedures
### Pre-Recovery Checklist
- [ ] Incident documented and severity assessed
- [ ] Stakeholders notified
- [ ] Backup integrity verified
- [ ] Recovery environment prepared
- [ ] Master key accessible
### Database Restore from ZFS
```bash
# Stop services
sudo systemctl stop inou-portal inou-api
# List snapshots
zfs list -t snapshot tank/inou
# Restore from snapshot
cp /tank/inou/.zfs/snapshot/<snapshot-name>/data/inou.db /tank/inou/data/inou.db
# Start services
sudo systemctl start inou-portal inou-api
# Verify
make test
```
### Database Restore from Off-site
```bash
# Stop services
sudo systemctl stop inou-portal inou-api
# Download from Google Drive
rclone copy gdrive:inou-backup/inou.db /tank/inou/data/
# Start services
sudo systemctl start inou-portal inou-api
# Verify
make test
```
---
## 8. Communication During Disaster
| Audience | Method | Message |
|----------|--------|---------|
| Users | Email + status page | "inou is experiencing technical difficulties. We expect to restore service by [time]." |
| Affected users | Direct email | Per incident response plan if data affected |
---
## 9. Testing Schedule
| Test Type | Frequency | Last Performed | Next Due |
|-----------|-----------|----------------|----------|
| Backup verification | Monthly | January 2026 | February 2026 |
| Database restore (local) | Quarterly | Not yet | Q1 2026 |
| Database restore (off-site) | Quarterly | Not yet | Q1 2026 |
| Full DR drill | Annually | Not yet | Q4 2026 |
### Backup Verification Procedure
```bash
# Monthly: Verify local snapshots exist and are readable
zfs list -t snapshot tank/inou
sqlite3 /tank/inou/.zfs/snapshot/<latest>/data/inou.db "SELECT COUNT(*) FROM dossiers"
# Monthly: Verify off-site backup exists
rclone ls gdrive:inou-backup/
```
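Beyond existence, backup age is worth checking during the monthly verification; a sketch that flags a file older than 24 hours by mtime (the stale demo file stands in for a fetched backup copy):

```shell
# Demo: flag a backup file whose mtime is older than 24 hours
touch -d '2 days ago' /tmp/demo-backup.db          # stand-in for a fetched backup
now=$(date +%s)
mtime=$(stat -c %Y /tmp/demo-backup.db)            # GNU stat
age=$(( now - mtime ))
if [ "$age" -gt 86400 ]; then
  echo "STALE: backup is ${age}s old"
else
  echo "FRESH: backup is ${age}s old"
fi
```

A backup older than 24 hours means the daily job has missed a run, which would silently widen the 24-hour RPO.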
### Restore Test Procedure
```bash
# Quarterly: Restore to staging and verify
# 1. Copy from off-site to staging
rclone copy gdrive:inou-backup/inou.db /tmp/restore-test/
# 2. Verify database integrity
sqlite3 /tmp/restore-test/inou.db "PRAGMA integrity_check"
# 3. Verify data is readable (requires master key)
# Test decryption of sample records
# 4. Document results
# 5. Clean up test files
rm -rf /tmp/restore-test/
```
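If the backup is a plain SQLite container with field-level encryption inside (as section 4 suggests), a restored copy should begin with the standard 16-byte SQLite header; a sketch, with a demo file standing in for the restored `inou.db`:

```shell
# Demo: header sanity check; /tmp/restore-demo.db stands in for the restored inou.db
printf 'SQLite format 3\000' > /tmp/restore-demo.db   # every SQLite db starts with these 16 bytes
if head -c 15 /tmp/restore-demo.db | grep -q '^SQLite format 3'; then
  echo "header OK: looks like a SQLite database"
else
  echo "header BAD: file may be truncated or wrapped in another encryption layer"
fi
```

This catches a truncated download or a whole-file-encrypted blob before `PRAGMA integrity_check` produces a confusing error.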
---
## 10. Post-Recovery Checklist
After any recovery:
- [ ] All services operational (`make test` passes)
- [ ] Data integrity verified (spot-check records)
- [ ] Logs reviewed for errors
- [ ] Users notified if there was a visible outage
- [ ] Incident documented
- [ ] Post-mortem scheduled if significant event
- [ ] This plan updated if gaps discovered
---
## 11. Quick Reference
### Critical Paths
| Item | Path |
|------|------|
| Database | /tank/inou/data/inou.db |
| Auth database | /tank/inou/data/auth.db |
| Master key | /tank/inou/master.key |
| Binaries | /tank/inou/bin/ |
| Logs | /tank/inou/*.log |
### Service Commands
```bash
# Status
sudo systemctl status inou-portal inou-api
# Stop
sudo systemctl stop inou-portal inou-api
# Start
sudo systemctl start inou-portal inou-api
# Logs
journalctl -u inou-portal -f
journalctl -u inou-api -f
```
### Off-site Backup Commands
```bash
# List remote backups
rclone ls gdrive:inou-backup/
# Download specific file
rclone copy gdrive:inou-backup/inou.db /tank/inou/data/
# Upload backup manually
rclone copy /tank/inou/data/inou.db gdrive:inou-backup/
```
---
*Document end*