inou/docs/soc2/disaster-recovery-plan.md


Disaster Recovery Plan

Version: 1.0
Effective: January 2026
Owner: Johan Jongsma
Review: Annually
Last DR Test: Not yet performed


1. Purpose

Define procedures to recover inou services and data following a disaster affecting production systems.


2. Scope

| System | Location | Criticality |
|---|---|---|
| Production server | 192.168.100.2 | Critical |
| Production database | /tank/inou/data/inou.db | Critical |
| Master encryption key | /tank/inou/master.key | Critical |
| Staging server | 192.168.1.253 | Medium |

3. Recovery Objectives

| Metric | Target |
|---|---|
| RTO (Recovery Time Objective) | 4 hours |
| RPO (Recovery Point Objective) | 24 hours |
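The 24-hour RPO can be checked mechanically by comparing the timestamp of the newest backup against the target. A minimal sketch (the timestamp source and the `rpo_met` helper are illustrative, not part of the current tooling):

```python
from datetime import datetime, timedelta, timezone

RPO = timedelta(hours=24)  # Recovery Point Objective from the table above

def rpo_met(last_backup: datetime, now: datetime) -> bool:
    """Return True if the newest backup is recent enough to satisfy the RPO."""
    return (now - last_backup) <= RPO

now = datetime.now(timezone.utc)
print(rpo_met(now - timedelta(hours=6), now))   # True: backup is 6 hours old
print(rpo_met(now - timedelta(hours=30), now))  # False: RPO violated
```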

4. Backup Strategy

Backup Inventory

| Data | Method | Frequency | Retention | Location |
|---|---|---|---|---|
| Database | ZFS snapshot | Daily | 30 days | Local (RAID-Z2) |
| Database | rclone sync | Daily | 90 days | Google Drive (encrypted) |
| Images | ZFS snapshot | Daily | 30 days | Local (RAID-Z2) |
| Images | rclone sync | Daily | 90 days | Google Drive (encrypted) |
| Master key | Manual copy | On change | Permanent | Proton Pass |
| Configuration | Git repository | Per change | Permanent | Local + remote |

Encryption

All data is encrypted before leaving the server:

  • Database fields: AES-256-GCM encryption
  • Images: Stored encrypted
  • Off-site backups: Already encrypted; Google cannot read contents
  • Master key: Stored separately in Proton Pass (E2E encrypted)

ZFS Snapshot Management

# List available snapshots
zfs list -t snapshot tank/inou

# Create manual snapshot before major changes
zfs snapshot tank/inou@pre-change-$(date +%Y%m%d-%H%M)

# Snapshots are automatically created daily
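Enforcing the 30-day snapshot retention from the backup inventory can be scripted. The sketch below assumes snapshot names end in a `YYYYMMDD-HHMM` stamp, as in the manual example above; the daily snapshots' actual naming scheme may differ, in which case the parsing must be adapted:

```python
from datetime import datetime, timedelta

def snapshots_to_prune(names, now, retain_days=30):
    """Return snapshot names whose trailing YYYYMMDD-HHMM stamp is older
    than the retention window. Names without a parseable stamp are kept."""
    cutoff = now - timedelta(days=retain_days)
    stale = []
    for name in names:
        try:
            taken = datetime.strptime(name[-13:], "%Y%m%d-%H%M")
        except ValueError:
            continue  # unknown naming scheme: never prune what we can't date
        if taken < cutoff:
            stale.append(name)
    return stale

now = datetime(2026, 2, 15, 12, 0)
names = [
    "tank/inou@pre-change-20260110-0900",  # 36 days old
    "tank/inou@pre-change-20260214-0900",  # 1 day old
]
print(snapshots_to_prune(names, now))  # ['tank/inou@pre-change-20260110-0900']
```

The stale names would then be passed to `zfs destroy`; the sketch deliberately only selects, never deletes.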

5. Disaster Scenarios

Scenario A: Hardware Failure (Single Component)

Symptoms: Server unresponsive, disk errors, network failure

Recovery:

  1. Identify failed component
  2. Replace hardware
  3. Boot from existing ZFS pool or restore from snapshot
  4. Verify services: make test

Estimated time: 2-4 hours

Scenario B: Database Corruption

Symptoms: Application errors, SQLite integrity failures

Recovery:

# 1. Stop services
ssh johan@192.168.100.2 "sudo systemctl stop inou-portal inou-api"

# 2. Backup corrupted DB for analysis
ssh johan@192.168.100.2 "cp /tank/inou/data/inou.db /tank/inou/data/inou.db.corrupted"

# 3. List available snapshots
ssh johan@192.168.100.2 "zfs list -t snapshot tank/inou"

# 4. Restore from snapshot
ssh johan@192.168.100.2 "cp /tank/inou/.zfs/snapshot/<name>/data/inou.db /tank/inou/data/inou.db"

# 5. Restart services
ssh johan@192.168.100.2 "sudo systemctl start inou-portal inou-api"

# 6. Verify
make test

Estimated time: 1-2 hours

Scenario C: Complete Server Loss

Symptoms: Server destroyed, stolen, or unrecoverable

Recovery:

# 1. Provision new server with Ubuntu 24.04 LTS
# 2. Apply OS hardening (see security-policy.md)

# 3. Create directory structure
mkdir -p /tank/inou/{bin,data,static,templates,lang}

# 4. Restore master key from Proton Pass
# Copy 32-byte key to /tank/inou/master.key
chmod 600 /tank/inou/master.key

# 5. Restore database from Google Drive
rclone copy gdrive:inou-backup/inou.db /tank/inou/data/

# 6. Restore images from Google Drive
rclone copy gdrive:inou-backup/images/ /tank/inou/data/images/

# 7. Clone application and build
cd ~/dev
git clone <repo> inou
cd inou
make build

# 8. Deploy
make deploy-prod

# 9. Update DNS if IP changed

# 10. Verify
make test

Estimated time: 4-8 hours
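After a full rebuild, the restored layout can be spot-checked before declaring recovery complete. A minimal sketch using the paths and the 0600 key permission from this plan (the `verify_restore` helper is illustrative):

```python
import os
import stat

def verify_restore(root="/tank/inou"):
    """Spot-check the restored files from the steps above.
    Returns a list of problems; an empty list means the basic checks pass."""
    problems = []
    key = os.path.join(root, "master.key")
    db = os.path.join(root, "data", "inou.db")
    for path in (key, db):
        if not os.path.isfile(path):
            problems.append(f"missing: {path}")
    if os.path.isfile(key):
        if os.path.getsize(key) != 32:
            problems.append(f"{key}: not exactly 32 bytes")
        mode = stat.S_IMODE(os.stat(key).st_mode)
        if mode != 0o600:
            problems.append(f"{key}: mode {oct(mode)}, expected 0o600")
    return problems
```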

Scenario D: Ransomware/Compromise

Symptoms: Encrypted files, unauthorized access, system tampering

Recovery:

  1. Do not use the compromised system; assume attacker persistence
  2. Provision fresh server from scratch
  3. Restore from known-good backup (before compromise date)
  4. Rotate master key and re-encrypt all data
  5. Rotate all credentials
  6. Apply additional hardening
  7. Monitor closely for re-compromise

Estimated time: 8-24 hours
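Step 3 requires choosing a known-good backup. A sketch of selecting the newest snapshot taken strictly before the compromise time, again assuming the `YYYYMMDD-HHMM` stamp from the snapshot examples in section 4:

```python
from datetime import datetime

def last_known_good(names, compromise_at):
    """Return the newest snapshot taken strictly before the compromise time,
    or None if no pre-compromise snapshot exists."""
    candidates = []
    for name in names:
        try:
            taken = datetime.strptime(name[-13:], "%Y%m%d-%H%M")
        except ValueError:
            continue
        if taken < compromise_at:
            candidates.append((taken, name))
    return max(candidates)[1] if candidates else None

names = [
    "tank/inou@pre-change-20260115-0300",
    "tank/inou@pre-change-20260119-0300",
    "tank/inou@pre-change-20260121-0300",
]
print(last_known_good(names, datetime(2026, 1, 20)))
# tank/inou@pre-change-20260119-0300
```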

Scenario E: Site Loss (Fire/Flood/Natural Disaster)

Symptoms: Physical location destroyed or inaccessible

Recovery:

  1. Obtain replacement hardware
  2. Restore from off-site backup (Google Drive)
  3. Restore master key from Proton Pass
  4. Rebuild and deploy application
  5. Update DNS to new IP

Estimated time: 24-48 hours


6. Key Management

Master Key Recovery

The master key (/tank/inou/master.key) is critical. Without it, all encrypted data is permanently unrecoverable.

Storage locations:

  1. Production server: /tank/inou/master.key
  2. Secure backup: Proton Pass (E2E encrypted, separate from data backups)

Recovery procedure:

  1. Log into Proton Pass
  2. Retrieve the 32-byte master key
  3. Create file: echo -n "<key>" > /tank/inou/master.key
  4. Set permissions: chmod 600 /tank/inou/master.key
  5. Verify length: wc -c /tank/inou/master.key (must be exactly 32 bytes)
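The length check in step 5 catches the most common mistake in step 3: `echo` without `-n` appends a newline, producing a 33-byte file and an unusable key. A small sketch that distinguishes that case (the `check_master_key` helper is illustrative):

```python
def check_master_key(path="/tank/inou/master.key"):
    """Verify the restored master key is exactly 32 raw bytes."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) == 32:
        return "ok"
    if len(data) == 33 and data.endswith(b"\n"):
        return "trailing newline: re-create the file with echo -n"
    return f"wrong length: {len(data)} bytes"
```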

Key Rotation (If Compromised)

If the master key may have been compromised:

# 1. Generate new key
head -c 32 /dev/urandom > /tank/inou/master.key.new

# 2. Run re-encryption migration (script to be created)
# This decrypts all data with old key and re-encrypts with new key

# 3. Replace key
mv /tank/inou/master.key.new /tank/inou/master.key

# 4. Update Proton Pass with new key

# 5. Verify application functionality
make test
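Steps 1 and 3 above can be combined so the key file is swapped atomically and never left missing or world-readable. A sketch (the re-encryption migration of step 2 is still a placeholder, as noted above):

```python
import os
import secrets

def rotate_key(path="/tank/inou/master.key"):
    """Generate a fresh 32-byte key and activate it atomically."""
    new_path = path + ".new"
    with open(new_path, "wb") as f:
        f.write(secrets.token_bytes(32))  # cryptographically strong randomness
    os.chmod(new_path, 0o600)
    # ... run the re-encryption migration against new_path here (step 2) ...
    os.replace(new_path, path)  # atomic on POSIX: no window without a key file
```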

7. Recovery Procedures

Pre-Recovery Checklist

  • Incident documented and severity assessed
  • Stakeholders notified
  • Backup integrity verified
  • Recovery environment prepared
  • Master key accessible

Database Restore from ZFS

# Stop services
sudo systemctl stop inou-portal inou-api

# List snapshots
zfs list -t snapshot tank/inou

# Restore from snapshot
cp /tank/inou/.zfs/snapshot/<snapshot-name>/data/inou.db /tank/inou/data/inou.db

# Start services
sudo systemctl start inou-portal inou-api

# Verify
make test

Database Restore from Off-site

# Stop services
sudo systemctl stop inou-portal inou-api

# Download from Google Drive
rclone copy gdrive:inou-backup/inou.db /tank/inou/data/

# Start services
sudo systemctl start inou-portal inou-api

# Verify
make test

8. Communication During Disaster

| Audience | Method | Message |
|---|---|---|
| Users | Email + status page | "inou is experiencing technical difficulties. We expect to restore service by [time]." |
| Affected users | Direct email | Per incident response plan if data affected |

9. Testing Schedule

| Test Type | Frequency | Last Performed | Next Due |
|---|---|---|---|
| Backup verification | Monthly | January 2026 | February 2026 |
| Database restore (local) | Quarterly | Not yet | Q1 2026 |
| Database restore (off-site) | Quarterly | Not yet | Q1 2026 |
| Full DR drill | Annually | Not yet | Q4 2026 |

Backup Verification Procedure

# Monthly: Verify local snapshots exist and are readable
zfs list -t snapshot tank/inou
sqlite3 /tank/inou/.zfs/snapshot/<latest>/data/inou.db "SELECT COUNT(*) FROM dossiers"

# Monthly: Verify off-site backup exists
rclone ls gdrive:inou-backup/
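The monthly off-site check above only confirms the file exists; it can be extended to confirm the backup is fresh enough to meet the RPO by parsing `rclone lsl` output. The sketch below assumes rclone's documented `size date time path` line layout; verify against your rclone version before relying on it:

```python
from datetime import datetime, timedelta

def offsite_backup_fresh(lsl_output, now, max_age=timedelta(hours=24)):
    """Return True if inou.db appears in `rclone lsl` output with a
    modification time within the last 24 hours."""
    for line in lsl_output.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[-1] == "inou.db":
            mtime = datetime.strptime(parts[1] + " " + parts[2][:8],
                                      "%Y-%m-%d %H:%M:%S")
            return (now - mtime) <= max_age
    return False  # backup file not found at all

sample = "     1234 2026-01-15 03:00:00.000000000 inou.db"
print(offsite_backup_fresh(sample, datetime(2026, 1, 15, 12, 0)))  # True
```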

Restore Test Procedure

# Quarterly: Restore to staging and verify

# 1. Copy from off-site to staging
rclone copy gdrive:inou-backup/inou.db /tmp/restore-test/

# 2. Verify database integrity
sqlite3 /tmp/restore-test/inou.db "PRAGMA integrity_check"

# 3. Verify data is readable (requires master key)
# Test decryption of sample records

# 4. Document results
# 5. Clean up test files
rm -rf /tmp/restore-test/
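Step 2 of the restore test can be automated so the quarterly drill produces a pass/fail result rather than manual inspection. A minimal sketch using Python's built-in sqlite3 module:

```python
import sqlite3

def db_integrity_ok(path):
    """Run SQLite's integrity check (step 2 above) and report pass/fail."""
    conn = sqlite3.connect(path)
    try:
        (result,) = conn.execute("PRAGMA integrity_check").fetchone()
    finally:
        conn.close()
    return result == "ok"
```

Note that `PRAGMA integrity_check` validates the SQLite file structure only; the encrypted field contents still require the master key to verify (step 3).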

10. Post-Recovery Checklist

After any recovery:

  • All services operational (make test passes)
  • Data integrity verified (spot-check records)
  • Logs reviewed for errors
  • Users notified if there was a visible outage
  • Incident documented
  • Post-mortem scheduled if significant event
  • This plan updated if gaps discovered

11. Quick Reference

Critical Paths

| Item | Path |
|---|---|
| Database | /tank/inou/data/inou.db |
| Auth database | /tank/inou/data/auth.db |
| Master key | /tank/inou/master.key |
| Binaries | /tank/inou/bin/ |
| Logs | /tank/inou/*.log |

Service Commands

# Status
sudo systemctl status inou-portal inou-api

# Stop
sudo systemctl stop inou-portal inou-api

# Start
sudo systemctl start inou-portal inou-api

# Logs
journalctl -u inou-portal -f
journalctl -u inou-api -f

Off-site Backup Commands

# List remote backups
rclone ls gdrive:inou-backup/

# Download specific file
rclone copy gdrive:inou-backup/inou.db /tank/inou/data/

# Upload backup manually
rclone copy /tank/inou/data/inou.db gdrive:inou-backup/

Document end