inou/docs/soc2/disaster-recovery-plan.md


Disaster Recovery Plan

Version: 1.0
Effective: January 2026
Owner: Johan Jongsma
Review: Annually
Last DR Test: Not yet performed


1. Purpose

Define procedures to recover inou services and data following a disaster affecting production systems.


2. Scope

| System | Location | Criticality |
|---|---|---|
| Production server | 192.168.100.2 | Critical |
| Production database | /tank/inou/data/inou.db | Critical |
| Master encryption key | /tank/inou/master.key | Critical |
| Staging server | 192.168.1.253 | Medium |

3. Recovery Objectives

| Metric | Target |
|---|---|
| RTO (Recovery Time Objective) | 4 hours |
| RPO (Recovery Point Objective) | 24 hours |
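The 24-hour RPO can be checked mechanically by comparing the timestamp of the newest backup against the target. A minimal sketch (the timestamp source and the `rpo_met` helper are illustrative, not part of the current tooling):

```python
from datetime import datetime, timedelta, timezone

RPO = timedelta(hours=24)  # Recovery Point Objective from the table above

def rpo_met(last_backup: datetime, now: datetime) -> bool:
    """Return True if the newest backup is recent enough to satisfy the RPO."""
    return (now - last_backup) <= RPO

now = datetime.now(timezone.utc)
print(rpo_met(now - timedelta(hours=6), now))   # True: backup is 6 hours old
print(rpo_met(now - timedelta(hours=30), now))  # False: RPO violated
```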

4. Backup Strategy

Backup Inventory

| Data | Method | Frequency | Retention | Location |
|---|---|---|---|---|
| Database | ZFS snapshot | Daily | 30 days | Local (RAID-Z2) |
| Database | rclone sync | Daily | 90 days | Google Drive (encrypted) |
| Images | ZFS snapshot | Daily | 30 days | Local (RAID-Z2) |
| Images | rclone sync | Daily | 90 days | Google Drive (encrypted) |
| Master key | Manual copy | On change | Permanent | Proton Pass |
| Configuration | Git repository | Per change | Permanent | Local + remote |

Encryption

All data is encrypted before leaving the server:

  • Database fields: AES-256-GCM encryption
  • Images: Stored encrypted
  • Off-site backups: Already encrypted; Google cannot read contents
  • Master key: Stored separately in Proton Pass (E2E encrypted)

ZFS Snapshot Management

# List available snapshots
zfs list -t snapshot tank/inou

# Create manual snapshot before major changes
zfs snapshot tank/inou@pre-change-$(date +%Y%m%d-%H%M)

# Snapshots are automatically created daily
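Enforcing the 30-day snapshot retention from the backup inventory can be scripted. The sketch below assumes snapshot names end in a `YYYYMMDD-HHMM` stamp, as in the manual example above; the daily snapshots' actual naming scheme may differ, in which case the parsing must be adapted:

```python
from datetime import datetime, timedelta

def snapshots_to_prune(names, now, retain_days=30):
    """Return snapshot names whose trailing YYYYMMDD-HHMM stamp is older
    than the retention window. Names without a parseable stamp are kept."""
    cutoff = now - timedelta(days=retain_days)
    stale = []
    for name in names:
        try:
            taken = datetime.strptime(name[-13:], "%Y%m%d-%H%M")
        except ValueError:
            continue  # unknown naming scheme: never prune what we can't date
        if taken < cutoff:
            stale.append(name)
    return stale

now = datetime(2026, 2, 15, 12, 0)
names = [
    "tank/inou@pre-change-20260110-0900",  # 36 days old
    "tank/inou@pre-change-20260214-0900",  # 1 day old
]
print(snapshots_to_prune(names, now))  # ['tank/inou@pre-change-20260110-0900']
```

The stale names would then be passed to `zfs destroy`; the sketch deliberately only selects, never deletes.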

5. Disaster Scenarios

Scenario A: Hardware Failure (Single Component)

Symptoms: Server unresponsive, disk errors, network failure

Recovery:

  1. Identify failed component
  2. Replace hardware
  3. Boot from existing ZFS pool or restore from snapshot
  4. Verify services: make test

Estimated time: 2-4 hours

Scenario B: Database Corruption

Symptoms: Application errors, SQLite integrity failures

Recovery:

# 1. Stop services
ssh johan@192.168.100.2 "sudo systemctl stop inou-portal inou-api"

# 2. Backup corrupted DB for analysis
ssh johan@192.168.100.2 "cp /tank/inou/data/inou.db /tank/inou/data/inou.db.corrupted"

# 3. List available snapshots
ssh johan@192.168.100.2 "zfs list -t snapshot tank/inou"

# 4. Restore from snapshot
ssh johan@192.168.100.2 "cp /tank/inou/.zfs/snapshot/<name>/data/inou.db /tank/inou/data/inou.db"

# 5. Restart services
ssh johan@192.168.100.2 "sudo systemctl start inou-portal inou-api"

# 6. Verify
make test

Estimated time: 1-2 hours

Scenario C: Complete Server Loss

Symptoms: Server destroyed, stolen, or unrecoverable

Recovery:

# 1. Provision new server with Ubuntu 24.04 LTS
# 2. Apply OS hardening (see security-policy.md)

# 3. Create directory structure
mkdir -p /tank/inou/{bin,data,static,templates,lang}

# 4. Restore master key from Proton Pass
# Copy 32-byte key to /tank/inou/master.key
chmod 600 /tank/inou/master.key

# 5. Restore database from Google Drive
rclone copy gdrive:inou-backup/inou.db /tank/inou/data/

# 6. Restore images from Google Drive
rclone copy gdrive:inou-backup/images/ /tank/inou/data/images/

# 7. Clone application and build
cd ~/dev
git clone <repo> inou
cd inou
make build

# 8. Deploy
make deploy-prod

# 9. Update DNS if IP changed

# 10. Verify
make test

Estimated time: 4-8 hours
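After a full rebuild, the restored layout can be spot-checked before declaring recovery complete. A minimal sketch using the paths and the 0600 key permission from this plan (the `verify_restore` helper is illustrative):

```python
import os
import stat

def verify_restore(root="/tank/inou"):
    """Spot-check the restored files from the steps above.
    Returns a list of problems; an empty list means the basic checks pass."""
    problems = []
    key = os.path.join(root, "master.key")
    db = os.path.join(root, "data", "inou.db")
    for path in (key, db):
        if not os.path.isfile(path):
            problems.append(f"missing: {path}")
    if os.path.isfile(key):
        if os.path.getsize(key) != 32:
            problems.append(f"{key}: not exactly 32 bytes")
        mode = stat.S_IMODE(os.stat(key).st_mode)
        if mode != 0o600:
            problems.append(f"{key}: mode {oct(mode)}, expected 0o600")
    return problems
```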

Scenario D: Ransomware/Compromise

Symptoms: Encrypted files, unauthorized access, system tampering

Recovery:

  1. Do not use the compromised system; assume attacker persistence
  2. Provision fresh server from scratch
  3. Restore from known-good backup (before compromise date)
  4. Rotate master key and re-encrypt all data
  5. Rotate all credentials
  6. Apply additional hardening
  7. Monitor closely for re-compromise

Estimated time: 8-24 hours
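Step 3 requires choosing a known-good backup. A sketch of selecting the newest snapshot taken strictly before the compromise time, again assuming the `YYYYMMDD-HHMM` stamp from the snapshot examples in section 4:

```python
from datetime import datetime

def last_known_good(names, compromise_at):
    """Return the newest snapshot taken strictly before the compromise time,
    or None if no pre-compromise snapshot exists."""
    candidates = []
    for name in names:
        try:
            taken = datetime.strptime(name[-13:], "%Y%m%d-%H%M")
        except ValueError:
            continue
        if taken < compromise_at:
            candidates.append((taken, name))
    return max(candidates)[1] if candidates else None

names = [
    "tank/inou@pre-change-20260115-0300",
    "tank/inou@pre-change-20260119-0300",
    "tank/inou@pre-change-20260121-0300",
]
print(last_known_good(names, datetime(2026, 1, 20)))
# tank/inou@pre-change-20260119-0300
```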

Scenario E: Site Loss (Fire/Flood/Natural Disaster)

Symptoms: Physical location destroyed or inaccessible

Recovery:

  1. Obtain replacement hardware
  2. Restore from off-site backup (Google Drive)
  3. Restore master key from Proton Pass
  4. Rebuild and deploy application
  5. Update DNS to new IP

Estimated time: 24-48 hours


6. Key Management

Master Key Recovery

The master key (/tank/inou/master.key) is critical. Without it, all encrypted data is permanently unrecoverable.

Storage locations:

  1. Production server: /tank/inou/master.key
  2. Secure backup: Proton Pass (E2E encrypted, separate from data backups)

Recovery procedure:

  1. Log into Proton Pass
  2. Retrieve the 32-byte master key
  3. Create file: echo -n "<key>" > /tank/inou/master.key
  4. Set permissions: chmod 600 /tank/inou/master.key
  5. Verify length: wc -c /tank/inou/master.key (must be exactly 32 bytes)
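The length check in step 5 catches the most common mistake in step 3: `echo` without `-n` appends a newline, producing a 33-byte file and an unusable key. A small sketch that distinguishes that case (the `check_master_key` helper is illustrative):

```python
def check_master_key(path="/tank/inou/master.key"):
    """Verify the restored master key is exactly 32 raw bytes."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) == 32:
        return "ok"
    if len(data) == 33 and data.endswith(b"\n"):
        return "trailing newline: re-create the file with echo -n"
    return f"wrong length: {len(data)} bytes"
```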

Key Rotation (If Compromised)

If the master key may have been compromised:

# 1. Generate new key
head -c 32 /dev/urandom > /tank/inou/master.key.new

# 2. Run re-encryption migration (script to be created)
# This decrypts all data with old key and re-encrypts with new key

# 3. Replace key
mv /tank/inou/master.key.new /tank/inou/master.key

# 4. Update Proton Pass with new key

# 5. Verify application functionality
make test
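Steps 1 and 3 above can be combined so the key file is swapped atomically and never left missing or world-readable. A sketch (the re-encryption migration of step 2 is still a placeholder, as noted above):

```python
import os
import secrets

def rotate_key(path="/tank/inou/master.key"):
    """Generate a fresh 32-byte key and activate it atomically."""
    new_path = path + ".new"
    with open(new_path, "wb") as f:
        f.write(secrets.token_bytes(32))  # cryptographically strong randomness
    os.chmod(new_path, 0o600)
    # ... run the re-encryption migration against new_path here (step 2) ...
    os.replace(new_path, path)  # atomic on POSIX: no window without a key file
```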

7. Recovery Procedures

Pre-Recovery Checklist

  • Incident documented and severity assessed
  • Stakeholders notified
  • Backup integrity verified
  • Recovery environment prepared
  • Master key accessible

Database Restore from ZFS

# Stop services
sudo systemctl stop inou-portal inou-api

# List snapshots
zfs list -t snapshot tank/inou

# Restore from snapshot
cp /tank/inou/.zfs/snapshot/<snapshot-name>/data/inou.db /tank/inou/data/inou.db

# Start services
sudo systemctl start inou-portal inou-api

# Verify
make test

Database Restore from Off-site

# Stop services
sudo systemctl stop inou-portal inou-api

# Download from Google Drive
rclone copy gdrive:inou-backup/inou.db /tank/inou/data/

# Start services
sudo systemctl start inou-portal inou-api

# Verify
make test

8. Communication During Disaster

| Audience | Method | Message |
|---|---|---|
| Users | Email + status page | "inou is experiencing technical difficulties. We expect to restore service by [time]." |
| Affected users | Direct email | Per incident response plan if data affected |

9. Testing Schedule

| Test Type | Frequency | Last Performed | Next Due |
|---|---|---|---|
| Backup verification | Monthly | January 2026 | February 2026 |
| Database restore (local) | Quarterly | Not yet | Q1 2026 |
| Database restore (off-site) | Quarterly | Not yet | Q1 2026 |
| Full DR drill | Annually | Not yet | Q4 2026 |

Backup Verification Procedure

# Monthly: Verify local snapshots exist and are readable
zfs list -t snapshot tank/inou
sqlite3 /tank/inou/.zfs/snapshot/<latest>/data/inou.db "SELECT COUNT(*) FROM dossiers"

# Monthly: Verify off-site backup exists
rclone ls gdrive:inou-backup/
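The monthly off-site check above only confirms the file exists; it can be extended to confirm the backup is fresh enough to meet the RPO by parsing `rclone lsl` output. The sketch below assumes rclone's documented `size date time path` line layout; verify against your rclone version before relying on it:

```python
from datetime import datetime, timedelta

def offsite_backup_fresh(lsl_output, now, max_age=timedelta(hours=24)):
    """Return True if inou.db appears in `rclone lsl` output with a
    modification time within the last 24 hours."""
    for line in lsl_output.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[-1] == "inou.db":
            mtime = datetime.strptime(parts[1] + " " + parts[2][:8],
                                      "%Y-%m-%d %H:%M:%S")
            return (now - mtime) <= max_age
    return False  # backup file not found at all

sample = "     1234 2026-01-15 03:00:00.000000000 inou.db"
print(offsite_backup_fresh(sample, datetime(2026, 1, 15, 12, 0)))  # True
```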

Restore Test Procedure

# Quarterly: Restore to staging and verify

# 1. Copy from off-site to staging
rclone copy gdrive:inou-backup/inou.db /tmp/restore-test/

# 2. Verify database integrity
sqlite3 /tmp/restore-test/inou.db "PRAGMA integrity_check"

# 3. Verify data is readable (requires master key)
# Test decryption of sample records

# 4. Document results
# 5. Clean up test files
rm -rf /tmp/restore-test/
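Step 2 of the restore test can be automated so the quarterly drill produces a pass/fail result rather than manual inspection. A minimal sketch using Python's built-in sqlite3 module:

```python
import sqlite3

def db_integrity_ok(path):
    """Run SQLite's integrity check (step 2 above) and report pass/fail."""
    conn = sqlite3.connect(path)
    try:
        (result,) = conn.execute("PRAGMA integrity_check").fetchone()
    finally:
        conn.close()
    return result == "ok"
```

Note that `PRAGMA integrity_check` validates the SQLite file structure only; the encrypted field contents still require the master key to verify (step 3).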

10. Post-Recovery Checklist

After any recovery:

  • All services operational (make test passes)
  • Data integrity verified (spot-check records)
  • Logs reviewed for errors
  • Users notified if there was a visible outage
  • Incident documented
  • Post-mortem scheduled if significant event
  • This plan updated if gaps discovered

11. Quick Reference

Critical Paths

| Item | Path |
|---|---|
| Database | /tank/inou/data/inou.db |
| Auth database | /tank/inou/data/auth.db |
| Master key | /tank/inou/master.key |
| Binaries | /tank/inou/bin/ |
| Logs | /tank/inou/*.log |

Service Commands

# Status
sudo systemctl status inou-portal inou-api

# Stop
sudo systemctl stop inou-portal inou-api

# Start
sudo systemctl start inou-portal inou-api

# Logs
journalctl -u inou-portal -f
journalctl -u inou-api -f

Off-site Backup Commands

# List remote backups
rclone ls gdrive:inou-backup/

# Download specific file
rclone copy gdrive:inou-backup/inou.db /tank/inou/data/

# Upload backup manually
rclone copy /tank/inou/data/inou.db gdrive:inou-backup/

Document end