# Disaster Recovery Plan
**Version:** 1.0
**Effective:** February 2026
**Owner:** Johan Jongsma
**Review:** Annually
**Last DR Test:** Not yet performed

---
## 1. Purpose
Define procedures to recover Dealspace services and data following a disaster affecting production systems.

---
## 2. Scope
| System | Location | Criticality |
|--------|----------|-------------|
| Production server | 82.24.174.112 (Zürich) | Critical |
| Database | /opt/dealspace/data/dealspace.db | Critical |
| Master encryption key | Secure storage | Critical |
---
## 3. Recovery Objectives
| Metric | Target |
|--------|--------|
| **RTO** (Recovery Time Objective) | 4 hours |
| **RPO** (Recovery Point Objective) | 24 hours |
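
The 24-hour RPO only holds if the newest backup is never older than 24 hours. A minimal sketch of a freshness check follows, assuming GNU `stat` and a local copy of the backup file; the function name `backup_within_rpo` and the example path are hypothetical and not part of the current tooling.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if a backup file was modified within the RPO window.
# The 24-hour default mirrors the RPO target in the table above.

backup_within_rpo() {
  local file=$1
  local max_age_hours=${2:-24}
  [ -f "$file" ] || return 1                  # a missing backup fails the check
  local now mtime age_seconds
  now=$(date +%s)
  mtime=$(stat -c %Y "$file")                 # GNU stat: mtime in epoch seconds
  age_seconds=$(( now - mtime ))
  [ "$age_seconds" -le $(( max_age_hours * 3600 )) ]
}

# Example (hypothetical local path):
# backup_within_rpo /tmp/dealspace-latest.db.enc 24 || echo "RPO breached" >&2
```

Running a check like this from a scheduled job would surface a stalled backup before it is needed for a restore.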
---
## 4. Backup Strategy
### Backup Inventory
| Data | Method | Frequency | Retention | Location |
|------|--------|-----------|-----------|----------|
| Database | SQLite backup | Daily | 30 days | Encrypted off-site |
| Master key | Manual copy | On change | Permanent | Separate secure storage |
| Configuration | Git repository | Per change | Permanent | Remote repository |
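
The daily snapshot and 30-day retention in the table could be sketched as below. This assumes the `sqlite3` CLI and GNU `find` are installed; the function names, the staging directory, and the date-stamped file name are hypothetical, and the encryption and upload step is deliberately left out because this plan does not specify it.

```bash
#!/usr/bin/env bash
# Sketch of the daily snapshot and 30-day pruning. Names are illustrative.

snapshot_database() {
  local src=$1 dest_dir=$2
  local dest
  dest="$dest_dir/dealspace-$(date +%F).db"
  # .backup uses SQLite's online backup API, so the copy is consistent
  # even if the application is writing to the database.
  sqlite3 "$src" ".backup '$dest'" || return 1
  printf '%s\n' "$dest"
}

prune_old_backups() {
  local dir=$1 keep_days=${2:-30}
  # -mtime +N matches files last modified more than N full days ago.
  find "$dir" -maxdepth 1 -type f -name 'dealspace-*.db*' \
    -mtime +"$keep_days" -delete
}

# Example (hypothetical staging directory):
# snapshot_database /opt/dealspace/data/dealspace.db /var/backups/dealspace
# prune_old_backups /var/backups/dealspace 30
```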
### Encryption
All data is encrypted before leaving the server:
- Database fields: AES-256-GCM encryption with per-project keys
- Off-site backups: Encrypted on the server before upload
- Master key: Stored separately from data backups
---
## 5. Disaster Scenarios
### Scenario A: Hardware Failure (Single Component)
**Symptoms:** Server unresponsive, network failure
**Recovery:**
1. Contact Hostkey support
2. Restore from backup to new VPS if needed
3. Verify services: health check endpoint
4. Update DNS if IP changed
**Estimated time:** 2-4 hours
### Scenario B: Database Corruption
**Symptoms:** Application errors, SQLite integrity failures
**Recovery:**
```bash
# 1. Stop services
ssh root@82.24.174.112 "systemctl stop dealspace"
# 2. Back up the corrupted DB for analysis
ssh root@82.24.174.112 "cp /opt/dealspace/data/dealspace.db /opt/dealspace/data/dealspace.db.corrupted"
# 3. Restore from backup
# Download latest backup and restore
scp backup-server:/backups/dealspace-latest.db.enc /tmp/
# Decrypt and place in position
# 4. Restart services
ssh root@82.24.174.112 "systemctl start dealspace"
# 5. Verify
curl -s https://muskepo.com/health
```
**Estimated time:** 1-2 hours
### Scenario C: Complete Server Loss
**Symptoms:** Server destroyed, stolen, or unrecoverable
**Recovery:**
```bash
# 1. Provision new VPS at Hostkey
# 2. Apply OS hardening (see security-policy.md)
# 3. Create directory structure
mkdir -p /opt/dealspace/{bin,data}
# 4. Restore master key from secure storage
# Copy 32-byte key to secure location
chmod 600 /opt/dealspace/master.key
# 5. Restore database from backup
# Download encrypted backup
# Decrypt and place at /opt/dealspace/data/dealspace.db
# 6. Deploy application binary (run from the machine that holds the build)
scp dealspace-linux root@NEW_IP:/opt/dealspace/bin/dealspace
ssh root@NEW_IP "chmod +x /opt/dealspace/bin/dealspace"
# 7. Configure systemd service
# 8. Start service
# 9. Update DNS to new IP
# 10. Verify
curl -s https://muskepo.com/health
```
**Estimated time:** 4-8 hours
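
Step 7 above leaves the systemd configuration unspecified. As a minimal sketch only, the helper below writes a unit file for a service named `dealspace` that runs the binary at the path used in this plan. The unit contents (restart policy, working directory, absence of environment variables or flags) and the function name `write_dealspace_unit` are assumptions; the real unit should come from the configuration repository.

```bash
#!/usr/bin/env bash
# Sketch: write a minimal systemd unit for the dealspace service.
# The directives below are illustrative defaults, not the production unit.

write_dealspace_unit() {
  local unit_path=${1:-/etc/systemd/system/dealspace.service}
  cat > "$unit_path" <<'EOF'
[Unit]
Description=Dealspace
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/opt/dealspace/bin/dealspace
WorkingDirectory=/opt/dealspace
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# On the new server, after writing the unit:
# systemctl daemon-reload && systemctl enable --now dealspace
```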
### Scenario D: Ransomware/Compromise
**Symptoms:** Encrypted files, unauthorized access, system tampering
**Recovery:**
1. **Do not use the compromised system** - assume attacker persistence
2. Provision fresh VPS from scratch
3. Restore from known-good backup (before compromise date)
4. Rotate master key and re-encrypt all data
5. Rotate all credentials
6. Apply additional hardening
7. Monitor closely for re-compromise
**Estimated time:** 8-24 hours
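
Picking a known-good backup means choosing the newest one that predates the compromise. A sketch under stated assumptions: backups are available in a local directory, end in `.db.enc`, and their modification times reflect when they were taken. The function name `latest_backup_before` and the directory are hypothetical, and GNU `find` is required for `-newermt` and `-printf`.

```bash
#!/usr/bin/env bash
# Sketch: print the most recent backup whose mtime is not after a cutoff date.

latest_backup_before() {
  local dir=$1 cutoff=$2
  # "! -newermt" keeps files modified at or before the cutoff;
  # %T@ prints the mtime as epoch seconds so sort -n orders by age.
  find "$dir" -maxdepth 1 -type f -name '*.db.enc' ! -newermt "$cutoff" \
    -printf '%T@ %p\n' | sort -n | tail -n 1 | cut -d' ' -f2-
}

# Example (hypothetical directory and compromise date):
# latest_backup_before /mnt/backups "2026-02-10 00:00"
```

Modification times can be altered by an attacker with access to the backup store, so the chosen file should still be validated before it is restored.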
### Scenario E: Provider/Region Loss
**Symptoms:** Hostkey Zürich unavailable
**Recovery:**
1. Provision new VPS at alternate provider
2. Restore from off-site backup
3. Restore master key from secure storage
4. Deploy application
5. Update DNS
**Estimated time:** 24-48 hours
---
## 6. Key Management
### Master Key Recovery
The master key is **critical**. Without it, all encrypted data is permanently unrecoverable.
**Storage locations:**
1. Production server: Secure location
2. Secure backup: Separate secure storage (not with data backups)
**Recovery procedure:**
1. Retrieve the 32-byte master key from secure storage
2. Create file with proper permissions
3. Verify length (must be exactly 32 bytes)
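
The length and permission checks in the procedure above can be scripted. This is a sketch that assumes the key is stored as 32 raw bytes (not hex or base64) and that GNU `stat` is available; the function name `verify_master_key` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: confirm a restored master key is exactly 32 bytes with mode 600.

verify_master_key() {
  local key_file=$1
  [ -f "$key_file" ] || { echo "missing: $key_file" >&2; return 1; }
  local size mode
  size=$(stat -c %s "$key_file")              # file size in bytes
  mode=$(stat -c %a "$key_file")              # octal permission bits
  [ "$size" -eq 32 ] || { echo "expected 32 bytes, got $size" >&2; return 1; }
  [ "$mode" = "600" ] || { echo "expected mode 600, got $mode" >&2; return 1; }
}

# Example:
# verify_master_key /opt/dealspace/master.key && echo "key file looks sane"
```

A passing check says nothing about whether the key is the right one; that is only confirmed when the application can decrypt real records.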
### Key Rotation (If Compromised)
If the master key may be compromised:
1. Generate new master key
2. Run re-encryption migration (decrypt with old key, re-encrypt with new)
3. Replace key file
4. Update secure storage with new key
5. Verify application functionality
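
Step 1 of the rotation can be sketched as follows, assuming the key format is 32 raw random bytes as described above. The function name `generate_master_key` and the output path are hypothetical. The re-encryption migration in step 2 is application-specific and is not shown.

```bash
#!/usr/bin/env bash
# Sketch: create a new 32-byte key file readable only by its owner.

generate_master_key() {
  local out=$1
  # Never overwrite an existing key: the old key is still needed to decrypt.
  [ -e "$out" ] && { echo "refusing to overwrite $out" >&2; return 1; }
  # umask 077 in a subshell so the file is created with mode 600.
  ( umask 077; head -c 32 /dev/urandom > "$out" )
}

# Example (hypothetical path, kept separate from the active key):
# generate_master_key /opt/dealspace/master.key.new
```

Writing the new key to a separate path keeps the old key available until the migration has been verified.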
---
## 7. Recovery Procedures
### Pre-Recovery Checklist
- [ ] Incident documented and severity assessed
- [ ] Stakeholders notified
- [ ] Backup integrity verified
- [ ] Recovery environment prepared
- [ ] Master key accessible
### Database Restore from Backup
```bash
# Stop services
ssh root@82.24.174.112 "systemctl stop dealspace"
# Download and decrypt backup
# Place at /opt/dealspace/data/dealspace.db
# Start services
ssh root@82.24.174.112 "systemctl start dealspace"
# Verify
curl -s https://muskepo.com/health
```
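A single `curl` right after a restart can fail simply because the service is still starting. A small retry wrapper, sketched below, makes the verification step less brittle. The `retry` helper is hypothetical, and the example assumes the health endpoint answers with a 2xx status when healthy, which this plan does not state.

```bash
#!/usr/bin/env bash
# Sketch: run a command up to N times, sleeping between failed attempts.

retry() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0                          # stop at the first success
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  return 1                                    # every attempt failed
}

# Example: poll for up to a minute; -f makes curl fail on HTTP errors (>= 400).
# retry 12 5 curl -fsS -o /dev/null https://muskepo.com/health
```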
---
## 8. Communication During Disaster
| Audience | Method | Message |
|----------|--------|---------|
| Clients | Email + status page | "Dealspace is experiencing technical difficulties. We expect to restore service by [time]." |
| Affected clients | Direct email | Per incident response plan if data affected |
---
## 9. Testing Schedule
| Test Type | Frequency | Last Performed | Next Due |
|-----------|-----------|----------------|----------|
| Backup verification | Monthly | Not yet | March 2026 |
| Database restore | Quarterly | Not yet | Q1 2026 |
| Full DR drill | Annually | Not yet | Q4 2026 |
### Backup Verification Procedure
```bash
# Monthly: Verify backups exist and are readable
# List available backups
# Verify database integrity of latest backup
```
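One cheap readability check, once a backup has been decrypted by whatever means the deployment uses, is to confirm that the file starts with the SQLite magic string. Every SQLite 3 database file begins with the bytes `SQLite format 3` followed by a NUL byte. The function name `looks_like_sqlite` and the example path are hypothetical, and this sketch does not replace a full integrity check.

```bash
#!/usr/bin/env bash
# Sketch: check that a decrypted backup is non-empty and carries the
# SQLite 3 file header.

looks_like_sqlite() {
  local file=$1
  [ -s "$file" ] || return 1                  # must exist and be non-empty
  # Compare only the first 15 bytes to avoid the trailing NUL byte.
  [ "$(head -c 15 "$file")" = "SQLite format 3" ]
}

# Example (hypothetical path to a decrypted copy):
# looks_like_sqlite /tmp/dealspace-latest.db || echo "not a SQLite file" >&2
```

A wrong key or a truncated download typically produces a file that fails this check immediately.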
### Restore Test Procedure
```bash
# Quarterly: Restore to test environment and verify
# 1. Download backup to test environment
# 2. Verify database integrity: sqlite3 test.db "PRAGMA integrity_check"
# 3. Verify data is readable (requires master key)
# 4. Document results
# 5. Clean up test files
```
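Step 2 of the restore test can be wrapped so that it yields a pass or fail result suitable for the test record. This sketch assumes the `sqlite3` CLI is installed in the test environment and that the backup has already been decrypted to a local file; the function name `check_sqlite_integrity` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if PRAGMA integrity_check reports exactly "ok".

check_sqlite_integrity() {
  local db=$1 result
  # sqlite3 would silently create a missing file, so test for it first.
  [ -f "$db" ] || return 1
  result=$(sqlite3 "$db" "PRAGMA integrity_check;" 2>&1) || return 1
  [ "$result" = "ok" ]
}

# Example (file name taken from the procedure above):
# check_sqlite_integrity test.db && echo "integrity check passed"
```

Anything other than a single `ok` line, including a list of corruption messages, is treated as a failure.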
---
## 10. Post-Recovery Checklist
After any recovery:
- [ ] All services operational (health check passes)
- [ ] Data integrity verified (spot-check records)
- [ ] Logs reviewed for errors
- [ ] Clients notified if there was a visible outage
- [ ] Incident documented
- [ ] Post-mortem scheduled if the event was significant
- [ ] This plan updated if gaps were discovered
---
## 11. Quick Reference
### Critical Paths
| Item | Path |
|------|------|
| Database | /opt/dealspace/data/dealspace.db |
| Binary | /opt/dealspace/bin/dealspace |
| Master key | Secure location |
### Service Commands
```bash
# Status
ssh root@82.24.174.112 "systemctl status dealspace"
# Stop
ssh root@82.24.174.112 "systemctl stop dealspace"
# Start
ssh root@82.24.174.112 "systemctl start dealspace"
# Logs
ssh root@82.24.174.112 "journalctl -u dealspace -f"
# Health check
curl -s https://muskepo.com/health
```
---
*Document end*