# Deal Room - Architecture Specification
- **Project:** Deal Room - Secure Investment Banking Document Sharing Platform
- **Owner:** Misha Muskepo (michael@muskepo.com)
- **Tech Lead:** James
- **Architecture Pattern:** inou-portal pattern
## Executive Summary
Deal Room is a secure, invite-only document sharing platform designed for Investment Banking deal teams. It provides role-based access control, encrypted file storage, AI-powered document analysis, and comprehensive audit trails for sensitive financial transactions.
## System Architecture
### Core Principles
- **Single binary deployment** - Zero runtime dependencies
- **Data-centric design** - All entities stored as typed JSON in unified entries table
- **Security-first** - Encryption at rest, RBAC, audit logging
- **AI-enhanced** - Document analysis and embeddings for intelligent search
- **Production-grade** - Battle-tested patterns from inou-portal
### Technology Stack
- **Backend:** Go 1.22+ (single binary)
- **Templates:** templ (type-safe HTML generation)
- **Frontend:** HTMX + Tailwind CSS (CDN)
- **Database:** SQLite with encryption at rest
- **File Storage:** Encrypted (AES-256-GCM) + Compressed (zstd)
- **AI/ML:** K2.5 for document analysis and embeddings
- **Authentication:** Magic link + session cookies
## Database Schema
### Core Tables
#### users
```sql
CREATE TABLE users (
id TEXT PRIMARY KEY,
email TEXT UNIQUE NOT NULL,
name TEXT NOT NULL,
role TEXT NOT NULL CHECK (role IN ('admin', 'user')),
avatar_url TEXT,
created_at INTEGER NOT NULL,
last_login INTEGER,
is_active BOOLEAN NOT NULL DEFAULT 1
);
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_active ON users(is_active);
```
#### entries
**Unified data table storing all content types as typed JSON**
```sql
CREATE TABLE entries (
id TEXT PRIMARY KEY,
parent_id TEXT, -- For threading/hierarchy
deal_room_id TEXT NOT NULL, -- Links to deal room entry
entry_type TEXT NOT NULL CHECK (entry_type IN ('deal_room', 'document', 'note', 'message', 'analysis')),
title TEXT NOT NULL,
content TEXT NOT NULL, -- JSON payload, schema varies by type
file_path TEXT, -- For documents: encrypted file path
file_size INTEGER, -- Original file size
file_hash TEXT, -- SHA-256 of original file
embedding BLOB, -- AI embeddings for search
created_by TEXT NOT NULL,
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL,
FOREIGN KEY (created_by) REFERENCES users(id),
FOREIGN KEY (parent_id) REFERENCES entries(id),
FOREIGN KEY (deal_room_id) REFERENCES entries(id)
);
CREATE INDEX idx_entries_deal_room ON entries(deal_room_id);
CREATE INDEX idx_entries_type ON entries(entry_type);
CREATE INDEX idx_entries_parent ON entries(parent_id);
CREATE INDEX idx_entries_created ON entries(created_at);
CREATE INDEX idx_entries_creator ON entries(created_by);
```
#### access
**RBAC permissions using bitmask**
```sql
CREATE TABLE access (
id TEXT PRIMARY KEY,
entry_id TEXT NOT NULL,
user_id TEXT NOT NULL,
permissions INTEGER NOT NULL DEFAULT 1, -- Bitmask: read=1, write=2, delete=4, manage=8
granted_by TEXT NOT NULL,
granted_at INTEGER NOT NULL,
FOREIGN KEY (entry_id) REFERENCES entries(id) ON DELETE CASCADE,
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
FOREIGN KEY (granted_by) REFERENCES users(id),
UNIQUE(entry_id, user_id)
);
CREATE INDEX idx_access_entry ON access(entry_id);
CREATE INDEX idx_access_user ON access(user_id);
CREATE INDEX idx_access_permissions ON access(permissions);
```
#### sessions
```sql
CREATE TABLE sessions (
token TEXT PRIMARY KEY,
user_id TEXT NOT NULL,
expires_at INTEGER NOT NULL,
created_at INTEGER NOT NULL,
last_used INTEGER NOT NULL,
user_agent TEXT,
ip_address TEXT,
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
CREATE INDEX idx_sessions_user ON sessions(user_id);
CREATE INDEX idx_sessions_expires ON sessions(expires_at);
```
#### audit_log
```sql
CREATE TABLE audit_log (
id TEXT PRIMARY KEY,
user_id TEXT,
entry_id TEXT,
action TEXT NOT NULL, -- view, create, update, delete, download, share
details TEXT, -- JSON with action-specific data
ip_address TEXT,
user_agent TEXT,
created_at INTEGER NOT NULL,
FOREIGN KEY (user_id) REFERENCES users(id),
FOREIGN KEY (entry_id) REFERENCES entries(id)
);
CREATE INDEX idx_audit_user ON audit_log(user_id);
CREATE INDEX idx_audit_entry ON audit_log(entry_id);
CREATE INDEX idx_audit_action ON audit_log(action);
CREATE INDEX idx_audit_created ON audit_log(created_at);
```
### Entry Content Schemas
#### Deal Room Entry
```jsonc
{
"type": "deal_room",
"description": "Acquisition of TechCorp by PrivateEquity Partners",
"stage": "due_diligence", // sourcing, loi, due_diligence, closing, completed
"target_company": "TechCorp Inc.",
"deal_value": "$50M",
"participants": [
{"name": "John Smith", "role": "Deal Lead", "organization": "PE Partners"},
{"name": "Jane Doe", "role": "Analyst", "organization": "PE Partners"}
],
"key_dates": {
"loi_signed": "2024-01-15",
"dd_start": "2024-02-01",
"close_target": "2024-04-30"
},
"confidentiality_level": "highly_confidential"
}
```
#### Document Entry
```jsonc
{
"type": "document",
"category": "financial_model", // nda, cim, financial_model, teaser, legal, dd_report
"mime_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"original_name": "TechCorp_Financial_Model_v3.xlsx",
"version": "3.0",
"analysis": {
"summary": "Financial projections showing 15% EBITDA growth",
"key_metrics": ["Revenue: $100M", "EBITDA: $25M", "Growth: 15%"],
"risk_factors": ["Market volatility", "Regulatory changes"],
"ai_confidence": 0.92
},
"requires_nda": true
}
```
#### Note/Message Entry
```jsonc
{
"type": "note",
"body": "Updated financial model reflects Q4 performance...",
"mentions": ["user_id_123"], // @mentions for notifications
"attachments": ["entry_id_456"], // Reference to document entries
"thread_context": "document_discussion" // Helps organize conversations
}
```
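Because the `content` column holds a different JSON shape per `entry_type`, the application has to pick a Go type before decoding. The following is a minimal sketch of that dispatch, not the project's actual model code: the struct names and the `DecodeContent` helper are hypothetical, and only a subset of fields from the payloads above is mapped.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DealRoomContent mirrors a subset of the deal room payload shown above.
type DealRoomContent struct {
	Description   string `json:"description"`
	Stage         string `json:"stage"`
	TargetCompany string `json:"target_company"`
}

// NoteContent mirrors the note/message payload shown above.
type NoteContent struct {
	Body          string   `json:"body"`
	Mentions      []string `json:"mentions"`
	Attachments   []string `json:"attachments"`
	ThreadContext string   `json:"thread_context"`
}

// DecodeContent (hypothetical helper) unmarshals the content column into the
// struct matching entryType and rejects types it does not know.
func DecodeContent(entryType string, raw []byte) (any, error) {
	switch entryType {
	case "deal_room":
		var c DealRoomContent
		if err := json.Unmarshal(raw, &c); err != nil {
			return nil, err
		}
		return c, nil
	case "note", "message":
		var c NoteContent
		if err := json.Unmarshal(raw, &c); err != nil {
			return nil, err
		}
		return c, nil
	default:
		return nil, fmt.Errorf("unsupported entry type %q", entryType)
	}
}

func main() {
	raw := []byte(`{"type":"note","body":"Q4 update","mentions":["user_id_123"]}`)
	v, err := DecodeContent("note", raw)
	fmt.Printf("%+v %v\n", v, err)
}
```

Rejecting unknown types at decode time keeps the `CHECK` constraint on `entry_type` and the Go layer in agreement; `document` and `analysis` would get their own cases in the same switch.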
## Permission Model (RBAC)
### Bitmask Values
- **READ (1):** View entry and metadata
- **WRITE (2):** Edit entry, add comments
- **DELETE (4):** Remove entry
- **MANAGE (8):** Grant/revoke access, manage permissions
### Permission Inheritance
- Deal room permissions cascade to all contained documents
- Explicit document permissions override inherited permissions
- Admins have all permissions (15) on all entries by default
### Common Permission Patterns
- **Viewer:** READ (1) - Can view documents and notes
- **Contributor:** READ + WRITE (3) - Can add notes and upload documents
- **Manager:** READ + WRITE + DELETE (7) - Can manage content
- **Admin:** All permissions (15) - Full control
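The bitmask and inheritance rules above can be sketched in a few lines of Go. This is an illustration under stated assumptions rather than the RBAC engine itself: the `Effective` function is hypothetical, it assumes an explicit document grant replaces the deal room grant entirely (rather than being merged with it), and it assumes admins receive every bit.

```go
package main

import "fmt"

// Permission is the bitmask stored in access.permissions.
type Permission int

const (
	Read   Permission = 1
	Write  Permission = 2
	Delete Permission = 4
	Manage Permission = 8
	All               = Read | Write | Delete | Manage // 15
)

// Has reports whether p includes every bit in want.
func (p Permission) Has(want Permission) bool { return p&want == want }

// Effective (hypothetical) resolves a user's permissions on a document.
// Assumption: a non-nil docGrant overrides the inherited roomGrant completely.
func Effective(isAdmin bool, roomGrant Permission, docGrant *Permission) Permission {
	if isAdmin {
		return All
	}
	if docGrant != nil {
		return *docGrant
	}
	return roomGrant
}

func main() {
	contributor := Read | Write
	fmt.Println(contributor.Has(Write), contributor.Has(Delete)) // true false
	fmt.Println(Effective(false, contributor, nil))              // 3
}
```

Using a pointer for the document grant distinguishes "no explicit row in `access`" from "explicit grant of 0", which matters if an explicit grant is ever used to restrict a user below their deal room permissions.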
## API Endpoints
### Authentication
```
POST /auth/login # Magic link login
GET /auth/verify/{token} # Verify magic link
POST /auth/logout # End session
GET /auth/me # Current user info
```
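One way the magic link behind `POST /auth/login` and `GET /auth/verify/{token}` could work is sketched below. Everything here is an assumption layered on the endpoints above: the `MagicLink` type, the idea of storing only a SHA-256 hash of the token, and the 15-minute lifetime used in `main` are illustrative choices, not part of this spec.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
	"time"
)

// MagicLink is the server-side record (hypothetical shape). Only the hash of
// the token is kept, so a database leak does not expose usable links.
type MagicLink struct {
	TokenHash [32]byte
	ExpiresAt int64 // Unix seconds, matching the INTEGER timestamps in the schema
}

// NewMagicLink returns the raw token to email and the record to store.
func NewMagicLink(now, ttlSeconds int64) (string, MagicLink, error) {
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		return "", MagicLink{}, err
	}
	token := base64.RawURLEncoding.EncodeToString(raw)
	link := MagicLink{TokenHash: sha256.Sum256([]byte(token)), ExpiresAt: now + ttlSeconds}
	return token, link, nil
}

// Verify checks the presented token against the stored hash and the expiry.
func (m MagicLink) Verify(token string, now int64) bool {
	h := sha256.Sum256([]byte(token))
	match := subtle.ConstantTimeCompare(h[:], m.TokenHash[:]) == 1
	return match && now < m.ExpiresAt
}

func main() {
	now := time.Now().Unix()
	token, link, err := NewMagicLink(now, 15*60) // assumed 15-minute lifetime
	if err != nil {
		panic(err)
	}
	fmt.Println("verify path: /auth/verify/" + token)
	fmt.Println(link.Verify(token, now), link.Verify(token, now+16*60))
}
```

A real handler would also delete the record after the first successful verification so that each link is single-use.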
### Deal Rooms
```
GET /api/deal-rooms # List accessible deal rooms
POST /api/deal-rooms # Create new deal room
GET /api/deal-rooms/{id} # Get deal room details
PUT /api/deal-rooms/{id} # Update deal room
DELETE /api/deal-rooms/{id} # Archive deal room
```
### Entries (Documents, Notes, etc.)
```
GET /api/entries # List entries (filtered by permissions)
POST /api/entries # Create entry
GET /api/entries/{id} # Get entry details
PUT /api/entries/{id} # Update entry
DELETE /api/entries/{id} # Delete entry
GET /api/entries/{id}/file # Download file (for documents)
```
### Access Management
```
GET /api/entries/{id}/access # List permissions for entry
POST /api/entries/{id}/access # Grant access
PUT /api/entries/{id}/access/{uid} # Update user permissions
DELETE /api/entries/{id}/access/{uid} # Revoke access
```
### Search & AI
```
GET /api/search?q={query} # Semantic search across accessible content
POST /api/analyze/{entry_id} # Trigger AI analysis
GET /api/similar/{entry_id} # Find similar documents
```
### Activity & Audit
```
GET /api/activity/{deal_room_id} # Activity feed for deal room
GET /api/audit/{entry_id} # Audit log for specific entry
```
## Page Routes (Server-Rendered)
```
GET / # Dashboard - accessible deal rooms
GET /login # Login page
GET /deal-rooms/{id} # Deal room detail view
GET /deal-rooms/{id}/upload # Document upload page
GET /documents/{id} # Document viewer
GET /admin # Admin panel (user/permissions management)
GET /profile # User profile
GET /activity # Global activity feed
```
## File Storage Design
### Storage Structure
```
data/
├── db/
│ └── dealroom.db # SQLite database
├── files/
│ ├── 2024/01/ # Date-based partitioning
│ │ ├── abc123.enc # Encrypted + compressed files
│ │ └── def456.enc
│ └── temp/ # Temporary upload staging
└── backups/ # Automated backups
├── db/
└── files/
```
### Encryption Process
1. **Upload:** File is written to the staging directory as `files/temp/{uuid}`
2. **Compress:** Apply zstd compression (level 3)
3. **Encrypt:** AES-256-GCM with a fresh random nonce per file
4. **Store:** Move to date-partitioned directory
5. **Index:** Store metadata + embedding in database
6. **Cleanup:** Remove temp file
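The encrypt step above can be sketched with the Go standard library alone. This is a minimal illustration, assuming the nonce is stored as a prefix of the `.enc` file (the spec does not say where it lives); `Seal` and `Open` are hypothetical names. The zstd step is left out because it needs a third-party package, so the input here stands in for the already-compressed bytes.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-256-GCM AEAD; the key must be exactly 32 bytes.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 requires a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Seal returns nonce || ciphertext || tag for the given (compressed) bytes.
func Seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Passing nonce as dst prepends it to the output.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Open reverses Seal and fails if the data was modified.
func Open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("sealed data shorter than nonce")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := Seal(key, []byte("compressed file bytes"))
	if err != nil {
		panic(err)
	}
	plain, err := Open(key, sealed)
	fmt.Println(string(plain), err)
}
```

Compressing before encrypting is the order that works: ciphertext is effectively random, so compressing after encryption would save nothing.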
### File Naming
- **Pattern:** `{year}/{month}/{entry_id}.enc`
- **Metadata:** Stored in database, not filesystem
- **Deduplication:** The SHA-256 hash of the original file (`file_hash`) identifies duplicate uploads so the same content is not stored twice
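The naming pattern and the hash used for deduplication can be sketched as two small helpers. `StoragePath` and `FileHash` are hypothetical names, and deriving the year and month from the entry's `created_at` in UTC is an assumption; the spec only gives the `{year}/{month}/{entry_id}.enc` pattern.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// StoragePath builds the relative path for an entry's encrypted file.
// Assumption: the partition comes from created_at (Unix seconds) in UTC.
func StoragePath(entryID string, createdAt int64) string {
	t := time.Unix(createdAt, 0).UTC()
	return fmt.Sprintf("%04d/%02d/%s.enc", t.Year(), int(t.Month()), entryID)
}

// FileHash returns the hex SHA-256 of the original (unencrypted) bytes,
// the value that would be stored in entries.file_hash.
func FileHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	// The map stands in for a lookup on entries.file_hash.
	known := map[string]string{}
	upload := []byte("example file contents")

	h := FileHash(upload)
	if id, dup := known[h]; dup {
		fmt.Println("duplicate of entry", id)
		return
	}
	known[h] = "abc123"
	fmt.Println(StoragePath("abc123", 1705276800)) // 2024/01/abc123.enc
}
```

The hash has to be taken before encryption: each upload gets a fresh nonce, so two encrypted copies of the same file never match byte for byte.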
## AI/Embeddings Pipeline
### Document Processing Workflow
1. **Upload:** User uploads document
2. **Extract:** Convert to text (PDF, DOCX, XLSX support)
3. **Analyze:** Send to K2.5 for:
   - Content summarization
   - Key metrics extraction
   - Risk factor identification
   - Classification (NDA, CIM, Financial Model, etc.)
4. **Embed:** Generate vector embeddings for semantic search
5. **Store:** Save analysis results in entry content JSON
### K2.5 Integration
```go
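// DocumentAnalysis is the analysis result saved in the entry content JSON.
type DocumentAnalysis struct {
	Summary     string   `json:"summary"`       // Content summary
	KeyMetrics  []string `json:"key_metrics"`   // Extracted metrics, e.g. "EBITDA: $25M"
	RiskFactors []string `json:"risk_factors"`  // Identified risk factors
	Category    string   `json:"category"`      // Classification, e.g. "nda", "cim"
	Confidence  float64  `json:"ai_confidence"` // Confidence score, e.g. 0.92
}

// EmbeddingRequest asks for a vector embedding of Text.
type EmbeddingRequest struct {
	Text  string `json:"text"`  // Extracted document text
	Model string `json:"model"` // Embedding model identifier
}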
```
### Semantic Search
- **Vector Storage:** SQLite with vector extension
- **Similarity:** Cosine similarity for document matching
- **Hybrid Search:** Combine keyword + semantic results
- **Access Control:** Filter results by user permissions
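For the similarity step, here is a minimal sketch of cosine similarity over vectors read from the `embedding` BLOB column. The byte layout (little-endian `float32` values packed back to back) is an assumption, as the spec does not define it, and the function names are hypothetical. A vector extension would normally do this inside SQLite; the Go version only shows the arithmetic.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// EncodeEmbedding packs a vector as little-endian float32 values (assumed layout).
func EncodeEmbedding(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return buf
}

// DecodeEmbedding reverses EncodeEmbedding.
func DecodeEmbedding(b []byte) ([]float32, error) {
	if len(b)%4 != 0 {
		return nil, errors.New("embedding length is not a multiple of 4 bytes")
	}
	v := make([]float32, len(b)/4)
	for i := range v {
		v[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[4*i:]))
	}
	return v, nil
}

// Cosine returns the cosine similarity of a and b. It returns 0 when the
// dimensions differ or either vector has zero length or zero norm.
func Cosine(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	query := []float32{0.2, 0.8, 0.1}
	stored, _ := DecodeEmbedding(EncodeEmbedding([]float32{0.1, 0.9, 0.0}))
	fmt.Printf("similarity: %.3f\n", Cosine(query, stored))
}
```

Whatever performs the ranking, the permission filter should be applied before results are returned, so that similarity scores never reveal the existence of documents a user cannot read.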
## Security Model
### Authentication
- **Magic Link:** Email-based passwordless login
- **Session Management:** Secure HTTP-only cookies
- **Token Expiry:** 24-hour sessions with automatic refresh
- **Rate Limiting:** Prevent brute force attacks
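The session rules above could be sketched as follows. This reads "automatic refresh" as a sliding expiration, where each authenticated request pushes `expires_at` out by another 24 hours; that reading is an assumption. The cookie name `dealroom_session` and the `SameSite=Lax` setting are also illustrative choices, not taken from this spec.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

const sessionTTL = 24 * 60 * 60 // 24 hours, in seconds

// Session mirrors the relevant columns of the sessions table.
type Session struct {
	Token     string
	ExpiresAt int64 // Unix seconds
	LastUsed  int64 // Unix seconds
}

// Touch rejects expired sessions and otherwise slides the expiry forward.
func (s *Session) Touch(now int64) bool {
	if now >= s.ExpiresAt {
		return false
	}
	s.LastUsed = now
	s.ExpiresAt = now + sessionTTL
	return true
}

// SessionCookie builds the HTTP-only cookie carrying the session token.
func SessionCookie(s Session) *http.Cookie {
	return &http.Cookie{
		Name:     "dealroom_session", // hypothetical cookie name
		Value:    s.Token,
		Path:     "/",
		Expires:  time.Unix(s.ExpiresAt, 0),
		HttpOnly: true, // not readable from JavaScript
		Secure:   true, // sent over HTTPS only
		SameSite: http.SameSiteLaxMode,
	}
}

func main() {
	now := time.Now().Unix()
	s := Session{Token: "example-token", ExpiresAt: now + sessionTTL}
	fmt.Println(s.Touch(now+3600), SessionCookie(s).HttpOnly)
}
```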
### Authorization
- **RBAC:** Entry-level permissions with inheritance
- **Least Privilege:** Users see only what they have access to
- **Audit Trail:** All actions logged with user attribution
- **Admin Controls:** User management and permission oversight
### Data Protection
- **Encryption at Rest:** AES-256-GCM for files, encrypted SQLite
- **Encryption in Transit:** HTTPS only, HSTS headers
- **File Access:** Files are never served directly from disk; every download goes through the permission-checked API
- **Backup Encryption:** Automated encrypted backups
### Compliance Features
- **Audit Logging:** Comprehensive activity tracking
- **Data Retention:** Configurable retention policies
- **Access Reviews:** Periodic permission audits
- **Export Controls:** Document download tracking
## Deployment Architecture
### Single Binary Approach
```
dealroom
├── Static assets embedded (CSS, JS)
├── Templates compiled
├── Database migrations
└── Configuration via environment variables
```
### Configuration
```bash
# Database
DB_PATH=/data/db/dealroom.db
DB_KEY=<encryption_key>
# File Storage
FILES_PATH=/data/files
BACKUP_PATH=/data/backups
# AI Service
K25_API_URL=http://k2.5:8080
K25_API_KEY=<api_key>
# Server
PORT=8080
BASE_URL=https://dealroom.company.com
SESSION_SECRET=<random_key>
# Email (for magic links)
SMTP_HOST=smtp.company.com
SMTP_USER=dealroom@company.com
SMTP_PASS=<password>
```
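A sketch of how the binary might read these variables at startup is shown below. The `Config` struct and `LoadConfig` function are hypothetical, only a subset of the variables is covered, and treating the example values above as defaults while requiring `DB_KEY` and `SESSION_SECRET` is an assumption rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config holds a subset of the settings listed above.
type Config struct {
	DBPath        string
	DBKey         string
	FilesPath     string
	Port          string
	SessionSecret string
}

// LoadConfig reads settings through getenv so tests can inject a fake
// environment. Secrets have no default and must be provided.
func LoadConfig(getenv func(string) string) (Config, error) {
	orDefault := func(key, def string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return def
	}
	cfg := Config{
		DBPath:        orDefault("DB_PATH", "/data/db/dealroom.db"),
		DBKey:         getenv("DB_KEY"),
		FilesPath:     orDefault("FILES_PATH", "/data/files"),
		Port:          orDefault("PORT", "8080"),
		SessionSecret: getenv("SESSION_SECRET"),
	}
	if cfg.DBKey == "" || cfg.SessionSecret == "" {
		return Config{}, errors.New("DB_KEY and SESSION_SECRET must be set")
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("listening on port", cfg.Port)
}
```

Failing fast on missing secrets avoids starting a server that silently runs with an empty encryption key.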
### Docker Deployment
```dockerfile
FROM golang:1.22-alpine AS builder
# ... build process
FROM alpine:3.19
RUN apk --no-cache add ca-certificates tzdata
COPY --from=builder /app/dealroom /usr/local/bin/
VOLUME ["/data"]
EXPOSE 8080
CMD ["dealroom"]
```
## Development Workflow
### Project Structure
```
dealroom/
├── cmd/dealroom/main.go # Application entry point
├── internal/
│ ├── db/ # Database layer
│ ├── rbac/ # RBAC engine
│ ├── store/ # File storage
│ ├── ai/ # K2.5 integration
│ ├── handler/ # HTTP handlers
│ └── model/ # Data models
├── templates/ # templ templates
├── static/ # Static assets
├── migrations/ # Database migrations
├── Dockerfile
├── Makefile
└── README.md
```
### Build & Run
```bash
make build # Build binary
make run # Run in development mode
make test # Run tests
make migrate # Run database migrations
make docker # Build Docker image
```
### Testing Strategy
- **Unit Tests:** Core business logic and RBAC
- **Integration Tests:** Database and file operations
- **E2E Tests:** Critical user journeys with a real browser
- **Security Tests:** Permission boundaries and file access
- **Performance Tests:** Large file uploads and search
## Scalability Considerations
### Current Limits (SQLite-based)
- **Concurrent Users:** ~100-200 active users
- **File Storage:** Limited by disk space
- **Search Performance:** Good up to ~10K documents
- **Database Size:** Efficient up to ~100GB
### Migration Path (Future)
- **Database:** PostgreSQL for higher concurrency
- **File Storage:** S3-compatible object storage
- **Search:** Dedicated vector database (Pinecone, Weaviate)
- **Caching:** Redis for session and query caching
## Success Metrics
### Technical Metrics
- **Uptime:** >99.9% availability
- **Response Time:** <200ms for page loads
- **File Upload:** <30s for 100MB files
- **Search Latency:** <500ms for semantic search
### Business Metrics
- **User Adoption:** Active users per deal room
- **Document Velocity:** Files uploaded/downloaded per day
- **Security Events:** Zero unauthorized access incidents
- **User Satisfaction:** NPS > 8 for ease of use
## Risk Assessment
### Technical Risks
- **Single Point of Failure:** SQLite limits high availability
- **Key Loss:** Losing the encryption key makes all encrypted files and the database unrecoverable
- **AI Dependency:** K2.5 service availability required
- **Scaling Challenges:** May need architecture changes at scale
### Mitigation Strategies
- **Automated Backups:** Hourly encrypted backups to S3
- **Key Management:** Secure key storage and rotation
- **Circuit Breakers:** Graceful degradation when AI unavailable
- **Monitoring:** Comprehensive health checks and alerting
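The circuit breaker mitigation can be illustrated with a small counter-based sketch. The `Breaker` type, the threshold of three consecutive failures, and the 30-second cooldown are all hypothetical values chosen for the example. While the breaker is open, uploads would still be stored and AI analysis would simply be skipped or queued.

```go
package main

import (
	"fmt"
	"sync"
)

// Breaker stops calls to the AI service after repeated failures and lets
// them resume once a cooldown has passed. Times are Unix seconds.
type Breaker struct {
	mu        sync.Mutex
	threshold int   // consecutive failures that open the breaker
	cooldown  int64 // seconds to stay open
	failures  int
	openUntil int64
}

// NewBreaker returns a closed breaker.
func NewBreaker(threshold int, cooldownSeconds int64) *Breaker {
	return &Breaker{threshold: threshold, cooldown: cooldownSeconds}
}

// Allow reports whether a call may be attempted at time now.
func (b *Breaker) Allow(now int64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	return now >= b.openUntil
}

// Record registers the outcome of a call made at time now.
func (b *Breaker) Record(ok bool, now int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if ok {
		b.failures = 0
		return
	}
	b.failures++
	if b.failures >= b.threshold {
		b.openUntil = now + b.cooldown
		b.failures = 0
	}
}

func main() {
	b := NewBreaker(3, 30)
	for i := 0; i < 3; i++ {
		b.Record(false, 0) // three consecutive failures at t=0
	}
	fmt.Println(b.Allow(10), b.Allow(30)) // false true
}
```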
### Security Risks
- **Data Breach:** Highly sensitive financial information
- **Insider Threat:** Authorized users with malicious intent
- **Compliance:** Regulatory requirements for financial data
- **Access Control:** Complex permission inheritance bugs
### Security Controls
- **Defense in Depth:** Multiple security layers
- **Principle of Least Privilege:** Minimal required permissions
- **Comprehensive Auditing:** All actions logged and monitored
- **Regular Reviews:** Periodic security assessments
## Implementation Phases
### Phase 1: Core Platform (4 weeks)
- Basic authentication and session management
- Deal room creation and management
- Document upload with encryption
- Basic RBAC implementation
### Phase 2: Collaboration Features (3 weeks)
- Notes and messaging system
- Activity feeds and notifications
- Advanced permission management
- Search functionality
### Phase 3: AI Integration (2 weeks)
- K2.5 document analysis
- Embeddings and semantic search
- Document summarization
- Similar document recommendations
### Phase 4: Production Readiness (2 weeks)
- Comprehensive audit logging
- Admin dashboard
- Performance optimization
- Security hardening
### Phase 5: Advanced Features (3 weeks)
- Deal stage tracking
- Bulk operations
- API for integrations
- Advanced reporting
Total estimated development time: **14 weeks** with a dedicated development team.
## Conclusion
Deal Room represents a modern, security-first approach to Investment Banking document management. By leveraging the proven inou-portal pattern with Go's performance characteristics and AI-enhanced document analysis, we deliver a solution that meets the demanding requirements of financial services while maintaining operational simplicity through single-binary deployment.
The architecture prioritizes security, auditability, and user experience while providing a clear path for future scalability as the platform grows.