# Dealspace — Architecture Specification
**Version:** 0.1 — 2026-02-28
**Status:** Pre-implementation. This document is the ground truth. Code follows the spec, never the reverse.
---
## 1. What This Is
A workflow platform for M&A deal management. Investment Banks, Sellers, and Buyers collaborate on a structured request-and-answer system. The core primitive is a **Request** — not a document, not a folder. Documents are how requests get resolved.
**Not** a VDR that grew features. Designed clean, from first principles.
---
## 2. What This Is Not
- Not a document repository with a request list bolted on
- Not a project management tool with deal branding
- Not a clone of any existing product
- Not feature-complete on day one — the spec defines the architecture; MVP scope is separate
---
## 3. The Flow
```
IB creates Project
→ configures Workstreams (Finance, Legal, IT, HR, Operations...)
→ invites Participants (assigns role per workstream)
→ issues Request List to Seller
Seller receives Requests
→ assigns internally
→ uploads Answers (documents, data)
→ marks complete
IB vets Answers
→ approves → Answer published to Data Room
→ rejects → back to Seller with comment
Buyers enter
→ submit Requests (via Data Room interface)
→ AI matches against existing Answers (human confirms)
→ unmatched → routed to IB/Seller for resolution
→ Answer published → broadcast to all Buyers who asked equivalent question
```
---
## 4. Core Data Model — Entry-Based
Inspired directly by inou's entry architecture. **One table to rule them all.**
### 4.1 The Entry
```go
type Entry struct {
EntryID string // UUID, plain (never encrypted)
ProjectID string // UUID, plain
ParentID string // UUID, plain (empty = project root)
Type string // structural kind (see 4.2)
Depth int // 0=project, 1=workstream, 2=list, 3=request/answer
SearchKey string // packed: primary lookup (workstream slug, request ref#)
SearchKey2 string // packed: secondary lookup (requester org, answer hash)
Summary string // packed: structural/navigational ONLY — no content
Data []byte // packed: all content, metadata, routing, status
Stage string // plain: "pre_dataroom" | "dataroom" | "closed"
CreatedAt int64 // unix ms, plain
UpdatedAt int64 // unix ms, plain
CreatedBy string // user ID, plain
}
```
**Rule:** Summary is navigational only. Never put content in Summary. LLMs and MCP tools read Data.
### 4.2 Entry Types (Type field)
| Type | Depth | Description |
|----------------|-------|------------------------------------------|
| `project` | 0 | Top-level container |
| `workstream` | 1 | RBAC anchor (Finance, Legal, IT...) |
| `request_list` | 2 | Named collection of requests |
| `request` | 3 | A single request item |
| `answer` | 3 | A response to one or more requests |
| `comment` | * | Threaded comment on any entry |
### 4.3 Answer → Request Links (many-to-many)
One answer can satisfy N requests. When an answer is published, every linked request triggers a notification: the answer is broadcast to all requesting parties who have access.
```sql
CREATE TABLE answer_links (
answer_id TEXT NOT NULL REFERENCES entries(entry_id),
request_id TEXT NOT NULL REFERENCES entries(entry_id),
linked_by TEXT NOT NULL,
linked_at INTEGER NOT NULL,
confirmed INTEGER NOT NULL DEFAULT 0, -- 1 = human confirmed
ai_score REAL,
PRIMARY KEY (answer_id, request_id)
);
```
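A minimal sketch of how the broadcast target list could be derived from `answer_links` rows. The `AnswerLink` struct and `requestsToNotify` are hypothetical names, and the rule that only confirmed links trigger a notification is an assumption, not something the schema enforces.

```go
package main

import "sort"

// AnswerLink mirrors the answer_links columns this sketch needs.
type AnswerLink struct {
	AnswerID  string
	RequestID string
	Confirmed bool // true when confirmed = 1
}

// requestsToNotify returns the sorted IDs of requests linked to answerID.
// Assumption: only human-confirmed links trigger a notification.
func requestsToNotify(links []AnswerLink, answerID string) []string {
	var out []string
	for _, l := range links {
		if l.AnswerID == answerID && l.Confirmed {
			out = append(out, l.RequestID)
		}
	}
	sort.Strings(out)
	return out
}
```

The returned request IDs would then be resolved to their requesters, each of whom still has to pass the access check before being notified.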
### 4.4 Entry Data (JSON inside packed blob)
For `request`:
```json
{
"title": "Provide audited financials FY2024",
"body": "...",
"priority": "high|normal|low",
"due_date": "2026-03-15",
"assigned_to": ["user_id"],
"status": "open|assigned|answered|vetted|published|closed",
"ref": "FIN-042"
}
```
For `answer`:
```json
{
"title": "...",
"body": "...",
"files": ["object_id"],
"status": "draft|submitted|approved|rejected|published",
"rejection_reason": "...",
"broadcast_to": "linked_requesters|all_workstream|all_dataroom"
}
```
---
## 5. RBAC
### 5.1 Roles
| Role | Scope | Permissions |
|-----------------|----------------|----------------------------------------------------------|
| `ib_admin` | Project | Full control, all workstreams |
| `ib_member` | Workstream(s) | Manage requests + vet answers in assigned workstreams |
| `seller_admin` | Project | See all requests directed at seller, manage seller team |
| `seller_member` | Workstream(s) | Answer requests in assigned workstreams |
| `buyer_admin` | Project | Manage buyer team, see data room |
| `buyer_member` | Workstream(s) | Submit requests, view published data room answers |
| `observer` | Workstream(s) | Read-only, no submission |
### 5.2 Access Table
```sql
CREATE TABLE access (
id TEXT PRIMARY KEY,
project_id TEXT NOT NULL,
workstream_id TEXT, -- null = all workstreams in this project
user_id TEXT NOT NULL,
role TEXT NOT NULL,
ops TEXT NOT NULL, -- "r", "rw", "rwdm"
granted_by TEXT NOT NULL,
granted_at INTEGER NOT NULL
);
```
**RBAC anchor:** The `workstream` entry (depth 1) is the access root. Every operation walks up to the workstream to resolve permissions.
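The walk-up can be sketched as a loop over `ParentID`. This is an in-memory illustration only: the trimmed `Entry` struct, the `byID` map and `resolveWorkstream` are hypothetical, and a real version would read through the DB layer.

```go
package main

import "errors"

// Entry is trimmed to the fields the walk needs.
type Entry struct {
	EntryID  string
	ParentID string // empty = project root
	Type     string
}

var errNoWorkstream = errors.New("no workstream ancestor")

// resolveWorkstream follows ParentID links from entryID until it reaches an
// entry of type "workstream". The hop limit guards against cycles in bad data.
func resolveWorkstream(byID map[string]Entry, entryID string) (string, error) {
	id := entryID
	for hops := 0; id != "" && hops <= len(byID); hops++ {
		e, ok := byID[id]
		if !ok {
			return "", errNoWorkstream
		}
		if e.Type == "workstream" {
			return e.EntryID, nil
		}
		id = e.ParentID
	}
	return "", errNoWorkstream
}
```

A request at depth 3 resolves in two hops (request → request_list → workstream). A project entry has no workstream ancestor and returns an error.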
### 5.3 The Single Throat
Three choke points for entries, plus one for object I/O. No exceptions. Not even "just this once."
```
All Reads → EntryRead(actorID, projectID, filter) → CheckAccess → query
All Writes → EntryWrite(actorID, entries...) → CheckAccess → save
All Deletes → EntryDelete(actorID, projectID, filter) → CheckAccess → delete
Object I/O → ObjectRead/ObjectWrite/ObjectDelete → CheckAccess → storage
```
No handler ever touches the DB directly. No raw SQL outside lib/dbcore.go.
### 5.4 Data Room Visibility
`Stage = "pre_dataroom"` entries are invisible to buyer roles — not filtered in the UI, invisible at the DB layer via CheckAccess. Buyers cannot see a question exists, let alone its answer, until the IB publishes it to the data room.
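A simplified sketch of the read path: every entry passes through one access check, and buyer roles never match `pre_dataroom` entries. Everything here is an in-memory stand-in. `checkAccess` and `readVisible` are hypothetical simplifications of `CheckAccess` and `EntryRead`, the `Grant` struct approximates an `access` row, and `WorkstreamID` on the entry is assumed to be already resolved.

```go
package main

import (
	"errors"
	"strings"
)

// Entry carries only what the check needs; WorkstreamID is assumed resolved.
type Entry struct {
	EntryID      string
	WorkstreamID string
	Stage        string
}

// Grant approximates one row of the access table.
type Grant struct {
	UserID       string
	WorkstreamID string // empty = all workstreams in the project
	Role         string
	Ops          string // "r", "rw", "rwdm"
}

var errDenied = errors.New("access denied")

// checkAccess returns nil if any grant lets actorID perform op on e.
// Buyer roles never match pre_dataroom entries, whatever their ops.
func checkAccess(grants []Grant, actorID string, e Entry, op byte) error {
	for _, g := range grants {
		if g.UserID != actorID {
			continue
		}
		if g.WorkstreamID != "" && g.WorkstreamID != e.WorkstreamID {
			continue
		}
		if strings.IndexByte(g.Ops, op) < 0 {
			continue
		}
		if strings.HasPrefix(g.Role, "buyer_") && e.Stage == "pre_dataroom" {
			continue
		}
		return nil
	}
	return errDenied
}

// readVisible is the single read path: nothing is returned unchecked.
func readVisible(grants []Grant, actorID string, all []Entry) []Entry {
	var out []Entry
	for _, e := range all {
		if checkAccess(grants, actorID, e, 'r') == nil {
			out = append(out, e)
		}
	}
	return out
}
```

Because the stage rule sits inside the check rather than in a handler or template, a buyer cannot learn that a pre-dataroom entry exists through any read path.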
---
## 6. Security
### 6.1 Encryption & Compression
All string content fields use Pack / Unpack:
```
Pack: raw string → zstd compress → AES-256-GCM encrypt → []byte
Unpack: []byte → decrypt → decompress → string
```
- SearchKey/SearchKey2: Deterministic encryption (HMAC-derived nonce) — allows indexed lookups
- Summary, Data: Non-deterministic (random nonce) — never queried directly
- IDs, integers, Stage: Plain text — structural, never sensitive
- Files: ObjectWrite encrypts before storage; ObjectRead decrypts on serve
Per-project encryption key derived from master key + project_id. One project's key compromise does not affect others.
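A minimal sketch of the encryption half of Pack, under stated assumptions. The key derivation function is not named in this spec, so HMAC-SHA256 over the project ID is used as a placeholder. `projectKey`, `seal` and `open` are hypothetical names. The zstd step is left out because the Go standard library has no public zstd package.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"errors"
)

// projectKey derives a 32-byte AES-256 key from the master key and project ID.
// Assumption: HMAC-SHA256 stands in for the unspecified derivation function.
func projectKey(master []byte, projectID string) []byte {
	m := hmac.New(sha256.New, master)
	m.Write([]byte(projectID))
	return m.Sum(nil)
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plain with AES-256-GCM and prepends the nonce to the output.
// deterministic=true derives the nonce from HMAC(key, plain), so equal inputs
// give equal ciphertexts (usable for indexed lookups); otherwise it is random.
func seal(key, plain []byte, deterministic bool) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if deterministic {
		m := hmac.New(sha256.New, key)
		m.Write(plain)
		copy(nonce, m.Sum(nil))
	} else if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plain, nil), nil
}

// open splits off the nonce, then decrypts and authenticates the rest.
func open(key, packed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(packed) < n {
		return nil, errors.New("packed value too short")
	}
	return gcm.Open(nil, packed[:n], packed[n:], nil)
}
```

Deterministic mode trades secrecy of equality for queryability: two entries with the same SearchKey produce the same bytes, which is what makes an index lookup possible. A value sealed under one project's key fails authentication under another's.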
### 6.2 File Protection Pipeline
Files are never served raw. Every file goes through the protection pipeline at serve time. The stored file is always the clean original.
| Type | Protection |
|---------------|------------------------------------------------------------------|
| PDF | Dynamic watermark (user + timestamp + org) rendered per-request |
| Word (.docx) | Watermark injected into document XML before serve |
| Excel (.xlsx) | Sheet protection + watermark header row injected before serve |
| Images | Watermark text burned into pixel data per-request |
| Video | Watermark overlay via ffmpeg, served as stream |
| Other | Encrypted download only, no preview |
Watermark content (configurable per project):
```
{user_name} · {org_name} · {datetime} · CONFIDENTIAL
```
Watermarks are generated at serve time. Parameters are project-level config, not hardcoded.
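Filling the template at serve time could look like the sketch below. `renderWatermark` is a hypothetical helper; only the three placeholders from the template above are substituted, and the sample values in the usage note are invented.

```go
package main

import "strings"

// renderWatermark substitutes the per-request values into a project's
// watermark template. Unknown placeholders are left untouched.
func renderWatermark(tmpl, userName, orgName, datetime string) string {
	return strings.NewReplacer(
		"{user_name}", userName,
		"{org_name}", orgName,
		"{datetime}", datetime,
	).Replace(tmpl)
}
```

For example, the default template rendered for a user "A. Analyst" at "Example Capital" yields `A. Analyst · Example Capital · 2026-03-01 09:30 UTC · CONFIDENTIAL`.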
### 6.3 Storage Pricing (Competitive Advantage)
Files stored compressed + encrypted. No per-MB extortion. Competitors charge up to $20/MB for "secure storage." We store at actual cost. This is a direct and easy competitive win for Misha.
### 6.4 Audit Log
Every access grant change, file download, status transition — logged.
```sql
CREATE TABLE audit (
id TEXT PRIMARY KEY,
project_id TEXT NOT NULL,
actor_id TEXT NOT NULL,
action TEXT NOT NULL, -- packed
target_id TEXT,
details TEXT, -- packed
ip TEXT,
ts INTEGER NOT NULL
);
```
---
## 7. Object Store
```go
type ObjectStore interface {
Write(id string, data []byte) error
Read(id string) ([]byte, error)
Delete(id string) error
Exists(id string) bool
}
```
Object ID = SHA-256 of encrypted content. Content-addressable — automatic dedup.
Implementations: local filesystem (default), S3-compatible (plug-in). App code never knows the difference.
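An in-memory implementation shows how little the interface asks of a backend. This is a test-style sketch, not one of the planned implementations. `memStore`, `newMemStore` and `objectID` are hypothetical names, and the full 64-character hex digest is assumed as the ID format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"sync"
)

// ObjectStore repeats the interface from this section.
type ObjectStore interface {
	Write(id string, data []byte) error
	Read(id string) ([]byte, error)
	Delete(id string) error
	Exists(id string) bool
}

// objectID hashes the already-encrypted bytes (assumption: full hex digest).
func objectID(encrypted []byte) string {
	sum := sha256.Sum256(encrypted)
	return hex.EncodeToString(sum[:])
}

// memStore keeps objects in a map guarded by a mutex.
type memStore struct {
	mu   sync.RWMutex
	objs map[string][]byte
}

func newMemStore() *memStore { return &memStore{objs: make(map[string][]byte)} }

// Write is idempotent: writing the same ID twice stores one object.
func (s *memStore) Write(id string, data []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.objs[id] = append([]byte(nil), data...)
	return nil
}

// Read returns a copy so callers cannot mutate stored bytes.
func (s *memStore) Read(id string) ([]byte, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	data, ok := s.objs[id]
	if !ok {
		return nil, errors.New("object not found")
	}
	return append([]byte(nil), data...), nil
}

func (s *memStore) Delete(id string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.objs, id)
	return nil
}

func (s *memStore) Exists(id string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.objs[id]
	return ok
}
```

Since the ID is a hash of the stored bytes, a second `Write` of identical bytes lands on the same key and stores nothing new.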
---
## 8. AI Matching Pipeline
When a buyer submits a request:
1. Embed request text (Fireworks nomic-embed-text-v1.5 — zero retention)
2. Cosine similarity vs all published answers in same workstream
3. Score ≥ 0.72 → suggest match, require human confirmation
4. Score < 0.72 → route to IB/Seller for manual response
5. Human confirms → answer_links row set to `confirmed=1`, broadcast fires
Request text is sent only to Fireworks for embedding, under its zero-retention policy, and to no other third party. Same infra as inou.
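Steps 2 to 4 can be sketched as a cosine-similarity scan with the 0.72 cut-off. The embedding call is out of scope here: vectors are assumed to be already computed, and `Answer`, `cosine` and `bestMatch` are hypothetical names.

```go
package main

import "math"

// matchThreshold is the suggestion cut-off from step 3.
const matchThreshold = 0.72

// Answer pairs a published answer's ID with its embedding vector.
type Answer struct {
	ID  string
	Vec []float64
}

// cosine returns the cosine similarity of a and b, or 0 when the vectors
// are empty, of different length, or have zero magnitude.
func cosine(a, b []float64) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// bestMatch returns the most similar answer and whether its score is high
// enough to suggest. A suggestion still needs human confirmation.
func bestMatch(request []float64, answers []Answer) (id string, score float64, suggest bool) {
	for i, a := range answers {
		if s := cosine(request, a.Vec); i == 0 || s > score {
			id, score = a.ID, s
		}
	}
	return id, score, len(answers) > 0 && score >= matchThreshold
}
```

When `suggest` is false, the request takes the manual route in step 4. When it is true, nothing is linked until a human confirms.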
---
## 9. Themes
Theme = CSS custom properties bundle. Zero hardcoded colors in templates. Every color references a CSS var.
```go
type Theme struct {
ID string
Name string
ProjectID string // empty = system theme
Properties string // packed — CSS vars as JSON
}
```
Built-in: Light, Dark, High-contrast. Projects can define a custom theme (brand colors, logo). Users can override with personal preference. Theme switching = swap one class on `<html>`. No JavaScript framework required.
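One way the unpacked `Properties` JSON could become a stylesheet rule. This is a sketch under assumptions: the JSON is taken to be a flat object of CSS variable names to values, `themeCSS` is a hypothetical helper, and values are not validated or escaped here.

```go
package main

import (
	"encoding/json"
	"sort"
	"strings"
)

// themeCSS turns a flat JSON object of CSS custom properties into a single
// rule scoped to a class on <html>. Keys are sorted for stable output.
// No validation or escaping is done; a real version would need both.
func themeCSS(class string, propsJSON []byte) (string, error) {
	var props map[string]string
	if err := json.Unmarshal(propsJSON, &props); err != nil {
		return "", err
	}
	names := make([]string, 0, len(props))
	for name := range props {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	b.WriteString("html." + class + "{")
	for _, name := range names {
		b.WriteString(name + ":" + props[name] + ";")
	}
	b.WriteString("}")
	return b.String(), nil
}
```

With one such rule per theme in the stylesheet, switching themes stays a single class change on `<html>`.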
---
## 10. MCP Support
MCP server exposes deal context to AI tools. Follows inou's MCP pattern:
- All tools operate within `(actor, project)` context; full RBAC enforced
- Read tools: list requests, query answers, check status, get workstream summary
- Write tools: AI-suggested routing (human confirmation required before any state change)
- Gating: AI cannot read pre-dataroom content without explicit unlock (mirrors inou's tier-1/tier-2 pattern)
Detailed MCP spec written separately after core schema is stable.
---
## 11. Go Implementation Rules
Non-negotiable. Violations require explicit discussion.
### 11.1 Package Structure
```
dealspace/
cmd/server/ main entry point, config loading
lib/
types.go All shared types — Entry, User, Project, Theme, etc.
dbcore.go EntryRead, EntryWrite, EntryDelete — the three choke points
rbac.go CheckAccess, permission resolution, role definitions
crypto.go Pack, Unpack, ObjectEncrypt, ObjectDecrypt
store.go ObjectStore interface + implementations
watermark.go Per-type watermark injection (PDF, DOCX, XLSX, image, video)
embed.go AI embedding client + cosine similarity
notify.go Broadcast logic — answer published → notify requesters
api/
middleware.go Auth, logging, rate limiting, CORS
handlers.go Thin handlers only — extract input, call lib, return response
routes.go Route registration
portal/
templates/ HTML templates (no hardcoded colors)
static/ CSS (theme vars), JS (minimal)
mcp/
server.go MCP tool registration and dispatch
```
### 11.2 Handler Rules
- Handlers: extract input → call lib → return response. Nothing else.
- No SQL in handlers. Ever.
- No business logic in handlers. Ever.
- If two handlers share logic → extract to lib.
- Error responses: one helper function, used everywhere. `{"error": "...", "code": "..."}`
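The single error helper might look like this sketch. `apiError`, `errorBody` and `writeError` are hypothetical names; only the `{"error": ..., "code": ...}` shape comes from the rule above.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// apiError is the one error shape every handler returns.
type apiError struct {
	Error string `json:"error"`
	Code  string `json:"code"`
}

// errorBody builds the JSON payload. Marshalling a struct of two strings
// cannot fail, so the error is discarded.
func errorBody(code, msg string) []byte {
	b, _ := json.Marshal(apiError{Error: msg, Code: code})
	return b
}

// writeError sends the payload with the given HTTP status.
func writeError(w http.ResponseWriter, status int, code, msg string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_, _ = w.Write(errorBody(code, msg))
}
```

A handler that gets an access error from lib would then end with one line, for example `writeError(w, http.StatusForbidden, "forbidden", "access denied")`.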
### 11.3 DB Access Rules
- No `db.Query` / `db.Exec` outside `lib/dbcore.go`
- No raw SQL in any file outside `lib/dbcore.go`
- Entry access: `EntryRead`, `EntryWrite`, `EntryDelete` only
- Object access: `ObjectRead`, `ObjectWrite`, `ObjectDelete` only
- User/project/access operations: dedicated functions in dbcore, never inline SQL
### 11.4 Naming Conventions
- RBAC-enforced functions: exported, full name (`EntryRead`, `EntryWrite`)
- System-only bypass: unexported, explicit suffix (`entryReadSystem`)
- The distinction must be obvious from the name alone
---
## 12. UI Philosophy
- **Project = select box at the top.** One line. You pick your project and you're in it.
- No project browser consuming 20% of screen real estate.
- Workstream tabs within a project: Finance | Legal | IT | HR | Operations
- Information hierarchy: Workstream → Request List → Request → Answer
- Status visible without clicking in
- Competitor trap to avoid: adding features without removing complexity. Every new feature must justify its screen cost.
---
## 13. Out of Scope for MVP
- Email notifications
- Mobile app
- Third-party integrations (DocuSign, Salesforce)
- Public API
- Per-firm white-labeling
---
## 14. Retired Code
Previous attempt archived at `/home/johan/dev/dealroom-retired-20260228/`
**Carried forward:**
- AI matching concept (embeddings + cosine similarity at 0.72 threshold)
- Broadcast answer semantics
- Color palette
**Everything else starts fresh.**
---
## 15. Schema Change Checklist
When modifying the data model:
1. Update this SPEC.md first
2. Update `lib/types.go`
3. Update `lib/dbcore.go`
4. Update `lib/rbac.go` if access model changes
5. Update migration files
6. Update MCP tools if query patterns changed
7. No exceptions to this order
---
*This document is the ground truth. If code disagrees with the spec, the code is wrong.*
---
## 16. Workflow & Task Model (added 2026-02-28)
### 16.1 The Core Insight
Most users are **workers, not deal managers**. When the accountant logs in, they see their task inbox: not a deal room, not workstream dashboards, not buyer activity. Just: what do I need to do today.
The big picture (deal progress, buyer activity, request completion %) is the IB admin's view. Role determines UI surface entirely. Same platform, completely different experience.
### 16.2 The Routing Chain
Tasks don't just get assigned; they have a return path. Every forward creates an obligation to return.
```
Buyer → IB analyst → CFO → accountant
↓ (done)
Buyer ← IB analyst ← CFO ←──┘
```
Each hop knows where it came from and where it goes back when done. The IB analyst sees "waiting on CFO"; the buyer sees nothing until the answer is published. Internal routing is invisible to external parties.
### 16.3 Entry Fields for Workflow (plain, indexed — never packed)
| Field | Purpose |
|----------------|---------------------------------------------------------------|
| `assignee_id` | Who has it RIGHT NOW; powers the personal task inbox |
| `return_to_id` | Who it goes back to when done |
| `origin_id` | The ultimate requestor (buyer) who triggered the chain |
`routing_chain` packed in Data: full hop history with actors + timestamps, visible to IB admin only.
When the accountant marks done → automatically lands in CFO's inbox.
When CFO approves → automatically lands in IB analyst's inbox.
When IB analyst publishes → buyer is notified.
No manual re-routing at each hop. The chain is set when the task is forwarded.
### 16.4 entry_events Table
The thread behind every entry. This IS the workflow history.
```sql
CREATE TABLE entry_events (
id TEXT PRIMARY KEY,
entry_id TEXT NOT NULL REFERENCES entries(entry_id),
actor_id TEXT NOT NULL,
channel TEXT NOT NULL, -- "web" | "email" | "slack" | "teams"
action TEXT NOT NULL, -- packed: "message"|"upload"|"forward"|"approve"|"reject"|"publish"
data TEXT NOT NULL, -- packed: message body, file refs, status transition details
ts INTEGER NOT NULL
);
CREATE INDEX idx_events_entry ON entry_events(entry_id);
CREATE INDEX idx_events_actor ON entry_events(actor_id);
```
Full thread visible to deal managers. Workers (accountant, CFO) see only their task queue: `WHERE assignee_id = me`.
### 16.5 channel_threads Table
Maps external thread IDs to entries. Enables email/Slack/Teams participation without login.
```sql
CREATE TABLE channel_threads (
id TEXT PRIMARY KEY,
entry_id TEXT NOT NULL REFERENCES entries(entry_id),
channel TEXT NOT NULL, -- "email" | "slack" | "teams"
thread_id TEXT NOT NULL, -- Message-ID, thread_ts, conversationId
project_id TEXT NOT NULL,
UNIQUE(channel, thread_id)
);
```
An inbound message on a known thread_id → creates an entry_event on the mapped entry → triggers the next hop in the routing chain.
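The lookup behind inbound routing is a map keyed on `(channel, thread_id)`. The sketch below is an in-memory stand-in for the table; `ThreadIndex`, `Register` and `Resolve` are hypothetical names, and the thread ID in the usage note is made up.

```go
package main

import "errors"

// threadKey matches the UNIQUE(channel, thread_id) constraint.
type threadKey struct{ Channel, ThreadID string }

// ThreadIndex maps an external thread to the entry it belongs to.
type ThreadIndex map[threadKey]string

// Register maps a thread to an entry and rejects duplicates, as the
// UNIQUE constraint would.
func (ix ThreadIndex) Register(channel, threadID, entryID string) error {
	k := threadKey{channel, threadID}
	if _, dup := ix[k]; dup {
		return errors.New("thread already mapped")
	}
	ix[k] = entryID
	return nil
}

// Resolve returns the entry for an inbound message, if the thread is known.
func (ix ThreadIndex) Resolve(channel, threadID string) (string, bool) {
	id, ok := ix[threadKey{channel, threadID}]
	return id, ok
}
```

An inbound email whose thread ID is `<msg-1@example.com>` resolves to its entry, and the caller then records the event on that entry. A message on an unknown thread resolves to nothing and creates no event.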
### 16.6 Final Table List
```
entries — the tree + workflow state (assignee_id, return_to_id, origin_id plain+indexed)
entry_events — the thread / workflow history per entry
channel_threads — external channel routing (email/Slack/Teams → entry)
answer_links — answer ↔ request, ai_score, confirmed, rejected, vetting
users — accounts + auth
access — RBAC: (user, project, workstream, role, ops)
embeddings — (entry_id, vector BLOB) for AI matching
audit — security events: grants, downloads, logins, key transitions
```
Eight tables. No objects table: files are referenced by 16-char hex ObjectID stored in entry Data.