commit 08996a1396b566d6d88836fb79b849f6d8d82996 Author: James Date: Sat Feb 28 03:21:44 2026 -0500 Initial spec diff --git a/SPEC.md b/SPEC.md new file mode 100644 index 0000000..dd8e229 --- /dev/null +++ b/SPEC.md @@ -0,0 +1,495 @@ +# Dealspace — Architecture Specification + +**Version:** 0.1 — 2026-02-28 +**Status:** Pre-implementation. This document is the ground truth. Code follows the spec, never the reverse. + +--- + +## 1. What This Is + +A workflow platform for M&A deal management. Investment Banks, Sellers, and Buyers collaborate on a structured request-and-answer system. The core primitive is a **Request** — not a document, not a folder. Documents are how requests get resolved. + +**Not** a VDR that grew features. Designed clean, from first principles. + +--- + +## 2. What This Is Not + +- Not a document repository with a request list bolted on +- Not a project management tool with deal branding +- Not a clone of any existing product +- Not feature-complete on day one — the spec defines the architecture; MVP scope is separate + +--- + +## 3. The Flow + +``` +IB creates Project + → configures Workstreams (Finance, Legal, IT, HR, Operations...) + → invites Participants (assigns role per workstream) + → issues Request List to Seller + +Seller receives Requests + → assigns internally + → uploads Answers (documents, data) + → marks complete + +IB vets Answers + → approves → Answer published to Data Room + → rejects → back to Seller with comment + +Buyers enter + → submit Requests (via Data Room interface) + → AI matches against existing Answers (human confirms) + → unmatched → routed to IB/Seller for resolution + → Answer published → broadcast to all Buyers who asked equivalent question +``` + +--- + +## 4. Core Data Model — Entry-Based + +Inspired directly by inou's entry architecture. **One table to rule them all.** + +### 4.1 The Entry + +```go +type Entry struct { + EntryID string // UUID, plain (never encrypted) + ProjectID string // UUID, plain + ParentID string // UUID, plain (empty = project root) + Type string // structural kind (see 4.2) + Depth int // 0=project, 1=workstream, 2=list, 3=request/answer + SearchKey string // packed: primary lookup (workstream slug, request ref#) + SearchKey2 string // packed: secondary lookup (requester org, answer hash) + Summary string // packed: structural/navigational ONLY — no content + Data []byte // packed: all content, metadata, routing, status + Stage string // plain: "pre_dataroom" | "dataroom" | "closed" + CreatedAt int64 // unix ms, plain + UpdatedAt int64 // unix ms, plain + CreatedBy string // user ID, plain +} +``` + +**Rule:** Summary is navigational only. Never put content in Summary. LLMs and MCP tools read Data. + +### 4.2 Entry Types (Type field) + +| Type | Depth | Description | +|----------------|-------|------------------------------------------| +| `project` | 0 | Top-level container | +| `workstream` | 1 | RBAC anchor (Finance, Legal, IT...) | +| `request_list` | 2 | Named collection of requests | +| `request` | 3 | A single request item | +| `answer` | 3 | A response to one or more requests | +| `comment` | * | Threaded comment on any entry | + +### 4.3 Answer → Request Links (many-to-many) + +One answer can satisfy N requests. When an answer is published, all linked requests are notified — broadcast to all requesting parties who have access. + +```sql +CREATE TABLE answer_links ( + answer_id TEXT NOT NULL REFERENCES entries(entry_id), + request_id TEXT NOT NULL REFERENCES entries(entry_id), + linked_by TEXT NOT NULL, + linked_at INTEGER NOT NULL, + confirmed INTEGER NOT NULL DEFAULT 0, -- 1 = human confirmed + ai_score REAL, + PRIMARY KEY (answer_id, request_id) +); +``` + +### 4.4 Entry Data (JSON inside packed blob) + +For `request`: +```json +{ + "title": "Provide audited financials FY2024", + "body": "...", + "priority": "high|normal|low", + "due_date": "2026-03-15", + "assigned_to": ["user_id"], + "status": "open|assigned|answered|vetted|published|closed", + "ref": "FIN-042" +} +``` + +For `answer`: +```json +{ + "title": "...", + "body": "...", + "files": ["object_id"], + "status": "draft|submitted|approved|rejected|published", + "rejection_reason": "...", + "broadcast_to": "linked_requesters|all_workstream|all_dataroom" +} +``` + +--- + +## 5. RBAC + +### 5.1 Roles + +| Role | Scope | Permissions | +|-----------------|----------------|----------------------------------------------------------| +| `ib_admin` | Project | Full control, all workstreams | +| `ib_member` | Workstream(s) | Manage requests + vet answers in assigned workstreams | +| `seller_admin` | Project | See all requests directed at seller, manage seller team | +| `seller_member` | Workstream(s) | Answer requests in assigned workstreams | +| `buyer_admin` | Project | Manage buyer team, see data room | +| `buyer_member` | Workstream(s) | Submit requests, view published data room answers | +| `observer` | Workstream(s) | Read-only, no submission | + +### 5.2 Access Table + +```sql +CREATE TABLE access ( + id TEXT PRIMARY KEY, + project_id TEXT NOT NULL, + workstream_id TEXT, -- null = all workstreams in this project + user_id TEXT NOT NULL, + role TEXT NOT NULL, + ops TEXT NOT NULL, -- "r", "rw", "rwdm" + granted_by TEXT NOT NULL, + granted_at INTEGER NOT NULL +); +``` + +**RBAC anchor:** The `workstream` entry (depth 1) is the access root. Every operation walks up to the workstream to resolve permissions. + +### 5.3 The Single Throat + +Three choke points. No exceptions. Not even "just this once." + +``` +All Reads → EntryRead(actorID, projectID, filter) → CheckAccess → query +All Writes → EntryWrite(actorID, entries...) → CheckAccess → save +All Deletes → EntryDelete(actorID, projectID, filter) → CheckAccess → delete +Object I/O → ObjectRead/ObjectWrite/ObjectDelete → CheckAccess → storage +``` + +No handler ever touches the DB directly. No raw SQL outside lib/dbcore.go. + +### 5.4 Data Room Visibility + +`Stage = "pre_dataroom"` entries are invisible to buyer roles — not filtered in the UI, invisible at the DB layer via CheckAccess. Buyers cannot see a question exists, let alone its answer, until the IB publishes it to the data room. + +--- + +## 6. Security + +### 6.1 Encryption & Compression + +All string content fields use Pack / Unpack: + +``` +Pack: raw string → zstd compress → AES-256-GCM encrypt → []byte +Unpack: []byte → decrypt → decompress → string +``` + +- SearchKey/SearchKey2: Deterministic encryption (HMAC-derived nonce) — allows indexed lookups +- Summary, Data: Non-deterministic (random nonce) — never queried directly +- IDs, integers, Stage: Plain text — structural, never sensitive +- Files: ObjectWrite encrypts before storage; ObjectRead decrypts on serve + +Per-project encryption key derived from master key + project_id. One project's key compromise does not affect others. + +### 6.2 File Protection Pipeline + +Files are never served raw. Every file goes through the protection pipeline at serve time. The stored file is always the clean original. + +| Type | Protection | +|---------------|------------------------------------------------------------------| +| PDF | Dynamic watermark (user + timestamp + org) rendered per-request | +| Word (.docx) | Watermark injected into document XML before serve | +| Excel (.xlsx) | Sheet protection + watermark header row injected before serve | +| Images | Watermark text burned into pixel data per-request | +| Video | Watermark overlay via ffmpeg, served as stream | +| Other | Encrypted download only, no preview | + +Watermark content (configurable per project): +``` +{user_name} · {org_name} · {datetime} · CONFIDENTIAL +``` + +Watermarks are generated at serve time. Parameters are project-level config, not hardcoded. + +### 6.3 Storage Pricing (Competitive Advantage) + +Files stored compressed + encrypted. No per-MB extortion. Competitors charge up to $20/MB for "secure storage." We store at actual cost. This is a direct and easy competitive win for Misha. + +### 6.4 Audit Log + +Every access grant change, file download, status transition — logged. + +```sql +CREATE TABLE audit ( + id TEXT PRIMARY KEY, + project_id TEXT NOT NULL, + actor_id TEXT NOT NULL, + action TEXT NOT NULL, -- packed + target_id TEXT, + details TEXT, -- packed + ip TEXT, + ts INTEGER NOT NULL +); +``` + +--- + +## 7. Object Store + +```go +type ObjectStore interface { + Write(id string, data []byte) error + Read(id string) ([]byte, error) + Delete(id string) error + Exists(id string) bool +} +``` + +Object ID = SHA-256 of encrypted content. Content-addressable — automatic dedup. + +Implementations: local filesystem (default), S3-compatible (plug-in). App code never knows the difference. + +--- + +## 8. AI Matching Pipeline + +When a buyer submits a request: + +1. Embed request text (Fireworks nomic-embed-text-v1.5 — zero retention) +2. Cosine similarity vs all published answers in same workstream +3. Score ≥ 0.72 → suggest match, require human confirmation +4. Score < 0.72 → route to IB/Seller for manual response +5. Human confirms → answer_links row `confirmed=1`, broadcast fires + +Private data never leaves Fireworks (zero retention policy). Same infra as inou. + +--- + +## 9. Themes + +Theme = CSS custom properties bundle. Zero hardcoded colors in templates. Every color references a CSS var. + +```go +type Theme struct { + ID string + Name string + ProjectID string // null = system theme + Properties string // packed — CSS vars as JSON +} +``` + +Built-in: Light, Dark, High-contrast. Projects can define a custom theme (brand colors, logo). Users can override with personal preference. Theme switching = swap one class on ``. No JavaScript framework required. + +--- + +## 10. MCP Support + +MCP server exposes deal context to AI tools. Follows inou's MCP pattern: + +- All tools operate within `(actor, project)` context — full RBAC enforced +- Read tools: list requests, query answers, check status, get workstream summary +- Write tools: AI-suggested routing (human confirmation required before any state change) +- Gating: AI cannot read pre-dataroom content without explicit unlock (mirrors inou's tier-1/tier-2 pattern) + +Detailed MCP spec written separately after core schema is stable. + +--- + +## 11. Go Implementation Rules + +Non-negotiable. Violations require explicit discussion. + +### 11.1 Package Structure + +``` +dealspace/ + cmd/server/ main entry point, config loading + lib/ + types.go All shared types — Entry, User, Project, Theme, etc. + dbcore.go EntryRead, EntryWrite, EntryDelete — the three choke points + rbac.go CheckAccess, permission resolution, role definitions + crypto.go Pack, Unpack, ObjectEncrypt, ObjectDecrypt + store.go ObjectStore interface + implementations + watermark.go Per-type watermark injection (PDF, DOCX, XLSX, image, video) + embed.go AI embedding client + cosine similarity + notify.go Broadcast logic — answer published → notify requesters + api/ + middleware.go Auth, logging, rate limiting, CORS + handlers.go Thin handlers only — extract input, call lib, return response + routes.go Route registration + portal/ + templates/ HTML templates (no hardcoded colors) + static/ CSS (theme vars), JS (minimal) + mcp/ + server.go MCP tool registration and dispatch +``` + +### 11.2 Handler Rules + +- Handlers: extract input → call lib → return response. Nothing else. +- No SQL in handlers. Ever. +- No business logic in handlers. Ever. +- If two handlers share logic → extract to lib. +- Error responses: one helper function, used everywhere. `{"error": "...", "code": "..."}` + +### 11.3 DB Access Rules + +- No `db.Query` / `db.Exec` outside `lib/dbcore.go` +- No raw SQL in any file outside `lib/dbcore.go` +- Entry access: `EntryRead`, `EntryWrite`, `EntryDelete` only +- Object access: `ObjectRead`, `ObjectWrite`, `ObjectDelete` only +- User/project/access operations: dedicated functions in dbcore, never inline SQL + +### 11.4 Naming Conventions + +- RBAC-enforced functions: exported, full name (`EntryRead`, `EntryWrite`) +- System-only bypass: unexported, explicit suffix (`entryReadSystem`) +- The distinction must be obvious from the name alone + +--- + +## 12. UI Philosophy + +- **Project = select box at the top.** One line. You pick your project and you're in it. +- No project browser consuming 20% of screen real estate. +- Workstream tabs within a project: Finance | Legal | IT | HR | Operations +- Information hierarchy: Workstream → Request List → Request → Answer +- Status visible without clicking in +- Competitor trap to avoid: adding features without removing complexity. Every new feature must justify its screen cost. + +--- + +## 13. Out of Scope for MVP + +- Email notifications +- Mobile app +- Third-party integrations (DocuSign, Salesforce) +- Public API +- Per-firm white-labeling + +--- + +## 14. Retired Code + +Previous attempt archived at `/home/johan/dev/dealroom-retired-20260228/` + +**Carried forward:** +- AI matching concept (embeddings + cosine similarity at 0.72 threshold) +- Broadcast answer semantics +- Color palette + +**Everything else starts fresh.** + +--- + +## 15. Schema Change Checklist + +When modifying the data model: +1. Update this SPEC.md first +2. Update `lib/types.go` +3. Update `lib/dbcore.go` +4. Update `lib/rbac.go` if access model changes +5. Update migration files +6. Update MCP tools if query patterns changed +7. No exceptions to this order + +--- + +*This document is the ground truth. If code disagrees with the spec, the code is wrong.* + +--- + +## 16. Workflow & Task Model (added 2026-02-28) + +### 16.1 The Core Insight + +Most users are **workers, not deal managers**. When the accountant logs in they see their task inbox — not a deal room, not workstream dashboards, not buyer activity. Just: what do I need to do today. + +The big picture (deal progress, buyer activity, request completion %) is the IB admin's view. Role determines UI surface entirely. Same platform, completely different experience. + +### 16.2 The Routing Chain + +Tasks don't just get assigned — they have a return path. Every forward creates an obligation to return. + +``` +Buyer → IB analyst → CFO → accountant + ↓ (done) +Buyer ← IB analyst ← CFO ←──┘ +``` + +Each hop knows where it came from and where it goes back when done. The IB analyst sees "waiting on CFO" — the buyer sees nothing until the answer is published. Internal routing is invisible to external parties. + +### 16.3 Entry Fields for Workflow (plain, indexed — never packed) + +| Field | Purpose | +|----------------|---------------------------------------------------------------| +| `assignee_id` | Who has it RIGHT NOW — powers the personal task inbox | +| `return_to_id` | Who it goes back to when done | +| `origin_id` | The ultimate requestor (buyer) who triggered the chain | + +`routing_chain` — packed in Data: full hop history with actors + timestamps, visible to IB admin only. + +When the accountant marks done → automatically lands in CFO's inbox. +When CFO approves → automatically lands in IB analyst's inbox. +When IB analyst publishes → buyer is notified. + +No manual re-routing at each hop. The chain is set when the task is forwarded. + +### 16.4 entry_events Table + +The thread behind every entry. This IS the workflow history. + +```sql +CREATE TABLE entry_events ( + id TEXT PRIMARY KEY, + entry_id TEXT NOT NULL REFERENCES entries(entry_id), + actor_id TEXT NOT NULL, + channel TEXT NOT NULL, -- "web" | "email" | "slack" | "teams" + action TEXT NOT NULL, -- packed: "message"|"upload"|"forward"|"approve"|"reject"|"publish" + data TEXT NOT NULL, -- packed: message body, file refs, status transition details + ts INTEGER NOT NULL +); +CREATE INDEX idx_events_entry ON entry_events(entry_id); +CREATE INDEX idx_events_actor ON entry_events(actor_id); +``` + +Full thread visible to deal managers. Workers (accountant, CFO) see only their task queue — `WHERE assignee_id = me`. + +### 16.5 channel_threads Table + +Maps external thread IDs to entries. Enables email/Slack/Teams participation without login. + +```sql +CREATE TABLE channel_threads ( + id TEXT PRIMARY KEY, + entry_id TEXT NOT NULL REFERENCES entries(entry_id), + channel TEXT NOT NULL, -- "email" | "slack" | "teams" + thread_id TEXT NOT NULL, -- Message-ID, thread_ts, conversationId + project_id TEXT NOT NULL, + UNIQUE(channel, thread_id) +); +``` + +Inbound message on a known thread_id → creates entry_event on the mapped entry → triggers next hop in routing chain. + +### 16.6 Final Table List + +``` +entries — the tree + workflow state (assignee_id, return_to_id, origin_id plain+indexed) +entry_events — the thread / workflow history per entry +channel_threads — external channel routing (email/Slack/Teams → entry) +answer_links — answer ↔ request, ai_score, confirmed, rejected, vetting +users — accounts + auth +access — RBAC: (user, project, workstream, role, ops) +embeddings — (entry_id, vector BLOB) for AI matching +audit — security events: grants, downloads, logins, key transitions +``` + +Eight tables. No objects table — files referenced by 16-char hex ObjectID stored in entry Data.