Initial spec

James 2026-02-28 03:21:44 -05:00
commit 08996a1396
1 changed files with 495 additions and 0 deletions

SPEC.md Normal file

@@ -0,0 +1,495 @@
# Dealspace — Architecture Specification
**Version:** 0.1 — 2026-02-28
**Status:** Pre-implementation. This document is the ground truth. Code follows the spec, never the reverse.
---
## 1. What This Is
A workflow platform for M&A deal management. Investment Banks, Sellers, and Buyers collaborate on a structured request-and-answer system. The core primitive is a **Request** — not a document, not a folder. Documents are how requests get resolved.
**Not** a VDR that grew features. Designed clean, from first principles.
---
## 2. What This Is Not
- Not a document repository with a request list bolted on
- Not a project management tool with deal branding
- Not a clone of any existing product
- Not feature-complete on day one — the spec defines the architecture; MVP scope is separate
---
## 3. The Flow
```
IB creates Project
→ configures Workstreams (Finance, Legal, IT, HR, Operations...)
→ invites Participants (assigns role per workstream)
→ issues Request List to Seller
Seller receives Requests
→ assigns internally
→ uploads Answers (documents, data)
→ marks complete
IB vets Answers
→ approves → Answer published to Data Room
→ rejects → back to Seller with comment
Buyers enter
→ submit Requests (via Data Room interface)
→ AI matches against existing Answers (human confirms)
→ unmatched → routed to IB/Seller for resolution
→ Answer published → broadcast to all Buyers who asked equivalent question
```
---
## 4. Core Data Model — Entry-Based
Inspired directly by inou's entry architecture. **One table to rule them all.**
### 4.1 The Entry
```go
type Entry struct {
EntryID string // UUID, plain (never encrypted)
ProjectID string // UUID, plain
ParentID string // UUID, plain (empty = project root)
Type string // structural kind (see 4.2)
Depth int // 0=project, 1=workstream, 2=list, 3=request/answer
SearchKey string // packed: primary lookup (workstream slug, request ref#)
SearchKey2 string // packed: secondary lookup (requester org, answer hash)
Summary string // packed: structural/navigational ONLY — no content
Data []byte // packed: all content, metadata, routing, status
Stage string // plain: "pre_dataroom" | "dataroom" | "closed"
CreatedAt int64 // unix ms, plain
UpdatedAt int64 // unix ms, plain
CreatedBy string // user ID, plain
}
```
**Rule:** Summary is navigational only. Never put content in Summary. LLMs and MCP tools read Data.
### 4.2 Entry Types (Type field)
| Type | Depth | Description |
|----------------|-------|------------------------------------------|
| `project` | 0 | Top-level container |
| `workstream` | 1 | RBAC anchor (Finance, Legal, IT...) |
| `request_list` | 2 | Named collection of requests |
| `request` | 3 | A single request item |
| `answer` | 3 | A response to one or more requests |
| `comment` | * | Threaded comment on any entry |
### 4.3 Answer → Request Links (many-to-many)
One answer can satisfy N requests. When an answer is published, all linked requests are notified — broadcast to all requesting parties who have access.
```sql
CREATE TABLE answer_links (
answer_id TEXT NOT NULL REFERENCES entries(entry_id),
request_id TEXT NOT NULL REFERENCES entries(entry_id),
linked_by TEXT NOT NULL,
linked_at INTEGER NOT NULL,
confirmed INTEGER NOT NULL DEFAULT 0, -- 1 = human confirmed
ai_score REAL,
PRIMARY KEY (answer_id, request_id)
);
```
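As a rough illustration of the broadcast rule, the sketch below picks the requests to notify when an answer is published. The `AnswerLink` type and `requestsToNotify` function are hypothetical names, and the sketch assumes that only human-confirmed links trigger a broadcast.

```go
package main

import "fmt"

// AnswerLink mirrors one row of answer_links. The Go names are illustrative.
type AnswerLink struct {
	AnswerID  string
	RequestID string
	Confirmed bool    // confirmed = 1 in the table
	AIScore   float64 // ai_score; not used for the broadcast decision here
}

// requestsToNotify returns the requests that should hear about a newly
// published answer. Assumption: unconfirmed (AI-only) links never broadcast.
func requestsToNotify(links []AnswerLink, answerID string) []string {
	var out []string
	for _, l := range links {
		if l.AnswerID == answerID && l.Confirmed {
			out = append(out, l.RequestID)
		}
	}
	return out
}

func main() {
	links := []AnswerLink{
		{AnswerID: "a1", RequestID: "r1", Confirmed: true},
		{AnswerID: "a1", RequestID: "r2", Confirmed: false, AIScore: 0.81},
		{AnswerID: "a1", RequestID: "r3", Confirmed: true},
		{AnswerID: "a2", RequestID: "r4", Confirmed: true},
	}
	fmt.Println(requestsToNotify(links, "a1"))
}
```

Access checks on each recipient would still go through `CheckAccess` before anything is delivered.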
### 4.4 Entry Data (JSON inside packed blob)
For `request`:
```json
{
"title": "Provide audited financials FY2024",
"body": "...",
"priority": "high|normal|low",
"due_date": "2026-03-15",
"assigned_to": ["user_id"],
"status": "open|assigned|answered|vetted|published|closed",
"ref": "FIN-042"
}
```
For `answer`:
```json
{
"title": "...",
"body": "...",
"files": ["object_id"],
"status": "draft|submitted|approved|rejected|published",
"rejection_reason": "...",
"broadcast_to": "linked_requesters|all_workstream|all_dataroom"
}
```
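A minimal sketch of how the `request` payload might be decoded after Unpack. `RequestData` and `ParseRequestData` are illustrative names rather than part of the spec; the status check uses the six values listed above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RequestData mirrors the JSON shown for a `request` entry.
// The type and function names are assumptions for this sketch.
type RequestData struct {
	Title      string   `json:"title"`
	Body       string   `json:"body"`
	Priority   string   `json:"priority"`
	DueDate    string   `json:"due_date"`
	AssignedTo []string `json:"assigned_to"`
	Status     string   `json:"status"`
	Ref        string   `json:"ref"`
}

// validRequestStatus lists the status values named in the spec.
var validRequestStatus = map[string]bool{
	"open": true, "assigned": true, "answered": true,
	"vetted": true, "published": true, "closed": true,
}

// ParseRequestData decodes the unpacked Data blob and rejects unknown statuses.
func ParseRequestData(raw []byte) (RequestData, error) {
	var d RequestData
	if err := json.Unmarshal(raw, &d); err != nil {
		return d, fmt.Errorf("decode request data: %w", err)
	}
	if !validRequestStatus[d.Status] {
		return d, fmt.Errorf("unknown request status %q", d.Status)
	}
	return d, nil
}

func main() {
	raw := []byte(`{"title":"Provide audited financials FY2024","priority":"high",
		"due_date":"2026-03-15","assigned_to":["u1"],"status":"open","ref":"FIN-042"}`)
	d, err := ParseRequestData(raw)
	fmt.Println(d.Ref, d.Status, err)
}
```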
---
## 5. RBAC
### 5.1 Roles
| Role | Scope | Permissions |
|-----------------|----------------|----------------------------------------------------------|
| `ib_admin` | Project | Full control, all workstreams |
| `ib_member` | Workstream(s) | Manage requests + vet answers in assigned workstreams |
| `seller_admin` | Project | See all requests directed at seller, manage seller team |
| `seller_member` | Workstream(s) | Answer requests in assigned workstreams |
| `buyer_admin` | Project | Manage buyer team, see data room |
| `buyer_member` | Workstream(s) | Submit requests, view published data room answers |
| `observer` | Workstream(s) | Read-only, no submission |
### 5.2 Access Table
```sql
CREATE TABLE access (
id TEXT PRIMARY KEY,
project_id TEXT NOT NULL,
workstream_id TEXT, -- null = all workstreams in this project
user_id TEXT NOT NULL,
role TEXT NOT NULL,
ops TEXT NOT NULL, -- "r", "rw", "rwdm"
granted_by TEXT NOT NULL,
granted_at INTEGER NOT NULL
);
```
**RBAC anchor:** The `workstream` entry (depth 1) is the access root. Every operation walks up to the workstream to resolve permissions.
### 5.3 The Single Throat
Three choke points for entries, plus a fourth for object I/O. No exceptions. Not even "just this once."
```
All Reads → EntryRead(actorID, projectID, filter) → CheckAccess → query
All Writes → EntryWrite(actorID, entries...) → CheckAccess → save
All Deletes → EntryDelete(actorID, projectID, filter) → CheckAccess → delete
Object I/O → ObjectRead/ObjectWrite/ObjectDelete → CheckAccess → storage
```
No handler ever touches the DB directly. No raw SQL outside lib/dbcore.go.
### 5.4 Data Room Visibility
`Stage = "pre_dataroom"` entries are invisible to buyer roles — not filtered in the UI, invisible at the DB layer via CheckAccess. Buyers cannot see a question exists, let alone its answer, until the IB publishes it to the data room.
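The access rules in this section could be sketched roughly as follows. The `Grant` struct and this `CheckAccess` signature are assumptions for illustration: an empty `WorkstreamID` stands in for SQL NULL, ops are matched by single letter, and buyer roles are recognised by a `buyer_` prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// Grant mirrors one row of the access table (5.2). Names are illustrative.
type Grant struct {
	ProjectID    string
	WorkstreamID string // empty stands in for NULL: all workstreams
	UserID       string
	Role         string
	Ops          string // "r", "rw", "rwdm"
}

// CheckAccess reports whether userID may perform op ('r', 'w', 'd' or 'm')
// on an entry that resolves to workstreamID and carries the given stage.
func CheckAccess(grants []Grant, userID, projectID, workstreamID, stage string, op byte) bool {
	for _, g := range grants {
		if g.UserID != userID || g.ProjectID != projectID {
			continue
		}
		if g.WorkstreamID != "" && g.WorkstreamID != workstreamID {
			continue
		}
		// Buyer roles never see pre-dataroom entries (5.4).
		if strings.HasPrefix(g.Role, "buyer_") && stage == "pre_dataroom" {
			continue
		}
		if strings.IndexByte(g.Ops, op) >= 0 {
			return true
		}
	}
	return false
}

func main() {
	grants := []Grant{
		{ProjectID: "p1", UserID: "ib", Role: "ib_admin", Ops: "rwdm"},
		{ProjectID: "p1", WorkstreamID: "ws-fin", UserID: "buyer", Role: "buyer_member", Ops: "r"},
	}
	fmt.Println(CheckAccess(grants, "buyer", "p1", "ws-fin", "pre_dataroom", 'r'))
	fmt.Println(CheckAccess(grants, "buyer", "p1", "ws-fin", "dataroom", 'r'))
}
```

Because the stage check sits inside the access function, a handler cannot forget to apply it.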
---
## 6. Security
### 6.1 Encryption & Compression
All string content fields use Pack / Unpack:
```
Pack: raw string → zstd compress → AES-256-GCM encrypt → []byte
Unpack: []byte → decrypt → decompress → string
```
- SearchKey/SearchKey2: Deterministic encryption (HMAC-derived nonce) — allows indexed lookups
- Summary, Data: Non-deterministic (random nonce) — never queried directly
- IDs, integers, Stage: Plain text — structural, never sensitive
- Files: ObjectWrite encrypts before storage; ObjectRead decrypts on serve
Per-project encryption key derived from master key + project_id. One project's key compromise does not affect others.
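A minimal sketch of the Pack / Unpack round trip using only the Go standard library. Several pieces are assumptions and not the spec's choices: `compress/flate` stands in for zstd (which is not in the standard library), HMAC-SHA256 stands in for the key-derivation function, and the function names and the `nonce || ciphertext` layout are illustrative. A production design would likely use separate subkeys for nonce derivation and encryption.

```go
package main

import (
	"bytes"
	"compress/flate"
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
	"io"
)

// projectKey derives a 32-byte key per project. HMAC-SHA256 is an assumed
// stand-in for whatever KDF the implementation settles on.
func projectKey(master []byte, projectID string) []byte {
	m := hmac.New(sha256.New, master)
	m.Write([]byte(projectID))
	return m.Sum(nil)
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// pack compresses, then encrypts. With deterministic set, the nonce is
// derived from the plaintext, so equal inputs give equal output.
func pack(key []byte, s string, deterministic bool) ([]byte, error) {
	var buf bytes.Buffer
	zw, err := flate.NewWriter(&buf, flate.BestCompression) // stand-in for zstd
	if err != nil {
		return nil, err
	}
	if _, err := zw.Write([]byte(s)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if deterministic {
		m := hmac.New(sha256.New, key)
		m.Write([]byte(s))
		copy(nonce, m.Sum(nil)) // first 12 bytes of the MAC
	} else if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Output layout: nonce || ciphertext || GCM tag.
	return gcm.Seal(nonce, nonce, buf.Bytes(), nil), nil
}

// unpack reverses pack: split off the nonce, decrypt, decompress.
func unpack(key, packed []byte) (string, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return "", err
	}
	n := gcm.NonceSize()
	if len(packed) < n {
		return "", errors.New("packed value too short")
	}
	compressed, err := gcm.Open(nil, packed[:n], packed[n:], nil)
	if err != nil {
		return "", err
	}
	raw, err := io.ReadAll(flate.NewReader(bytes.NewReader(compressed)))
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	key := projectKey([]byte("demo master key"), "project-1")
	a, _ := pack(key, "FIN-042", true)
	b, _ := pack(key, "FIN-042", true)
	s, err := unpack(key, a)
	fmt.Println(bytes.Equal(a, b), s, err)
}
```

Deterministic mode is what makes an indexed equality lookup on SearchKey possible: the query value is packed the same way and compared byte for byte.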
### 6.2 File Protection Pipeline
Files are never served raw. Every file goes through the protection pipeline at serve time. The stored file is always the clean original.
| Type | Protection |
|---------------|------------------------------------------------------------------|
| PDF | Dynamic watermark (user + timestamp + org) rendered per-request |
| Word (.docx) | Watermark injected into document XML before serve |
| Excel (.xlsx) | Sheet protection + watermark header row injected before serve |
| Images | Watermark text burned into pixel data per-request |
| Video | Watermark overlay via ffmpeg, served as stream |
| Other | Encrypted download only, no preview |
Watermark content (configurable per project):
```
{user_name} · {org_name} · {datetime} · CONFIDENTIAL
```
Watermarks are generated at serve time. Parameters are project-level config, not hardcoded.
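To illustrate, the watermark line could be filled in at serve time along these lines. `RenderWatermark` is a hypothetical helper, and the RFC 3339 UTC timestamp format is an assumption, since the spec does not fix one.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// RenderWatermark fills the project-level template with per-request values.
// servedAtMs is unix milliseconds, matching the timestamps used elsewhere.
func RenderWatermark(tmpl, userName, orgName string, servedAtMs int64) string {
	r := strings.NewReplacer(
		"{user_name}", userName,
		"{org_name}", orgName,
		"{datetime}", time.UnixMilli(servedAtMs).UTC().Format(time.RFC3339),
	)
	return r.Replace(tmpl)
}

func main() {
	tmpl := "{user_name} · {org_name} · {datetime} · CONFIDENTIAL"
	fmt.Println(RenderWatermark(tmpl, "Ann Lee", "Acme Capital", time.Now().UnixMilli()))
}
```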
### 6.3 Storage Pricing (Competitive Advantage)
Files stored compressed + encrypted. No per-MB extortion. Competitors charge up to $20/MB for "secure storage." We store at actual cost. This is a direct and easy competitive win for Misha.
### 6.4 Audit Log
Every access grant change, file download, status transition — logged.
```sql
CREATE TABLE audit (
id TEXT PRIMARY KEY,
project_id TEXT NOT NULL,
actor_id TEXT NOT NULL,
action TEXT NOT NULL, -- packed
target_id TEXT,
details TEXT, -- packed
ip TEXT,
ts INTEGER NOT NULL
);
```
---
## 7. Object Store
```go
type ObjectStore interface {
Write(id string, data []byte) error
Read(id string) ([]byte, error)
Delete(id string) error
Exists(id string) bool
}
```
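Object ID = SHA-256 of encrypted content. Content-addressable: identical ciphertext gets the same ID, so dedup is automatic. This only takes effect if identical files encrypt to identical ciphertext, i.e. file encryption is deterministic within a project.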
Implementations: local filesystem (default), S3-compatible (plug-in). App code never knows the difference.
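As a sketch, an in-memory backend satisfying the interface might look like this. `memStore`, `newMemStore`, and `ObjectID` are hypothetical names, and the ID here is the full 64-character hex digest.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

type ObjectStore interface {
	Write(id string, data []byte) error
	Read(id string) ([]byte, error)
	Delete(id string) error
	Exists(id string) bool
}

// ObjectID is the hex SHA-256 of the already-encrypted bytes.
func ObjectID(encrypted []byte) string {
	sum := sha256.Sum256(encrypted)
	return hex.EncodeToString(sum[:])
}

// memStore is a hypothetical in-memory backend, handy for tests.
type memStore struct {
	mu      sync.RWMutex
	objects map[string][]byte
}

func newMemStore() *memStore { return &memStore{objects: map[string][]byte{}} }

func (s *memStore) Write(id string, data []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.objects[id]; ok {
		return nil // same ID means same content: nothing to store
	}
	s.objects[id] = append([]byte(nil), data...) // copy so callers can reuse data
	return nil
}

func (s *memStore) Read(id string) ([]byte, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	data, ok := s.objects[id]
	if !ok {
		return nil, errors.New("object not found: " + id)
	}
	return append([]byte(nil), data...), nil
}

func (s *memStore) Delete(id string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.objects, id)
	return nil
}

func (s *memStore) Exists(id string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.objects[id]
	return ok
}

// Compile-time check that memStore satisfies the interface.
var _ ObjectStore = (*memStore)(nil)

func main() {
	var store ObjectStore = newMemStore()
	blob := []byte("ciphertext bytes")
	id := ObjectID(blob)
	_ = store.Write(id, blob)
	_ = store.Write(id, blob) // second write is a no-op
	fmt.Println(len(id), store.Exists(id))
}
```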
---
## 8. AI Matching Pipeline
When a buyer submits a request:
1. Embed request text (Fireworks nomic-embed-text-v1.5 — zero retention)
2. Cosine similarity vs all published answers in same workstream
3. Score ≥ 0.72 → suggest match, require human confirmation
4. Score < 0.72 → route to IB/Seller for manual response
5. Human confirms → answer_links row `confirmed=1`, broadcast fires
Private data is sent only to Fireworks, for embedding, under its zero-retention policy; it goes to no other third party. Same infra as inou.
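Steps 2 to 4 can be sketched as below, assuming embeddings are already available as `[]float64` vectors. `cosine`, `bestMatch`, and `Candidate` are illustrative names; the embedding call itself is omitted.

```go
package main

import (
	"fmt"
	"math"
)

const matchThreshold = 0.72 // from the spec

// cosine returns the cosine similarity of a and b, or 0 when the vectors
// are empty, of different length, or all zeros.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Candidate is a suggested answer for a request.
type Candidate struct {
	AnswerID string
	Score    float64
}

// bestMatch scores the request against every published answer in the
// workstream. The bool is true when the best score clears the threshold,
// meaning "suggest to a human"; false means "route for manual response".
func bestMatch(req []float64, answers map[string][]float64) (Candidate, bool) {
	best := Candidate{Score: math.Inf(-1)}
	for id, vec := range answers {
		s := cosine(req, vec)
		// Ties break on the smaller ID so the result does not depend on map order.
		if s > best.Score || (s == best.Score && id < best.AnswerID) {
			best = Candidate{AnswerID: id, Score: s}
		}
	}
	return best, best.AnswerID != "" && best.Score >= matchThreshold
}

func main() {
	answers := map[string][]float64{"a1": {1, 0}, "a2": {0, 1}}
	c, ok := bestMatch([]float64{1, 0.1}, answers)
	fmt.Printf("%s %.3f suggest=%v\n", c.AnswerID, c.Score, ok)
}
```

A `true` result only creates a suggestion; the `answer_links` row stays at `confirmed=0` until a human confirms it.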
---
## 9. Themes
Theme = CSS custom properties bundle. Zero hardcoded colors in templates. Every color references a CSS var.
```go
type Theme struct {
ID string
Name string
ProjectID string // empty = system theme
Properties string // packed — CSS vars as JSON
}
```
Built-in: Light, Dark, High-contrast. Projects can define a custom theme (brand colors, logo). Users can override with personal preference. Theme switching = swap one class on `<html>`. No JavaScript framework required.
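One way the unpacked `Properties` JSON could become a stylesheet rule is sketched below. It assumes a flat JSON object of variable name to value and a class selector on `<html>`; `renderThemeCSS` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// renderThemeCSS turns {"bg":"#111"} into a rule setting --bg under html.<class>.
// Names are sorted so the output is stable across runs.
func renderThemeCSS(class string, propsJSON []byte) (string, error) {
	var props map[string]string
	if err := json.Unmarshal(propsJSON, &props); err != nil {
		return "", fmt.Errorf("decode theme properties: %w", err)
	}
	names := make([]string, 0, len(props))
	for name := range props {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	fmt.Fprintf(&b, "html.%s {\n", class)
	for _, name := range names {
		fmt.Fprintf(&b, "  --%s: %s;\n", name, props[name])
	}
	b.WriteString("}\n")
	return b.String(), nil
}

func main() {
	css, err := renderThemeCSS("theme-dark", []byte(`{"fg":"#eee","bg":"#111"}`))
	fmt.Print(css, err)
}
```

Values from project-defined themes would need validation before being written into a stylesheet; the sketch does none.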
---
## 10. MCP Support
MCP server exposes deal context to AI tools. Follows inou's MCP pattern:
- All tools operate within `(actor, project)` context — full RBAC enforced
- Read tools: list requests, query answers, check status, get workstream summary
- Write tools: AI-suggested routing (human confirmation required before any state change)
- Gating: AI cannot read pre-dataroom content without explicit unlock (mirrors inou's tier-1/tier-2 pattern)
Detailed MCP spec written separately after core schema is stable.
---
## 11. Go Implementation Rules
Non-negotiable. Violations require explicit discussion.
### 11.1 Package Structure
```
dealspace/
cmd/server/ main entry point, config loading
lib/
types.go All shared types — Entry, User, Project, Theme, etc.
dbcore.go EntryRead, EntryWrite, EntryDelete — the three choke points
rbac.go CheckAccess, permission resolution, role definitions
crypto.go Pack, Unpack, ObjectEncrypt, ObjectDecrypt
store.go ObjectStore interface + implementations
watermark.go Per-type watermark injection (PDF, DOCX, XLSX, image, video)
embed.go AI embedding client + cosine similarity
notify.go Broadcast logic — answer published → notify requesters
api/
middleware.go Auth, logging, rate limiting, CORS
handlers.go Thin handlers only — extract input, call lib, return response
routes.go Route registration
portal/
templates/ HTML templates (no hardcoded colors)
static/ CSS (theme vars), JS (minimal)
mcp/
server.go MCP tool registration and dispatch
```
### 11.2 Handler Rules
- Handlers: extract input → call lib → return response. Nothing else.
- No SQL in handlers. Ever.
- No business logic in handlers. Ever.
- If two handlers share logic → extract to lib.
- Error responses: one helper function, used everywhere. `{"error": "...", "code": "..."}`
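The single error helper might look roughly like this. `writeError` and `apiError` are illustrative names; `recordError` exists only to exercise the helper without a running server.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// apiError is the one error shape every handler returns.
type apiError struct {
	Error string `json:"error"`
	Code  string `json:"code"`
}

// writeError sends {"error": "...", "code": "..."} with the given HTTP status.
func writeError(w http.ResponseWriter, status int, code, msg string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(apiError{Error: msg, Code: code})
}

// recordError runs writeError against an in-memory recorder and returns
// the status and body it produced.
func recordError(status int, code, msg string) (int, string) {
	rec := httptest.NewRecorder()
	writeError(rec, status, code, msg)
	return rec.Code, rec.Body.String()
}

func main() {
	status, body := recordError(http.StatusForbidden, "forbidden", "no access to workstream")
	fmt.Print(status, " ", body)
}
```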
### 11.3 DB Access Rules
- No `db.Query` / `db.Exec` outside `lib/dbcore.go`
- No raw SQL in any file outside `lib/dbcore.go`
- Entry access: `EntryRead`, `EntryWrite`, `EntryDelete` only
- Object access: `ObjectRead`, `ObjectWrite`, `ObjectDelete` only
- User/project/access operations: dedicated functions in dbcore, never inline SQL
### 11.4 Naming Conventions
- RBAC-enforced functions: exported, full name (`EntryRead`, `EntryWrite`)
- System-only bypass: unexported, explicit suffix (`entryReadSystem`)
- The distinction must be obvious from the name alone
---
## 12. UI Philosophy
- **Project = select box at the top.** One line. You pick your project and you're in it.
- No project browser consuming 20% of screen real estate.
- Workstream tabs within a project: Finance | Legal | IT | HR | Operations
- Information hierarchy: Workstream → Request List → Request → Answer
- Status visible without clicking in
- Competitor trap to avoid: adding features without removing complexity. Every new feature must justify its screen cost.
---
## 13. Out of Scope for MVP
- Email notifications
- Mobile app
- Third-party integrations (DocuSign, Salesforce)
- Public API
- Per-firm white-labeling
---
## 14. Retired Code
Previous attempt archived at `/home/johan/dev/dealroom-retired-20260228/`
**Carried forward:**
- AI matching concept (embeddings + cosine similarity at 0.72 threshold)
- Broadcast answer semantics
- Color palette
**Everything else starts fresh.**
---
## 15. Schema Change Checklist
When modifying the data model:
1. Update this SPEC.md first
2. Update `lib/types.go`
3. Update `lib/dbcore.go`
4. Update `lib/rbac.go` if access model changes
5. Update migration files
6. Update MCP tools if query patterns changed
7. No exceptions to this order
---
*This document is the ground truth. If code disagrees with the spec, the code is wrong.*
---
## 16. Workflow & Task Model (added 2026-02-28)
### 16.1 The Core Insight
Most users are **workers, not deal managers**. When the accountant logs in they see their task inbox — not a deal room, not workstream dashboards, not buyer activity. Just: what do I need to do today.
The big picture (deal progress, buyer activity, request completion %) is the IB admin's view. Role determines UI surface entirely. Same platform, completely different experience.
### 16.2 The Routing Chain
Tasks don't just get assigned — they have a return path. Every forward creates an obligation to return.
```
Buyer → IB analyst → CFO → accountant
↓ (done)
Buyer ← IB analyst ← CFO ←──┘
```
Each hop knows where it came from and where it goes back when done. The IB analyst sees "waiting on CFO" — the buyer sees nothing until the answer is published. Internal routing is invisible to external parties.
### 16.3 Entry Fields for Workflow (plain, indexed — never packed)
| Field | Purpose |
|----------------|---------------------------------------------------------------|
| `assignee_id` | Who has it RIGHT NOW — powers the personal task inbox |
| `return_to_id` | Who it goes back to when done |
| `origin_id` | The ultimate requestor (buyer) who triggered the chain |
`routing_chain` — packed in Data: full hop history with actors + timestamps, visible to IB admin only.
When the accountant marks done → automatically lands in CFO's inbox.
When CFO approves → automatically lands in IB analyst's inbox.
When IB analyst publishes → buyer is notified.
No manual re-routing at each hop. The chain is set when the task is forwarded.
### 16.4 entry_events Table
The thread behind every entry. This IS the workflow history.
```sql
CREATE TABLE entry_events (
id TEXT PRIMARY KEY,
entry_id TEXT NOT NULL REFERENCES entries(entry_id),
actor_id TEXT NOT NULL,
channel TEXT NOT NULL, -- "web" | "email" | "slack" | "teams"
action TEXT NOT NULL, -- packed: "message"|"upload"|"forward"|"approve"|"reject"|"publish"
data TEXT NOT NULL, -- packed: message body, file refs, status transition details
ts INTEGER NOT NULL
);
CREATE INDEX idx_events_entry ON entry_events(entry_id);
CREATE INDEX idx_events_actor ON entry_events(actor_id);
```
Full thread visible to deal managers. Workers (accountant, CFO) see only their task queue — `WHERE assignee_id = me`.
### 16.5 channel_threads Table
Maps external thread IDs to entries. Enables email/Slack/Teams participation without login.
```sql
CREATE TABLE channel_threads (
id TEXT PRIMARY KEY,
entry_id TEXT NOT NULL REFERENCES entries(entry_id),
channel TEXT NOT NULL, -- "email" | "slack" | "teams"
thread_id TEXT NOT NULL, -- Message-ID, thread_ts, conversationId
project_id TEXT NOT NULL,
UNIQUE(channel, thread_id)
);
```
Inbound message on a known thread_id → creates entry_event on the mapped entry → triggers next hop in routing chain.
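The lookup could be sketched with an in-memory index that enforces the same uniqueness rule as the table. `ThreadIndex`, `Link`, and `Resolve` are hypothetical names; a real implementation would go through dedicated functions in `lib/dbcore.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// threadKey matches the UNIQUE(channel, thread_id) constraint.
type threadKey struct{ channel, threadID string }

// ThreadIndex maps external threads to entry IDs.
type ThreadIndex struct{ entries map[threadKey]string }

func NewThreadIndex() *ThreadIndex {
	return &ThreadIndex{entries: map[threadKey]string{}}
}

var ErrThreadTaken = errors.New("thread already mapped to an entry")

// Link registers an external thread for an entry, rejecting duplicates.
func (ix *ThreadIndex) Link(channel, threadID, entryID string) error {
	k := threadKey{channel, threadID}
	if _, ok := ix.entries[k]; ok {
		return ErrThreadTaken
	}
	ix.entries[k] = entryID
	return nil
}

// Resolve finds the entry for an inbound message, if the thread is known.
func (ix *ThreadIndex) Resolve(channel, threadID string) (entryID string, ok bool) {
	entryID, ok = ix.entries[threadKey{channel, threadID}]
	return entryID, ok
}

func main() {
	ix := NewThreadIndex()
	_ = ix.Link("email", "<msg-1@example.com>", "entry-42")
	id, ok := ix.Resolve("email", "<msg-1@example.com>")
	fmt.Println(id, ok)
	_, ok = ix.Resolve("slack", "<msg-1@example.com>")
	fmt.Println(ok) // same thread ID on another channel is a different key
}
```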
### 16.6 Final Table List
```
entries — the tree + workflow state (assignee_id, return_to_id, origin_id plain+indexed)
entry_events — the thread / workflow history per entry
channel_threads — external channel routing (email/Slack/Teams → entry)
answer_links — answer ↔ request, ai_score, confirmed, rejected, vetting
users — accounts + auth
access — RBAC: (user, project, workstream, role, ops)
embeddings — (entry_id, vector BLOB) for AI matching
audit — security events: grants, downloads, logins, key transitions
```
Eight tables. No objects table — files referenced by 16-char hex ObjectID stored in entry Data.