# Dealspace SPEC.md — Architectural Review **Reviewer:** James (Senior Architect) **Date:** 2026-02-28 **Verdict:** Solid foundation, but several gaps will cause pain during implementation. The crypto section needs work before this touches real M&A data. --- ## Executive Summary The spec shows clear thinking — entry-based architecture is proven (inou), the single-throat principle for DB access is correct, and the worker-centric UI philosophy is refreshing. However: 1. **The routing chain model is underspecified** — will break on real-world forwarding patterns 2. **RBAC has privilege escalation gaps** — who grants access? who revokes? 3. **Crypto section claims FIPS but doesn't deliver** — deterministic encryption is an antipattern 4. **Channel ingestion (email/Slack) has no auth model** — spoofing is trivial 5. **No concurrency model** — race conditions guaranteed This platform handles M&A data. People go to prison for leaking this stuff. The spec needs hardening before code. --- ## 1. Routing Chain Model — Critical Gaps ### 1.1 The Three-Field Problem The spec defines: ``` assignee_id — who has it NOW return_to_id — who it goes back to origin_id — ultimate requestor (buyer) ``` Example chain: `Buyer → IB analyst → CFO → accountant` When accountant completes: - `return_to_id = CFO` ✓ - Task lands in CFO's inbox ✓ When CFO approves — **where does it go?** The spec says "automatically lands in IB analyst's inbox" but: - How? `return_to_id` was CFO. - The IB analyst hop isn't stored in plain fields. - You'd need to deserialize `routing_chain` from packed Data, pop the stack, update `return_to_id`. **Gap:** No specified algorithm for chain traversal. Implementation will have to invent one. ### 1.2 Chain Branching Real scenario: CFO forwards the same request to **both** the accountant AND the tax attorney (parallel work). - Which one's completion triggers the return? - Do you wait for both? - What if accountant submits but attorney rejects? **Gap:** No fork/join semantics. The model assumes linear chains only. ### 1.3 Delegation & Unavailability CFO goes on vacation. Tasks pile up in their inbox. - No delegation model ("my tasks go to X while I'm out") - No reassignment without breaking the chain - No escalation triggers ("task stuck > N days") **Gap:** Zero availability management. Every chain has a single point of failure at each hop. ### 1.4 Origin Access Changes Mid-Chain Buyer submits request → IB forwards → Seller is working on it → **Buyer's access is revoked** (deal fell through, NDA violation, whatever). When the answer is finally published: - `origin_id` still points to that buyer - Broadcast logic fires... then what? - Does the answer publish to remaining buyers? Or fail silently? - Is the request orphaned? Closed? Reassigned? **Gap:** No state machine for access revocation during active workflows. ### 1.5 Recommended Fixes ```go // Add to Entry (plain, indexed): chain_depth int // current position in chain chain_length int // total chain hops // routing_chain structure (in Data): type RoutingHop struct { ActorID string Action string // "forward" | "delegate" | "escalate" Timestamp int64 Reason string // optional } ``` Add explicit state machine: ``` request.status: open → assigned → forwarded → pending_review → approved → published → closed ↓ delegated (parallel path) ``` Add delegation table: ```sql CREATE TABLE delegations ( user_id TEXT NOT NULL, delegate_id TEXT NOT NULL, project_id TEXT, -- null = all projects starts_at INTEGER NOT NULL, ends_at INTEGER, created_by TEXT NOT NULL, PRIMARY KEY (user_id, delegate_id, project_id) ); ``` --- ## 2. RBAC Model — Privilege Escalation Paths ### 2.1 Who Can Grant Access? The `access` table has `granted_by` but no constraint on who can grant. Can `seller_member` grant `seller_member` to a colleague? The spec doesn't say. In practice: - IB should control IB team access - Sellers should control seller team access - Buyers should control buyer team access (within their firm) **Gap:** No "can_grant" permission or role hierarchy. ### 2.2 Who Can Revoke Access? Not specified. Related questions: - Can you revoke access granted by someone else? - What happens to in-flight tasks when access is revoked? - Can you revoke your own access (leaving a deal)? ### 2.3 Cross-Project Data Leakage via Object Store The spec says: > Object ID = SHA-256 of encrypted content. Content-addressable — automatic dedup. But also: > Per-project encryption key derived from master key + project_id. These are **mutually exclusive**. If each project has its own key: - Same file uploaded to two projects = different ciphertext = different SHA-256 = no dedup - Dedup is false advertising, OR - There's a subtle bug waiting to happen If there IS dedup across projects: - File uploaded to Project A (by user with access) - Same file uploaded to Project B (by different user) - Are they now cryptographically linked? **Gap:** Dedup claim contradicts per-project encryption. Pick one. ### 2.4 Observer Role Creep `observer` has "Read-only, no submission." But: - Can observers see `entry_events` (the full thread)? - Can observers see `routing_chain` (internal forwarding)? - Can observers see draft answers before publication? For auditors: probably yes. For competing buyers: absolutely not. **Gap:** Observer permission granularity undefined. ### 2.5 Recommended Fixes Add to `access` table: ```sql can_grant INTEGER NOT NULL DEFAULT 0, -- 1 = can grant same or lesser role revoked_at INTEGER, -- soft revoke with timestamp revoked_by TEXT ``` Add role hierarchy enum: ```go var RoleHierarchy = map[string]int{ "ib_admin": 100, "ib_member": 80, "seller_admin": 70, "seller_member": 50, "buyer_admin": 40, "buyer_member": 30, "observer": 10, } // can_grant only works for roles at or below your level ``` --- ## 3. Answer Broadcast Model — Race Conditions ### 3.1 Publish During Matching Timeline: 1. Buyer A submits request R1 (t=0) 2. AI embedding starts (t=1) 3. Seller uploads answer A1 and IB publishes immediately (t=2) 4. AI embedding completes, searches for matches (t=3) 5. A1 is already published — does it get linked? The spec says matching happens "when buyer submits request" but doesn't specify whether existing published answers are searched. **Gap:** No retroactive matching. New requests only match future answers? Or historical too? ### 3.2 Parallel Matching Race Two buyers submit equivalent requests simultaneously: 1. Buyer A submits R1 (t=0) 2. Buyer B submits R2 (t=1) 3. AI matches R1 → A1 (score 0.75), suggests link (t=2) 4. AI matches R2 → A1 (score 0.74), suggests link (t=3) 5. Human confirms R1 ↔ A1 (t=4) 6. Human confirms R2 ↔ A1 (t=5) Both get `confirmed=1`. Both broadcast. But: - If R1 confirmation triggered broadcast, did R2 get notified too? - Or does R2 confirmation trigger a duplicate broadcast? **Gap:** No idempotency on broadcast. Buyers may receive duplicate notifications. ### 3.3 Partial Match Rejection Answer A1 matches requests R1, R2, R3 with scores 0.85, 0.73, 0.71. Human reviews: - Confirms R1 ↔ A1 ✓ - Rejects R2 ↔ A1 (not actually relevant) - Confirms R3 ↔ A1 ✓ The `answer_links` table has `confirmed` but no `rejected` flag. How do we know R2 was reviewed and rejected vs. never reviewed? **Gap:** No rejection tracking. Can't distinguish "not reviewed" from "reviewed and rejected." ### 3.4 Recommended Fixes Add to `answer_links`: ```sql reviewed_by TEXT, reviewed_at INTEGER, status TEXT NOT NULL DEFAULT 'pending', -- 'pending' | 'confirmed' | 'rejected' reject_reason TEXT ``` Add broadcast idempotency: ```sql CREATE TABLE broadcasts ( id TEXT PRIMARY KEY, answer_id TEXT NOT NULL, request_id TEXT NOT NULL, recipient_id TEXT NOT NULL, sent_at INTEGER NOT NULL, UNIQUE(answer_id, request_id, recipient_id) ); ``` --- ## 4. Channel Ingestion (Email/Slack/Teams) — Authentication Void ### 4.1 Unknown Sender Problem The spec says: > Enables email/Slack/Teams participation without login. Email arrives from `cfo@megacorp.com`. Questions: - Is that email address in our system? - If not, what happens? Drop silently? Create a ghost user? Queue for review? - If yes, how do we verify it's actually the CFO and not a spoofed email? **Gap:** No inbound message authentication model. ### 4.2 Email Spoofing M&A deals are high-value targets. Spoofing `cfo@seller.com` to submit a fake answer or approve a request = catastrophic. The spec mentions nothing about: - SPF/DKIM/DMARC verification - Header analysis - Reply-chain validation (is this actually a reply to our outbound?) **Gap:** No email authentication. This is a critical security hole. ### 4.3 Thread Hijacking `channel_threads` maps `thread_id` to `entry_id`. For email, `thread_id` is `Message-ID`. Attacker who knows the Message-ID can craft an email with `In-Reply-To: ` and inject themselves into the thread. **Gap:** No validation that the sender is an authorized participant in that entry's workflow. ### 4.4 Slack/Teams Token Scope Not mentioned: - What OAuth scopes are required? - Who installs the app — IB admin? Each firm? - What happens when a Slack workspace is disconnected mid-deal? ### 4.5 Recommended Fixes Add `channel_participants` table: ```sql CREATE TABLE channel_participants ( entry_id TEXT NOT NULL, channel TEXT NOT NULL, -- "email" | "slack" | "teams" external_id TEXT NOT NULL, -- email address, slack user ID, teams user ID verified INTEGER NOT NULL DEFAULT 0, added_at INTEGER NOT NULL, PRIMARY KEY (entry_id, channel, external_id) ); ``` Inbound message flow: 1. Parse sender identity from message 2. Look up `channel_participants` for that entry 3. If not found OR not verified → quarantine, notify IB admin 4. If verified → create `entry_event` For email specifically: - Require DKIM pass or quarantine - Store DKIM verification result in `entry_events.data` --- ## 5. FIPS 140-3 Crypto Section — Missing & Incorrect ### 5.1 No FIPS Module Reference The spec mentions "AES-256-GCM" but FIPS 140-3 compliance requires using a **validated cryptographic module**. Options: - Go's `crypto/aes` is NOT FIPS validated - BoringCrypto (Google's fork) IS validated - Or use a hardware HSM **Gap:** No mention of which FIPS-validated implementation to use. ### 5.2 Deterministic Encryption is an Antipattern The spec says: > SearchKey/SearchKey2: Deterministic encryption (HMAC-derived nonce) — allows indexed lookups This is a known antipattern for sensitive data: - Same plaintext = same ciphertext - Enables frequency analysis - Enables known-plaintext attacks if attacker can inject search keys For a request reference like "FIN-042": - Attacker creates requests FIN-001 through FIN-999 - Observes which encrypted SearchKey matches the target - Now knows the target's ref number **Gap:** Deterministic encryption leaks information. Use encrypted search (like CipherSweet) or blind indexes for lookups. ### 5.3 No Key Derivation Function Specified > Per-project encryption key derived from master key + project_id. How? Options with very different security properties: - `HMAC-SHA256(master, project_id)` — acceptable - `SHA256(master + project_id)` — vulnerable to length extension - `HKDF-SHA256` — correct answer for FIPS **Gap:** KDF not specified. Implementer will guess. ### 5.4 No Key Rotation Master key is compromised. Now what? - How do you rotate? - Do you re-encrypt all Data fields? - What about ObjectStore content? - How do you handle in-flight requests during rotation? **Gap:** No key rotation procedure. ### 5.5 No Key Escrow / Recovery Master key is lost. - All data is unrecoverable - Business impact: catastrophic **Gap:** No key escrow or recovery mechanism. ### 5.6 Nonce Management > Non-deterministic (random nonce) AES-GCM with random nonce has a birthday bound problem at ~2^32 encryptions with the same key. For a busy deal with thousands of entries, each with multiple fields, you could hit this. **Gap:** No nonce collision mitigation (like AES-GCM-SIV or nonce-misuse-resistant construction). ### 5.7 Recommended Fixes ```go // Use HKDF for key derivation func DeriveProjectKey(master []byte, projectID string) []byte { return hkdf.Expand(sha256.New, master, []byte("dealspace-project-"+projectID), 32) } // Use AES-GCM-SIV for nonce-misuse resistance // Or: 96-bit random nonce + 32-bit counter per key // For searchable encryption, use blind index: func BlindIndex(hmacKey []byte, plaintext string, bits int) string { h := hmac.New(sha256.New, hmacKey) h.Write([]byte(plaintext)) return base64.RawURLEncoding.EncodeToString(h.Sum(nil)[:bits/8]) } // Store blind index in SearchKey, not deterministic ciphertext ``` Add key rotation fields: ```sql ALTER TABLE entries ADD COLUMN key_version INTEGER NOT NULL DEFAULT 1; ``` --- ## 6. Missing Pieces That Will Bite You ### 6.1 Session Management Not a single mention of: - JWT vs. session cookies - Token expiration - Refresh token flow - Session revocation - Concurrent session limits For M&A: you probably want short-lived tokens (1 hour), single active session per user, immediate revocation on access change. ### 6.2 Authentication - No password policy - No MFA requirement (should be mandatory for M&A) - No SSO/SAML mention (enterprise buyers will demand it) - No account lockout after failed attempts ### 6.3 Concurrency Control What happens when: - Two people edit the same request simultaneously? - IB approves while seller is still uploading? - Answer is published while admin is editing its metadata? **No optimistic locking. No conflict resolution. No ETags.** This will cause data loss. Add: ```sql ALTER TABLE entries ADD COLUMN version INTEGER NOT NULL DEFAULT 1; ``` And require `version` in all updates: ```sql UPDATE entries SET ... WHERE entry_id = ? AND version = ? ``` ### 6.4 File Versioning > The stored file is always the clean original. What if someone uploads a corrected version? - Replace? Now you've lost the original. - Add new? How do you link versions? - What about answers already published referencing the old file? **Gap:** No file versioning. First upload is permanent. ### 6.5 Soft Delete / Recovery `EntryDelete` exists but: - Is it hard delete? - Can you recover? - What about regulatory holds (litigation, investigation)? M&A data often has retention requirements. Hard delete may be illegal. Add: ```sql ALTER TABLE entries ADD COLUMN deleted_at INTEGER; ALTER TABLE entries ADD COLUMN deleted_by TEXT; ``` ### 6.6 Rate Limiting Mentioned in `middleware.go` but not specified: - Per-user? Per-IP? Per-project? - What are the limits? - How are they enforced? AI matching especially — embedding requests are expensive. ### 6.7 Backup & Recovery Not mentioned: - Backup frequency - Point-in-time recovery - Geographic redundancy - RTO/RPO requirements ### 6.8 Deadline & Escalation Requests have `due_date` but: - What happens when it passes? - Who gets notified? - Is there auto-escalation? - Can deadlines be extended? By whom? ### 6.9 Request Withdrawal Buyer submits request, then changes mind. Can they: - Withdraw it? - What if it's already assigned? - What if answer is in progress? ### 6.10 Answer Rejection Workflow IB rejects answer with comment. Then: - Does it go back to seller? - To the specific assignee who submitted it? - Can seller dispute the rejection? - How many rejection cycles before escalation? The spec says "back to Seller with comment" but no state machine. --- ## 7. Minor Issues & Ambiguities ### 7.1 Depth Inconsistency Table says `comment` has depth `*`. What depth does it actually get? - Depth of parent + 1? - Fixed depth 4? - No depth tracking for comments? ### 7.2 broadcast_to Semantics Answer has: ```json "broadcast_to": "linked_requesters|all_workstream|all_dataroom" ``` Who sets this? IB? Automatic? And: - `linked_requesters` — only buyers who asked this exact question - `all_workstream` — all buyers with workstream access - `all_dataroom` — everyone in the data room The difference between these is significant. Spec doesn't say when each applies. ### 7.3 Stage Transitions `Stage = "pre_dataroom" | "dataroom" | "closed"` Who transitions? When? What prevents premature transition? - Can seller accidentally publish to dataroom? - Can IB close a project while requests are pending? ### 7.4 The 0.72 Threshold > Score ≥ 0.72 → suggest match Why 0.72? Is this configurable per project? Tuned on what data? This is the kind of magic number that will need adjustment per deal type (tech M&A vs. real estate vs. healthcare). ### 7.5 Watermark Performance > Video: Watermark overlay via ffmpeg, served as stream For a 2GB video file, this is expensive per-request. Options: - Pre-render watermarked versions (storage cost) - Render on-demand (latency cost) - Cache rendered versions (complexity) Not specified. --- ## 8. Summary: Must-Fix Before Implementation | Priority | Issue | Risk | |----------|-------|------| | **P0** | Email spoofing / no auth | Security breach, false answers | | **P0** | Deterministic encryption | Information leakage | | **P0** | No FIPS module specified | Compliance failure | | **P0** | No MFA | Account takeover | | **P1** | Routing chain underspec | Broken workflows | | **P1** | No concurrency control | Data loss | | **P1** | Grant/revoke undefined | Privilege escalation | | **P1** | No key rotation | Crypto emergency unrecoverable | | **P1** | No session management | Security basics missing | | **P2** | Broadcast race conditions | Duplicate notifications | | **P2** | No soft delete | Regulatory compliance | | **P2** | No file versioning | Data loss on corrections | | **P2** | Object store dedup contradiction | Implementation confusion | --- ## 9. What's Actually Good To be fair, the spec gets a lot right: 1. **Entry-based architecture** — proven pattern, good choice 2. **Single-throat DB access** — prevents SQL injection and access control bypasses 3. **Worker-centric UI** — this is genuinely differentiated 4. **Packed encryption with plain structural fields** — right balance for query vs. confidentiality 5. **Per-project keys** — blast radius limitation 6. **Watermarking pipeline** — proactive leak deterrence 7. **AI matching with human confirmation** — not trusting AI blindly 8. **Theme system** — will make white-labeling easy when needed The foundation is solid. The gaps are fixable. Fix them before code. --- *Review complete. Fix the P0s before writing a line of code.*