19 KiB
Dealspace SPEC.md — Architectural Review
Reviewer: James (Senior Architect)
Date: 2026-02-28
Verdict: Solid foundation, but several gaps will cause pain during implementation. The crypto section needs work before this touches real M&A data.
Executive Summary
The spec shows clear thinking — entry-based architecture is proven (inou), the single-throat principle for DB access is correct, and the worker-centric UI philosophy is refreshing. However:
- The routing chain model is underspecified — will break on real-world forwarding patterns
- RBAC has privilege escalation gaps — who grants access? who revokes?
- Crypto section claims FIPS but doesn't deliver — deterministic encryption is an antipattern
- Channel ingestion (email/Slack) has no auth model — spoofing is trivial
- No concurrency model — race conditions guaranteed
This platform handles M&A data. People go to prison for leaking this stuff. The spec needs hardening before code.
1. Routing Chain Model — Critical Gaps
1.1 The Three-Field Problem
The spec defines:
assignee_id — who has it NOW
return_to_id — who it goes back to
origin_id — ultimate requestor (buyer)
Example chain: Buyer → IB analyst → CFO → accountant
When accountant completes:
return_to_id = CFO✓- Task lands in CFO's inbox ✓
When CFO approves — where does it go? The spec says "automatically lands in IB analyst's inbox" but:
- How?
return_to_idwas CFO. - The IB analyst hop isn't stored in plain fields.
- You'd need to deserialize
routing_chainfrom packed Data, pop the stack, updatereturn_to_id.
Gap: No specified algorithm for chain traversal. Implementation will have to invent one.
1.2 Chain Branching
Real scenario: CFO forwards the same request to both the accountant AND the tax attorney (parallel work).
- Which one's completion triggers the return?
- Do you wait for both?
- What if accountant submits but attorney rejects?
Gap: No fork/join semantics. The model assumes linear chains only.
1.3 Delegation & Unavailability
CFO goes on vacation. Tasks pile up in their inbox.
- No delegation model ("my tasks go to X while I'm out")
- No reassignment without breaking the chain
- No escalation triggers ("task stuck > N days")
Gap: Zero availability management. Every chain has a single point of failure at each hop.
1.4 Origin Access Changes Mid-Chain
Buyer submits request → IB forwards → Seller is working on it → Buyer's access is revoked (deal fell through, NDA violation, whatever).
When the answer is finally published:
origin_idstill points to that buyer- Broadcast logic fires... then what?
- Does the answer publish to remaining buyers? Or fail silently?
- Is the request orphaned? Closed? Reassigned?
Gap: No state machine for access revocation during active workflows.
1.5 Recommended Fixes
// Add to Entry (plain, indexed):
chain_depth int // current position in chain
chain_length int // total chain hops
// routing_chain structure (in Data):
type RoutingHop struct {
ActorID string
Action string // "forward" | "delegate" | "escalate"
Timestamp int64
Reason string // optional
}
Add explicit state machine:
request.status: open → assigned → forwarded → pending_review → approved → published → closed
↓
delegated (parallel path)
Add delegation table:
CREATE TABLE delegations (
user_id TEXT NOT NULL,
delegate_id TEXT NOT NULL,
project_id TEXT, -- null = all projects
starts_at INTEGER NOT NULL,
ends_at INTEGER,
created_by TEXT NOT NULL,
PRIMARY KEY (user_id, delegate_id, project_id)
);
2. RBAC Model — Privilege Escalation Paths
2.1 Who Can Grant Access?
The access table has granted_by but no constraint on who can grant.
Can seller_member grant seller_member to a colleague? The spec doesn't say. In practice:
- IB should control IB team access
- Sellers should control seller team access
- Buyers should control buyer team access (within their firm)
Gap: No "can_grant" permission or role hierarchy.
2.2 Who Can Revoke Access?
Not specified. Related questions:
- Can you revoke access granted by someone else?
- What happens to in-flight tasks when access is revoked?
- Can you revoke your own access (leaving a deal)?
2.3 Cross-Project Data Leakage via Object Store
The spec says:
Object ID = SHA-256 of encrypted content. Content-addressable — automatic dedup.
But also:
Per-project encryption key derived from master key + project_id.
These are mutually exclusive. If each project has its own key:
- Same file uploaded to two projects = different ciphertext = different SHA-256 = no dedup
- Dedup is false advertising, OR
- There's a subtle bug waiting to happen
If there IS dedup across projects:
- File uploaded to Project A (by user with access)
- Same file uploaded to Project B (by different user)
- Are they now cryptographically linked?
Gap: Dedup claim contradicts per-project encryption. Pick one.
2.4 Observer Role Creep
observer has "Read-only, no submission." But:
- Can observers see
entry_events(the full thread)? - Can observers see
routing_chain(internal forwarding)? - Can observers see draft answers before publication?
For auditors: probably yes. For competing buyers: absolutely not.
Gap: Observer permission granularity undefined.
2.5 Recommended Fixes
Add to access table:
can_grant INTEGER NOT NULL DEFAULT 0, -- 1 = can grant same or lesser role
revoked_at INTEGER, -- soft revoke with timestamp
revoked_by TEXT
Add role hierarchy enum:
var RoleHierarchy = map[string]int{
"ib_admin": 100,
"ib_member": 80,
"seller_admin": 70,
"seller_member": 50,
"buyer_admin": 40,
"buyer_member": 30,
"observer": 10,
}
// can_grant only works for roles at or below your level
3. Answer Broadcast Model — Race Conditions
3.1 Publish During Matching
Timeline:
- Buyer A submits request R1 (t=0)
- AI embedding starts (t=1)
- Seller uploads answer A1 and IB publishes immediately (t=2)
- AI embedding completes, searches for matches (t=3)
- A1 is already published — does it get linked?
The spec says matching happens "when buyer submits request" but doesn't specify whether existing published answers are searched.
Gap: No retroactive matching. New requests only match future answers? Or historical too?
3.2 Parallel Matching Race
Two buyers submit equivalent requests simultaneously:
- Buyer A submits R1 (t=0)
- Buyer B submits R2 (t=1)
- AI matches R1 → A1 (score 0.75), suggests link (t=2)
- AI matches R2 → A1 (score 0.74), suggests link (t=3)
- Human confirms R1 ↔ A1 (t=4)
- Human confirms R2 ↔ A1 (t=5)
Both get confirmed=1. Both broadcast. But:
- If R1 confirmation triggered broadcast, did R2 get notified too?
- Or does R2 confirmation trigger a duplicate broadcast?
Gap: No idempotency on broadcast. Buyers may receive duplicate notifications.
3.3 Partial Match Rejection
Answer A1 matches requests R1, R2, R3 with scores 0.85, 0.73, 0.71.
Human reviews:
- Confirms R1 ↔ A1 ✓
- Rejects R2 ↔ A1 (not actually relevant)
- Confirms R3 ↔ A1 ✓
The answer_links table has confirmed but no rejected flag. How do we know R2 was reviewed and rejected vs. never reviewed?
Gap: No rejection tracking. Can't distinguish "not reviewed" from "reviewed and rejected."
3.4 Recommended Fixes
Add to answer_links:
reviewed_by TEXT,
reviewed_at INTEGER,
status TEXT NOT NULL DEFAULT 'pending', -- 'pending' | 'confirmed' | 'rejected'
reject_reason TEXT
Add broadcast idempotency:
CREATE TABLE broadcasts (
id TEXT PRIMARY KEY,
answer_id TEXT NOT NULL,
request_id TEXT NOT NULL,
recipient_id TEXT NOT NULL,
sent_at INTEGER NOT NULL,
UNIQUE(answer_id, request_id, recipient_id)
);
4. Channel Ingestion (Email/Slack/Teams) — Authentication Void
4.1 Unknown Sender Problem
The spec says:
Enables email/Slack/Teams participation without login.
Email arrives from cfo@megacorp.com. Questions:
- Is that email address in our system?
- If not, what happens? Drop silently? Create a ghost user? Queue for review?
- If yes, how do we verify it's actually the CFO and not a spoofed email?
Gap: No inbound message authentication model.
4.2 Email Spoofing
M&A deals are high-value targets. Spoofing cfo@seller.com to submit a fake answer or approve a request = catastrophic.
The spec mentions nothing about:
- SPF/DKIM/DMARC verification
- Header analysis
- Reply-chain validation (is this actually a reply to our outbound?)
Gap: No email authentication. This is a critical security hole.
4.3 Thread Hijacking
channel_threads maps thread_id to entry_id. For email, thread_id is Message-ID.
Attacker who knows the Message-ID can craft an email with In-Reply-To: <that-message-id> and inject themselves into the thread.
Gap: No validation that the sender is an authorized participant in that entry's workflow.
4.4 Slack/Teams Token Scope
Not mentioned:
- What OAuth scopes are required?
- Who installs the app — IB admin? Each firm?
- What happens when a Slack workspace is disconnected mid-deal?
4.5 Recommended Fixes
Add channel_participants table:
CREATE TABLE channel_participants (
entry_id TEXT NOT NULL,
channel TEXT NOT NULL, -- "email" | "slack" | "teams"
external_id TEXT NOT NULL, -- email address, slack user ID, teams user ID
verified INTEGER NOT NULL DEFAULT 0,
added_at INTEGER NOT NULL,
PRIMARY KEY (entry_id, channel, external_id)
);
Inbound message flow:
- Parse sender identity from message
- Look up
channel_participantsfor that entry - If not found OR not verified → quarantine, notify IB admin
- If verified → create
entry_event
For email specifically:
- Require DKIM pass or quarantine
- Store DKIM verification result in
entry_events.data
5. FIPS 140-3 Crypto Section — Missing & Incorrect
5.1 No FIPS Module Reference
The spec mentions "AES-256-GCM" but FIPS 140-3 compliance requires using a validated cryptographic module. Options:
- Go's
crypto/aesis NOT FIPS validated - BoringCrypto (Google's fork) IS validated
- Or use a hardware HSM
Gap: No mention of which FIPS-validated implementation to use.
5.2 Deterministic Encryption is an Antipattern
The spec says:
SearchKey/SearchKey2: Deterministic encryption (HMAC-derived nonce) — allows indexed lookups
This is a known antipattern for sensitive data:
- Same plaintext = same ciphertext
- Enables frequency analysis
- Enables known-plaintext attacks if attacker can inject search keys
For a request reference like "FIN-042":
- Attacker creates requests FIN-001 through FIN-999
- Observes which encrypted SearchKey matches the target
- Now knows the target's ref number
Gap: Deterministic encryption leaks information. Use encrypted search (like CipherSweet) or blind indexes for lookups.
5.3 No Key Derivation Function Specified
Per-project encryption key derived from master key + project_id.
How? Options with very different security properties:
HMAC-SHA256(master, project_id)— acceptableSHA256(master + project_id)— vulnerable to length extensionHKDF-SHA256— correct answer for FIPS
Gap: KDF not specified. Implementer will guess.
5.4 No Key Rotation
Master key is compromised. Now what?
- How do you rotate?
- Do you re-encrypt all Data fields?
- What about ObjectStore content?
- How do you handle in-flight requests during rotation?
Gap: No key rotation procedure.
5.5 No Key Escrow / Recovery
Master key is lost.
- All data is unrecoverable
- Business impact: catastrophic
Gap: No key escrow or recovery mechanism.
5.6 Nonce Management
Non-deterministic (random nonce)
AES-GCM with random nonce has a birthday bound problem at ~2^32 encryptions with the same key. For a busy deal with thousands of entries, each with multiple fields, you could hit this.
Gap: No nonce collision mitigation (like AES-GCM-SIV or nonce-misuse-resistant construction).
5.7 Recommended Fixes
// Use HKDF for key derivation
func DeriveProjectKey(master []byte, projectID string) []byte {
return hkdf.Expand(sha256.New, master, []byte("dealspace-project-"+projectID), 32)
}
// Use AES-GCM-SIV for nonce-misuse resistance
// Or: 96-bit random nonce + 32-bit counter per key
// For searchable encryption, use blind index:
func BlindIndex(hmacKey []byte, plaintext string, bits int) string {
h := hmac.New(sha256.New, hmacKey)
h.Write([]byte(plaintext))
return base64.RawURLEncoding.EncodeToString(h.Sum(nil)[:bits/8])
}
// Store blind index in SearchKey, not deterministic ciphertext
Add key rotation fields:
ALTER TABLE entries ADD COLUMN key_version INTEGER NOT NULL DEFAULT 1;
6. Missing Pieces That Will Bite You
6.1 Session Management
Not a single mention of:
- JWT vs. session cookies
- Token expiration
- Refresh token flow
- Session revocation
- Concurrent session limits
For M&A: you probably want short-lived tokens (1 hour), single active session per user, immediate revocation on access change.
6.2 Authentication
- No password policy
- No MFA requirement (should be mandatory for M&A)
- No SSO/SAML mention (enterprise buyers will demand it)
- No account lockout after failed attempts
6.3 Concurrency Control
What happens when:
- Two people edit the same request simultaneously?
- IB approves while seller is still uploading?
- Answer is published while admin is editing its metadata?
No optimistic locking. No conflict resolution. No ETags.
This will cause data loss.
Add:
ALTER TABLE entries ADD COLUMN version INTEGER NOT NULL DEFAULT 1;
And require version in all updates:
UPDATE entries SET ... WHERE entry_id = ? AND version = ?
6.4 File Versioning
The stored file is always the clean original.
What if someone uploads a corrected version?
- Replace? Now you've lost the original.
- Add new? How do you link versions?
- What about answers already published referencing the old file?
Gap: No file versioning. First upload is permanent.
6.5 Soft Delete / Recovery
EntryDelete exists but:
- Is it hard delete?
- Can you recover?
- What about regulatory holds (litigation, investigation)?
M&A data often has retention requirements. Hard delete may be illegal.
Add:
ALTER TABLE entries ADD COLUMN deleted_at INTEGER;
ALTER TABLE entries ADD COLUMN deleted_by TEXT;
6.6 Rate Limiting
Mentioned in middleware.go but not specified:
- Per-user? Per-IP? Per-project?
- What are the limits?
- How are they enforced?
AI matching especially — embedding requests are expensive.
6.7 Backup & Recovery
Not mentioned:
- Backup frequency
- Point-in-time recovery
- Geographic redundancy
- RTO/RPO requirements
6.8 Deadline & Escalation
Requests have due_date but:
- What happens when it passes?
- Who gets notified?
- Is there auto-escalation?
- Can deadlines be extended? By whom?
6.9 Request Withdrawal
Buyer submits request, then changes mind. Can they:
- Withdraw it?
- What if it's already assigned?
- What if answer is in progress?
6.10 Answer Rejection Workflow
IB rejects answer with comment. Then:
- Does it go back to seller?
- To the specific assignee who submitted it?
- Can seller dispute the rejection?
- How many rejection cycles before escalation?
The spec says "back to Seller with comment" but no state machine.
7. Minor Issues & Ambiguities
7.1 Depth Inconsistency
Table says comment has depth *. What depth does it actually get?
- Depth of parent + 1?
- Fixed depth 4?
- No depth tracking for comments?
7.2 broadcast_to Semantics
Answer has:
"broadcast_to": "linked_requesters|all_workstream|all_dataroom"
Who sets this? IB? Automatic? And:
linked_requesters— only buyers who asked this exact questionall_workstream— all buyers with workstream accessall_dataroom— everyone in the data room
The difference between these is significant. Spec doesn't say when each applies.
7.3 Stage Transitions
Stage = "pre_dataroom" | "dataroom" | "closed"
Who transitions? When? What prevents premature transition?
- Can seller accidentally publish to dataroom?
- Can IB close a project while requests are pending?
7.4 The 0.72 Threshold
Score ≥ 0.72 → suggest match
Why 0.72? Is this configurable per project? Tuned on what data?
This is the kind of magic number that will need adjustment per deal type (tech M&A vs. real estate vs. healthcare).
7.5 Watermark Performance
Video: Watermark overlay via ffmpeg, served as stream
For a 2GB video file, this is expensive per-request. Options:
- Pre-render watermarked versions (storage cost)
- Render on-demand (latency cost)
- Cache rendered versions (complexity)
Not specified.
8. Summary: Must-Fix Before Implementation
| Priority | Issue | Risk |
|---|---|---|
| P0 | Email spoofing / no auth | Security breach, false answers |
| P0 | Deterministic encryption | Information leakage |
| P0 | No FIPS module specified | Compliance failure |
| P0 | No MFA | Account takeover |
| P1 | Routing chain underspec | Broken workflows |
| P1 | No concurrency control | Data loss |
| P1 | Grant/revoke undefined | Privilege escalation |
| P1 | No key rotation | Crypto emergency unrecoverable |
| P1 | No session management | Security basics missing |
| P2 | Broadcast race conditions | Duplicate notifications |
| P2 | No soft delete | Regulatory compliance |
| P2 | No file versioning | Data loss on corrections |
| P2 | Object store dedup contradiction | Implementation confusion |
9. What's Actually Good
To be fair, the spec gets a lot right:
- Entry-based architecture — proven pattern, good choice
- Single-throat DB access — prevents SQL injection and access control bypasses
- Worker-centric UI — this is genuinely differentiated
- Packed encryption with plain structural fields — right balance for query vs. confidentiality
- Per-project keys — blast radius limitation
- Watermarking pipeline — proactive leak deterrence
- AI matching with human confirmation — not trusting AI blindly
- Theme system — will make white-labeling easy when needed
The foundation is solid. The gaps are fixable. Fix them before code.
Review complete. Fix the P0s before writing a line of code.