dealspace/SPEC-REVIEW.md

620 lines
19 KiB
Markdown

# Dealspace SPEC.md — Architectural Review
**Reviewer:** James (Senior Architect)
**Date:** 2026-02-28
**Verdict:** Solid foundation, but several gaps will cause pain during implementation. The crypto section needs work before this touches real M&A data.
---
## Executive Summary
The spec shows clear thinking — entry-based architecture is proven (inou), the single-throat principle for DB access is correct, and the worker-centric UI philosophy is refreshing. However:
1. **The routing chain model is underspecified** — will break on real-world forwarding patterns
2. **RBAC has privilege escalation gaps** — who grants access? who revokes?
3. **Crypto section claims FIPS but doesn't deliver** — deterministic encryption is an antipattern
4. **Channel ingestion (email/Slack) has no auth model** — spoofing is trivial
5. **No concurrency model** — race conditions guaranteed
This platform handles M&A data. People go to prison for leaking this stuff. The spec needs hardening before code.
---
## 1. Routing Chain Model — Critical Gaps
### 1.1 The Three-Field Problem
The spec defines:
```
assignee_id — who has it NOW
return_to_id — who it goes back to
origin_id — ultimate requestor (buyer)
```
Example chain: `Buyer → IB analyst → CFO → accountant`
When accountant completes:
- `return_to_id = CFO`
- Task lands in CFO's inbox ✓
When CFO approves — **where does it go?** The spec says "automatically lands in IB analyst's inbox" but:
- How? `return_to_id` was CFO.
- The IB analyst hop isn't stored in plain fields.
- You'd need to deserialize `routing_chain` from packed Data, pop the stack, update `return_to_id`.
**Gap:** No specified algorithm for chain traversal. Implementation will have to invent one.
### 1.2 Chain Branching
Real scenario: CFO forwards the same request to **both** the accountant AND the tax attorney (parallel work).
- Which one's completion triggers the return?
- Do you wait for both?
- What if accountant submits but attorney rejects?
**Gap:** No fork/join semantics. The model assumes linear chains only.
### 1.3 Delegation & Unavailability
CFO goes on vacation. Tasks pile up in their inbox.
- No delegation model ("my tasks go to X while I'm out")
- No reassignment without breaking the chain
- No escalation triggers ("task stuck > N days")
**Gap:** Zero availability management. Every chain has a single point of failure at each hop.
### 1.4 Origin Access Changes Mid-Chain
Buyer submits request → IB forwards → Seller is working on it → **Buyer's access is revoked** (deal fell through, NDA violation, whatever).
When the answer is finally published:
- `origin_id` still points to that buyer
- Broadcast logic fires... then what?
- Does the answer publish to remaining buyers? Or fail silently?
- Is the request orphaned? Closed? Reassigned?
**Gap:** No state machine for access revocation during active workflows.
### 1.5 Recommended Fixes
```go
// Add to Entry (plain, indexed):
chain_depth int // current position in chain
chain_length int // total chain hops
// routing_chain structure (in Data):
type RoutingHop struct {
ActorID string
Action string // "forward" | "delegate" | "escalate"
Timestamp int64
Reason string // optional
}
```
Add explicit state machine:
```
request.status: open → assigned → forwarded → pending_review → approved → published → closed
delegated (parallel path)
```
Add delegation table:
```sql
CREATE TABLE delegations (
user_id TEXT NOT NULL,
delegate_id TEXT NOT NULL,
project_id TEXT, -- null = all projects
starts_at INTEGER NOT NULL,
ends_at INTEGER,
created_by TEXT NOT NULL,
PRIMARY KEY (user_id, delegate_id, project_id)
);
```
---
## 2. RBAC Model — Privilege Escalation Paths
### 2.1 Who Can Grant Access?
The `access` table has `granted_by` but no constraint on who can grant.
Can `seller_member` grant `seller_member` to a colleague? The spec doesn't say. In practice:
- IB should control IB team access
- Sellers should control seller team access
- Buyers should control buyer team access (within their firm)
**Gap:** No "can_grant" permission or role hierarchy.
### 2.2 Who Can Revoke Access?
Not specified. Related questions:
- Can you revoke access granted by someone else?
- What happens to in-flight tasks when access is revoked?
- Can you revoke your own access (leaving a deal)?
### 2.3 Cross-Project Data Leakage via Object Store
The spec says:
> Object ID = SHA-256 of encrypted content. Content-addressable — automatic dedup.
But also:
> Per-project encryption key derived from master key + project_id.
These are **mutually exclusive**. If each project has its own key:
- Same file uploaded to two projects = different ciphertext = different SHA-256 = no dedup
- Dedup is false advertising, OR
- There's a subtle bug waiting to happen
If there IS dedup across projects:
- File uploaded to Project A (by user with access)
- Same file uploaded to Project B (by different user)
- Are they now cryptographically linked?
**Gap:** Dedup claim contradicts per-project encryption. Pick one.
### 2.4 Observer Role Creep
`observer` has "Read-only, no submission." But:
- Can observers see `entry_events` (the full thread)?
- Can observers see `routing_chain` (internal forwarding)?
- Can observers see draft answers before publication?
For auditors: probably yes. For competing buyers: absolutely not.
**Gap:** Observer permission granularity undefined.
### 2.5 Recommended Fixes
Add to `access` table:
```sql
can_grant INTEGER NOT NULL DEFAULT 0, -- 1 = can grant same or lesser role
revoked_at INTEGER, -- soft revoke with timestamp
revoked_by TEXT
```
Add role hierarchy enum:
```go
var RoleHierarchy = map[string]int{
"ib_admin": 100,
"ib_member": 80,
"seller_admin": 70,
"seller_member": 50,
"buyer_admin": 40,
"buyer_member": 30,
"observer": 10,
}
// can_grant only works for roles at or below your level
```
---
## 3. Answer Broadcast Model — Race Conditions
### 3.1 Publish During Matching
Timeline:
1. Buyer A submits request R1 (t=0)
2. AI embedding starts (t=1)
3. Seller uploads answer A1 and IB publishes immediately (t=2)
4. AI embedding completes, searches for matches (t=3)
5. A1 is already published — does it get linked?
The spec says matching happens "when buyer submits request" but doesn't specify whether existing published answers are searched.
**Gap:** No retroactive matching. New requests only match future answers? Or historical too?
### 3.2 Parallel Matching Race
Two buyers submit equivalent requests simultaneously:
1. Buyer A submits R1 (t=0)
2. Buyer B submits R2 (t=1)
3. AI matches R1 → A1 (score 0.75), suggests link (t=2)
4. AI matches R2 → A1 (score 0.74), suggests link (t=3)
5. Human confirms R1 ↔ A1 (t=4)
6. Human confirms R2 ↔ A1 (t=5)
Both get `confirmed=1`. Both broadcast. But:
- If R1 confirmation triggered broadcast, did R2 get notified too?
- Or does R2 confirmation trigger a duplicate broadcast?
**Gap:** No idempotency on broadcast. Buyers may receive duplicate notifications.
### 3.3 Partial Match Rejection
Answer A1 matches requests R1, R2, R3 with scores 0.85, 0.73, 0.71.
Human reviews:
- Confirms R1 ↔ A1 ✓
- Rejects R2 ↔ A1 (not actually relevant)
- Confirms R3 ↔ A1 ✓
The `answer_links` table has `confirmed` but no `rejected` flag. How do we know R2 was reviewed and rejected vs. never reviewed?
**Gap:** No rejection tracking. Can't distinguish "not reviewed" from "reviewed and rejected."
### 3.4 Recommended Fixes
Add to `answer_links`:
```sql
reviewed_by TEXT,
reviewed_at INTEGER,
status TEXT NOT NULL DEFAULT 'pending', -- 'pending' | 'confirmed' | 'rejected'
reject_reason TEXT
```
Add broadcast idempotency:
```sql
CREATE TABLE broadcasts (
id TEXT PRIMARY KEY,
answer_id TEXT NOT NULL,
request_id TEXT NOT NULL,
recipient_id TEXT NOT NULL,
sent_at INTEGER NOT NULL,
UNIQUE(answer_id, request_id, recipient_id)
);
```
---
## 4. Channel Ingestion (Email/Slack/Teams) — Authentication Void
### 4.1 Unknown Sender Problem
The spec says:
> Enables email/Slack/Teams participation without login.
Email arrives from `cfo@megacorp.com`. Questions:
- Is that email address in our system?
- If not, what happens? Drop silently? Create a ghost user? Queue for review?
- If yes, how do we verify it's actually the CFO and not a spoofed email?
**Gap:** No inbound message authentication model.
### 4.2 Email Spoofing
M&A deals are high-value targets. Spoofing `cfo@seller.com` to submit a fake answer or approve a request = catastrophic.
The spec mentions nothing about:
- SPF/DKIM/DMARC verification
- Header analysis
- Reply-chain validation (is this actually a reply to our outbound?)
**Gap:** No email authentication. This is a critical security hole.
### 4.3 Thread Hijacking
`channel_threads` maps `thread_id` to `entry_id`. For email, `thread_id` is `Message-ID`.
Attacker who knows the Message-ID can craft an email with `In-Reply-To: <that-message-id>` and inject themselves into the thread.
**Gap:** No validation that the sender is an authorized participant in that entry's workflow.
### 4.4 Slack/Teams Token Scope
Not mentioned:
- What OAuth scopes are required?
- Who installs the app — IB admin? Each firm?
- What happens when a Slack workspace is disconnected mid-deal?
### 4.5 Recommended Fixes
Add `channel_participants` table:
```sql
CREATE TABLE channel_participants (
entry_id TEXT NOT NULL,
channel TEXT NOT NULL, -- "email" | "slack" | "teams"
external_id TEXT NOT NULL, -- email address, slack user ID, teams user ID
verified INTEGER NOT NULL DEFAULT 0,
added_at INTEGER NOT NULL,
PRIMARY KEY (entry_id, channel, external_id)
);
```
Inbound message flow:
1. Parse sender identity from message
2. Look up `channel_participants` for that entry
3. If not found OR not verified → quarantine, notify IB admin
4. If verified → create `entry_event`
For email specifically:
- Require DKIM pass or quarantine
- Store DKIM verification result in `entry_events.data`
---
## 5. FIPS 140-3 Crypto Section — Missing & Incorrect
### 5.1 No FIPS Module Reference
The spec mentions "AES-256-GCM" but FIPS 140-3 compliance requires using a **validated cryptographic module**. Options:
- Go's `crypto/aes` is NOT FIPS validated
- BoringCrypto (Google's fork) IS validated
- Or use a hardware HSM
**Gap:** No mention of which FIPS-validated implementation to use.
### 5.2 Deterministic Encryption is an Antipattern
The spec says:
> SearchKey/SearchKey2: Deterministic encryption (HMAC-derived nonce) — allows indexed lookups
This is a known antipattern for sensitive data:
- Same plaintext = same ciphertext
- Enables frequency analysis
- Enables known-plaintext attacks if attacker can inject search keys
For a request reference like "FIN-042":
- Attacker creates requests FIN-001 through FIN-999
- Observes which encrypted SearchKey matches the target
- Now knows the target's ref number
**Gap:** Deterministic encryption leaks information. Use encrypted search (like CipherSweet) or blind indexes for lookups.
### 5.3 No Key Derivation Function Specified
> Per-project encryption key derived from master key + project_id.
How? Options with very different security properties:
- `HMAC-SHA256(master, project_id)` — acceptable
- `SHA256(master + project_id)` — vulnerable to length extension
- `HKDF-SHA256` — correct answer for FIPS
**Gap:** KDF not specified. Implementer will guess.
### 5.4 No Key Rotation
Master key is compromised. Now what?
- How do you rotate?
- Do you re-encrypt all Data fields?
- What about ObjectStore content?
- How do you handle in-flight requests during rotation?
**Gap:** No key rotation procedure.
### 5.5 No Key Escrow / Recovery
Master key is lost.
- All data is unrecoverable
- Business impact: catastrophic
**Gap:** No key escrow or recovery mechanism.
### 5.6 Nonce Management
> Non-deterministic (random nonce)
AES-GCM with random nonce has a birthday bound problem at ~2^32 encryptions with the same key. For a busy deal with thousands of entries, each with multiple fields, you could hit this.
**Gap:** No nonce collision mitigation (like AES-GCM-SIV or nonce-misuse-resistant construction).
### 5.7 Recommended Fixes
```go
// Use HKDF for key derivation
func DeriveProjectKey(master []byte, projectID string) []byte {
return hkdf.Expand(sha256.New, master, []byte("dealspace-project-"+projectID), 32)
}
// Use AES-GCM-SIV for nonce-misuse resistance
// Or: 96-bit random nonce + 32-bit counter per key
// For searchable encryption, use blind index:
func BlindIndex(hmacKey []byte, plaintext string, bits int) string {
h := hmac.New(sha256.New, hmacKey)
h.Write([]byte(plaintext))
return base64.RawURLEncoding.EncodeToString(h.Sum(nil)[:bits/8])
}
// Store blind index in SearchKey, not deterministic ciphertext
```
Add key rotation fields:
```sql
ALTER TABLE entries ADD COLUMN key_version INTEGER NOT NULL DEFAULT 1;
```
---
## 6. Missing Pieces That Will Bite You
### 6.1 Session Management
Not a single mention of:
- JWT vs. session cookies
- Token expiration
- Refresh token flow
- Session revocation
- Concurrent session limits
For M&A: you probably want short-lived tokens (1 hour), single active session per user, immediate revocation on access change.
### 6.2 Authentication
- No password policy
- No MFA requirement (should be mandatory for M&A)
- No SSO/SAML mention (enterprise buyers will demand it)
- No account lockout after failed attempts
### 6.3 Concurrency Control
What happens when:
- Two people edit the same request simultaneously?
- IB approves while seller is still uploading?
- Answer is published while admin is editing its metadata?
**No optimistic locking. No conflict resolution. No ETags.**
This will cause data loss.
Add:
```sql
ALTER TABLE entries ADD COLUMN version INTEGER NOT NULL DEFAULT 1;
```
And require `version` in all updates:
```sql
UPDATE entries SET ... WHERE entry_id = ? AND version = ?
```
### 6.4 File Versioning
> The stored file is always the clean original.
What if someone uploads a corrected version?
- Replace? Now you've lost the original.
- Add new? How do you link versions?
- What about answers already published referencing the old file?
**Gap:** No file versioning. First upload is permanent.
### 6.5 Soft Delete / Recovery
`EntryDelete` exists but:
- Is it hard delete?
- Can you recover?
- What about regulatory holds (litigation, investigation)?
M&A data often has retention requirements. Hard delete may be illegal.
Add:
```sql
ALTER TABLE entries ADD COLUMN deleted_at INTEGER;
ALTER TABLE entries ADD COLUMN deleted_by TEXT;
```
### 6.6 Rate Limiting
Mentioned in `middleware.go` but not specified:
- Per-user? Per-IP? Per-project?
- What are the limits?
- How are they enforced?
AI matching especially — embedding requests are expensive.
### 6.7 Backup & Recovery
Not mentioned:
- Backup frequency
- Point-in-time recovery
- Geographic redundancy
- RTO/RPO requirements
### 6.8 Deadline & Escalation
Requests have `due_date` but:
- What happens when it passes?
- Who gets notified?
- Is there auto-escalation?
- Can deadlines be extended? By whom?
### 6.9 Request Withdrawal
Buyer submits request, then changes mind. Can they:
- Withdraw it?
- What if it's already assigned?
- What if answer is in progress?
### 6.10 Answer Rejection Workflow
IB rejects answer with comment. Then:
- Does it go back to seller?
- To the specific assignee who submitted it?
- Can seller dispute the rejection?
- How many rejection cycles before escalation?
The spec says "back to Seller with comment" but no state machine.
---
## 7. Minor Issues & Ambiguities
### 7.1 Depth Inconsistency
Table says `comment` has depth `*`. What depth does it actually get?
- Depth of parent + 1?
- Fixed depth 4?
- No depth tracking for comments?
### 7.2 broadcast_to Semantics
Answer has:
```json
"broadcast_to": "linked_requesters|all_workstream|all_dataroom"
```
Who sets this? IB? Automatic? And:
- `linked_requesters` — only buyers who asked this exact question
- `all_workstream` — all buyers with workstream access
- `all_dataroom` — everyone in the data room
The difference between these is significant. Spec doesn't say when each applies.
### 7.3 Stage Transitions
`Stage = "pre_dataroom" | "dataroom" | "closed"`
Who transitions? When? What prevents premature transition?
- Can seller accidentally publish to dataroom?
- Can IB close a project while requests are pending?
### 7.4 The 0.72 Threshold
> Score ≥ 0.72 → suggest match
Why 0.72? Is this configurable per project? Tuned on what data?
This is the kind of magic number that will need adjustment per deal type (tech M&A vs. real estate vs. healthcare).
### 7.5 Watermark Performance
> Video: Watermark overlay via ffmpeg, served as stream
For a 2GB video file, this is expensive per-request. Options:
- Pre-render watermarked versions (storage cost)
- Render on-demand (latency cost)
- Cache rendered versions (complexity)
Not specified.
---
## 8. Summary: Must-Fix Before Implementation
| Priority | Issue | Risk |
|----------|-------|------|
| **P0** | Email spoofing / no auth | Security breach, false answers |
| **P0** | Deterministic encryption | Information leakage |
| **P0** | No FIPS module specified | Compliance failure |
| **P0** | No MFA | Account takeover |
| **P1** | Routing chain underspec | Broken workflows |
| **P1** | No concurrency control | Data loss |
| **P1** | Grant/revoke undefined | Privilege escalation |
| **P1** | No key rotation | Crypto emergency unrecoverable |
| **P1** | No session management | Security basics missing |
| **P2** | Broadcast race conditions | Duplicate notifications |
| **P2** | No soft delete | Regulatory compliance |
| **P2** | No file versioning | Data loss on corrections |
| **P2** | Object store dedup contradiction | Implementation confusion |
---
## 9. What's Actually Good
To be fair, the spec gets a lot right:
1. **Entry-based architecture** — proven pattern, good choice
2. **Single-throat DB access** — prevents SQL injection and access control bypasses
3. **Worker-centric UI** — this is genuinely differentiated
4. **Packed encryption with plain structural fields** — right balance for query vs. confidentiality
5. **Per-project keys** — blast radius limitation
6. **Watermarking pipeline** — proactive leak deterrence
7. **AI matching with human confirmation** — not trusting AI blindly
8. **Theme system** — will make white-labeling easy when needed
The foundation is solid. The gaps are fixable. Fix them before code.
---
*Review complete. Fix the P0s before writing a line of code.*