# Feature Spec: Responses, AI Matching, Assignment Rules

## Context

Dealspace needs to separate *what buyers ask* (requests) from *what sellers provide* (responses), with AI automatically discovering which responses satisfy which requests via embeddings. Matches are confirmed by a human before a request is counted as answered.

## Locked Decisions

- Assignment rules: per deal
- Statements (typed text answers): IN SCOPE
- Extraction: async background worker
- Confirmation: internal users only (RBAC refinement later)

---

## 1. Schema Changes

### New tables (add to migrate.go as CREATE TABLE IF NOT EXISTS in the migrations slice)

```sql
-- Responses: seller-provided answers (document OR typed statement)
CREATE TABLE IF NOT EXISTS responses (
    id TEXT PRIMARY KEY,
    deal_id TEXT NOT NULL,
    type TEXT NOT NULL CHECK (type IN ('document','statement')),
    title TEXT NOT NULL,
    body TEXT DEFAULT '',      -- markdown: extracted doc content OR typed text
    file_id TEXT DEFAULT '',   -- populated for type='document'
    extraction_status TEXT DEFAULT 'pending'
        CHECK (extraction_status IN ('pending','processing','done','failed')),
    created_by TEXT DEFAULT '',
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (deal_id) REFERENCES deals(id)
);

-- Chunks: segments of a response body for fine-grained matching
CREATE TABLE IF NOT EXISTS response_chunks (
    id TEXT PRIMARY KEY,
    response_id TEXT NOT NULL,
    chunk_index INTEGER NOT NULL,
    text TEXT NOT NULL,
    vector BLOB NOT NULL,      -- []float32 serialised as little-endian bytes
    FOREIGN KEY (response_id) REFERENCES responses(id)
);

-- N:M: AI-discovered links between requests and response chunks
CREATE TABLE IF NOT EXISTS request_links (
    request_id TEXT NOT NULL,
    response_id TEXT NOT NULL,
    chunk_id TEXT NOT NULL,
    confidence REAL NOT NULL,  -- cosine similarity 0-1
    auto_linked BOOLEAN DEFAULT 1,
    confirmed BOOLEAN DEFAULT 0,
    confirmed_by TEXT DEFAULT '',
    confirmed_at DATETIME,
    PRIMARY KEY (request_id, response_id, chunk_id)
);

-- Assignment rules: keyword → assignee, per deal
CREATE TABLE IF NOT EXISTS assignment_rules (
    id TEXT PRIMARY KEY,
    deal_id TEXT NOT NULL,
    keyword TEXT NOT NULL,     -- e.g. "Legal", "Tax", "HR"
    assignee_id TEXT NOT NULL, -- profile ID
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (deal_id) REFERENCES deals(id)
);
```

### Additive migrations (append to additiveMigrationStmts in migrate.go)

```go
`ALTER TABLE diligence_requests ADD COLUMN assignee_id TEXT DEFAULT ''`,
`ALTER TABLE diligence_requests ADD COLUMN status TEXT DEFAULT 'open'`,
`ALTER TABLE files ADD COLUMN response_id TEXT DEFAULT ''`,
```

Note: SQLite cannot add a CHECK constraint via ALTER TABLE, so the allowed status values must be enforced in the handler.
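
Since the database won't enforce it, a handler-side guard could look like the sketch below. The value set mirrors the status pills described in §8; `ValidStatus` is an illustrative name, not an existing function.

```go
package main

// validStatuses mirrors the CHECK constraint that SQLite cannot
// apply via ALTER TABLE; the values match the status pills in §8.
var validStatuses = map[string]bool{
	"open":           true,
	"in_progress":    true,
	"answered":       true,
	"not_applicable": true,
}

// ValidStatus reports whether s is an allowed diligence_requests.status value.
func ValidStatus(s string) bool {
	return validStatuses[s]
}
```

Handlers that accept a status field would reject the request with a 400 when `ValidStatus` returns false.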
---

## 2. Fireworks Client

Create `internal/fireworks/client.go`:

```
Package: fireworks

Fireworks API key: fw_RVcDe4c6mN4utKLsgA7hTm
Base URL: https://api.fireworks.ai/inference/v1

Functions needed:

1. ExtractToMarkdown(ctx, imageBase64 []string, filename string) (string, error)
   - Model: accounts/fireworks/models/llama-v3p2-90b-vision-instruct
   - System prompt: "You are a document extraction expert. Extract ALL content from this document into clean markdown. Preserve headings, tables, lists, and structure. Do not summarise — extract everything."
   - Send up to 10 images per call (multi-page docs: batch into 10-page chunks, concatenate results)
   - For XLSX files (no images): use a different path — just send the structured data as text
   - Return full markdown string

2. EmbedText(ctx, texts []string) ([][]float32, error)
   - Model: nomic-ai/nomic-embed-text-v1.5
   - POST /embeddings (OpenAI-compatible)
   - Batch up to 50 texts per call
   - Return [][]float32

3. CosineSimilarity(a, b []float32) float32
   - Pure Go dot product (normalised vectors)
```
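
The third function is pure arithmetic. A sketch that also normalises defensively, in case the embedding vectors ever arrive unnormalised (for unit vectors it reduces to the dot product, as the spec assumes):

```go
package main

import "math"

// CosineSimilarity returns the cosine of the angle between a and b.
// For vectors that are already L2-normalised this equals their dot
// product; the division below makes it safe either way.
func CosineSimilarity(a, b []float32) float32 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0 // zero vectors carry no direction
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}
```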

---

## 3. PDF-to-Images Conversion

Create `internal/extract/pdf.go`:

```
Use exec("pdftoppm") subprocess:
  pdftoppm -jpeg -r 150 input.pdf /tmp/prefix
  → produces /tmp/prefix-1.jpg, /tmp/prefix-2.jpg, ...

Read each JPEG → base64 encode → pass to fireworks.ExtractToMarkdown

For non-PDF files that are images (jpg/png): base64 encode directly, skip pdftoppm.
For XLSX: use excelize GetRows on all sheets → format as markdown table → skip vision model entirely.
For other binary types: attempt pdftoppm, fall back to filename+extension as minimal context.

Function signature:
  FileToImages(path string) ([]string, error) // returns base64-encoded JPEG strings
```
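
One pitfall worth noting: depending on the pdftoppm version, the page suffixes may or may not be zero-padded, and unpadded numbers sort wrong lexicographically (`-10.jpg` before `-2.jpg`). Ordering pages numerically is the safe choice. A sketch with a hypothetical `sortPageFiles` helper:

```go
package main

import (
	"regexp"
	"sort"
	"strconv"
)

// pageNumRe captures the numeric page suffix pdftoppm appends.
var pageNumRe = regexp.MustCompile(`-(\d+)\.jpg$`)

// sortPageFiles orders pdftoppm output (prefix-1.jpg, prefix-2.jpg, …)
// by numeric page suffix; a plain lexicographic sort would place
// prefix-10.jpg before prefix-2.jpg.
func sortPageFiles(paths []string) []string {
	pageNum := func(p string) int {
		m := pageNumRe.FindStringSubmatch(p)
		if m == nil {
			return 0
		}
		n, _ := strconv.Atoi(m[1])
		return n
	}
	out := append([]string(nil), paths...)
	sort.Slice(out, func(i, j int) bool { return pageNum(out[i]) < pageNum(out[j]) })
	return out
}
```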

---

## 4. Chunker

Create `internal/extract/chunker.go`:

```
ChunkMarkdown(text string) []string
- Split on markdown headings (## or ###) first
- If a section > 600 tokens (approx 2400 chars): split further at paragraph breaks (\n\n)
- If a paragraph > 600 tokens: split at sentence boundary (". ")
- Overlap: prepend last 80 chars of previous chunk to each chunk (context continuity)
- Minimum chunk length: 50 chars (discard shorter)
- Return []string of chunks
```

---

## 5. Extraction Worker

Create `internal/worker/extractor.go`:

```
type ExtractionJob struct {
    ResponseID string
    FilePath   string // absolute path to uploaded file (or "" for statements)
    DealID     string
}

type Extractor struct {
    db   *sql.DB
    fw   *fireworks.Client
    jobs chan ExtractionJob
}

func NewExtractor(db *sql.DB, fw *fireworks.Client) *Extractor
func (e *Extractor) Start()   // launch 2 worker goroutines
func (e *Extractor) Enqueue(job ExtractionJob)

Worker loop:
1. Set responses.extraction_status = 'processing'
2. If file:
   a. Convert to images (extract.FileToImages)
   b. Call fw.ExtractToMarkdown → markdown body
   c. UPDATE responses SET body=?, extraction_status='done'
3. If statement (body already set, skip extraction):
   a. extraction_status → 'done' immediately
4. Chunk: extract.ChunkMarkdown(body)
5. Embed: fw.EmbedText(chunks) → [][]float32
6. Store each chunk: INSERT INTO response_chunks (id, response_id, chunk_index, text, vector)
   - Serialise []float32 as little-endian bytes: each float32 = 4 bytes
7. Match against all open requests in this deal:
   a. Load all diligence_requests for deal_id
   b. Embed request descriptions that have no embedding yet (store in a simple in-memory cache or re-embed each run — re-embed is fine for now)
   c. For each (chunk, request) pair: compute cosine similarity
   d. If similarity >= 0.72: INSERT OR IGNORE INTO request_links (request_id, response_id, chunk_id, confidence, auto_linked=1, confirmed=0)
8. Log summary: "Response {id}: {N} chunks, {M} request links auto-created"

On error: SET extraction_status = 'failed', log error
```
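
Step 6's on-disk format (4 little-endian bytes per float32) maps directly onto `encoding/binary`. A sketch; `encodeVector`/`decodeVector` are illustrative names:

```go
package main

import (
	"encoding/binary"
	"math"
)

// encodeVector serialises a []float32 as little-endian bytes,
// 4 bytes per value: the format the response_chunks.vector BLOB expects.
func encodeVector(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[i*4:], math.Float32bits(f))
	}
	return buf
}

// decodeVector is the inverse, for reading vectors back at match time.
func decodeVector(b []byte) []float32 {
	v := make([]float32, len(b)/4)
	for i := range v {
		v[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[i*4:]))
	}
	return v
}
```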

---

## 6. Handler: Responses & Assignment Rules

Create `internal/handler/responses.go`:

```
Handlers:

POST /deals/responses/statement
- Fields: deal_id, title, body (markdown text)
- Create responses row (type='statement', extraction_status='pending')
- Enqueue extraction job (body already set, worker will chunk+embed+match)
- Redirect to /deals/{deal_id}?tab=requests

POST /deals/responses/confirm
- Fields: request_id, response_id, chunk_id
- UPDATE request_links SET confirmed=1, confirmed_by=profile.ID, confirmed_at=now
- Return 200 OK (HTMX partial or redirect)

POST /deals/responses/reject
- Fields: request_id, response_id, chunk_id
- DELETE FROM request_links WHERE ...
- Return 200 OK

GET /deals/responses/pending/{dealID}
- Returns all request_links WHERE confirmed=0 AND auto_linked=1
- Joined with requests (description) and responses (title, type)
- Returns JSON for HTMX partial

POST /deals/assignment-rules/save
- Fields: deal_id, rules[] (keyword + assignee_id pairs, JSON array)
- DELETE existing rules for deal, INSERT new set
- On save: re-run auto-assignment for all unassigned requests in deal
- Redirect back to deal settings

GET /deals/assignment-rules/{dealID}
- Returns JSON array of {id, keyword, assignee_id, assignee_name}

Auto-assignment function (call on: rule save, request import):
func autoAssignRequests(db, dealID):
- Load all assignment_rules for deal_id
- For each diligence_request WHERE assignee_id = '':
  - Check if section contains any rule keyword (case-insensitive)
  - If match: UPDATE diligence_requests SET assignee_id = rule.assignee_id
```
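
The keyword check in autoAssignRequests reduces to a case-insensitive substring match. A sketch with a hypothetical `firstMatchingAssignee` helper (first-rule-wins on overlapping keywords is an assumption; the spec does not define precedence):

```go
package main

import "strings"

// Rule mirrors an assignment_rules row.
type Rule struct {
	Keyword    string
	AssigneeID string
}

// firstMatchingAssignee returns the assignee for the first rule whose
// keyword appears (case-insensitively) in the request's section, or ""
// when no rule matches. First-rule-wins is an assumed policy for
// overlapping keywords.
func firstMatchingAssignee(section string, rules []Rule) string {
	lower := strings.ToLower(section)
	for _, r := range rules {
		if strings.Contains(lower, strings.ToLower(r.Keyword)) {
			return r.AssigneeID
		}
	}
	return ""
}
```

autoAssignRequests would call this per unassigned request and issue the UPDATE only when the result is non-empty.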

---

## 7. Wire Up in handler.go

Add to RegisterRoutes:

```go
// Responses & AI matching
mux.HandleFunc("/deals/responses/statement", h.requireAuth(h.handleCreateStatement))
mux.HandleFunc("/deals/responses/confirm", h.requireAuth(h.handleConfirmLink))
mux.HandleFunc("/deals/responses/reject", h.requireAuth(h.handleRejectLink))
mux.HandleFunc("/deals/responses/pending/", h.requireAuth(h.handlePendingLinks))
mux.HandleFunc("/deals/assignment-rules/save", h.requireAuth(h.handleSaveAssignmentRules))
mux.HandleFunc("/deals/assignment-rules/", h.requireAuth(h.handleGetAssignmentRules))
```

In the Handler struct, add:

```go
extractor *worker.Extractor
fw        *fireworks.Client
```

In New(): initialise both, call extractor.Start().

In handleFileUpload (files.go): after saving the file, create a responses row (type='document') and enqueue an extraction job.

---

## 8. Template Changes

### dealroom.templ — Requests tab

The current requests tab shows a list of requests. Add:

**A) Per-request: assignee + status badge**
- Show assignee name (or "Unassigned" in gray) next to each request
- Status pill: open (gray), in_progress (blue), answered (green), not_applicable (muted)
- If a confirmed link exists: show "✓ Answered" with a link to the response
- If pending auto-links exist: show "🤖 N AI matches — review" button (teal outline)

**B) Pending AI matches panel** (shown above the request list if any pending)
- Collapsible section: "🤖 X AI-suggested matches waiting for review"
- Each row: Request description | → | Response title | Confidence % | [Confirm] [Reject]
- Confirm/Reject use fetch() POST to /deals/responses/confirm or /reject, then reload

**C) "Add Statement" button** (in requests toolbar)
- Opens a modal: Title + markdown textarea
- Submits to POST /deals/responses/statement
- After submit: shows in pending matches if AI matched any requests

**D) Assignment rules** (accessible via a gear icon or "Settings" in requests tab header)
- Inline expandable panel or small modal
- Table: Keyword | Assignee (dropdown of internal team members) | [Remove]
- [Add Rule] row at bottom
- Save button → POST /deals/assignment-rules/save

### Keep it clean
- Don't clutter the existing request rows — use progressive disclosure
- The "N AI matches" prompt should be prominent but not alarming
- Confidence shown as a percentage (e.g. "87%"), not a raw float

---

## 9. Files tab: extraction status

In the files table, add a small status indicator per file:
- ⏳ Extracting... (extraction_status = 'pending' or 'processing')
- ✓ (extraction_status = 'done') — subtle, no noise
- ⚠ (extraction_status = 'failed') — show a tooltip with the reason

Poll via a simple setInterval (every 5s) that reloads the file list while any files are still pending; stop polling once all are done.

---

## 10. Build & Deploy

After all code changes:
1. Run: cd ~/dev/dealroom && PATH=$PATH:/home/johan/go/bin:/usr/local/go/bin make build
2. Run: systemctl --user stop dealroom && cp bin/dealroom dealroom && systemctl --user start dealroom
3. Verify: curl -s -o /dev/null -w "%{http_code}" http://localhost:9300/ (expect 303)
4. Check logs: journalctl --user -u dealroom -n 30 --no-pager
5. Run: cd ~/dev/dealroom && git add -A && git commit -m "feat: responses, AI matching, assignment rules" && git push origin main

---

## Key Constants

Fireworks API key: fw_RVcDe4c6mN4utKLsgA7hTm
Extraction model: accounts/fireworks/models/llama-v3p2-90b-vision-instruct
Embedding model: nomic-ai/nomic-embed-text-v1.5
Match threshold: 0.72 cosine similarity
Chunk size: ~600 tokens / ~2400 chars max (matches the chunker in §4)
Chunk overlap: ~80 chars
Max images per vision call: 10
Worker concurrency: 2 goroutines

Files are stored at: data/uploads/ (relative to WorkingDirectory in the service)
DB path: data/db/dealroom.db