dealroom/FEATURE_SPEC_RESPONSES.md

Feature Spec: Responses, AI Matching, Assignment Rules

Context

Dealspace needs to separate what buyers ask (requests) from what sellers provide (responses), with AI automatically discovering which responses satisfy which requests via embeddings. Each AI-discovered match is confirmed by a human before the request is counted as answered.

Locked Decisions

  • Assignment rules: per deal
  • Statements (typed text answers): IN SCOPE
  • Extraction: async background worker
  • Confirmation: internal users only (RBAC refinement later)

1. Schema Changes

New tables (add to migrate.go as CREATE TABLE IF NOT EXISTS in the migrations slice)

-- Responses: seller-provided answers (document OR typed statement)
CREATE TABLE IF NOT EXISTS responses (
    id              TEXT PRIMARY KEY,
    deal_id         TEXT NOT NULL,
    type            TEXT NOT NULL CHECK (type IN ('document','statement')),
    title           TEXT NOT NULL,
    body            TEXT DEFAULT '',        -- markdown: extracted doc content OR typed text
    file_id         TEXT DEFAULT '',        -- populated for type='document'
    extraction_status TEXT DEFAULT 'pending'
        CHECK (extraction_status IN ('pending','processing','done','failed')),
    created_by      TEXT DEFAULT '',
    created_at      DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at      DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (deal_id) REFERENCES deals(id)
);

-- Chunks: segments of a response body for fine-grained matching
CREATE TABLE IF NOT EXISTS response_chunks (
    id           TEXT PRIMARY KEY,
    response_id  TEXT NOT NULL,
    chunk_index  INTEGER NOT NULL,
    text         TEXT NOT NULL,
    vector       BLOB NOT NULL,     -- []float32 serialised as little-endian bytes
    FOREIGN KEY (response_id) REFERENCES responses(id)
);

-- N:M: AI-discovered links between requests and response chunks
CREATE TABLE IF NOT EXISTS request_links (
    request_id   TEXT NOT NULL,
    response_id  TEXT NOT NULL,
    chunk_id     TEXT NOT NULL,
    confidence   REAL NOT NULL,         -- cosine similarity 0-1
    auto_linked  BOOLEAN DEFAULT 1,
    confirmed    BOOLEAN DEFAULT 0,
    confirmed_by TEXT DEFAULT '',
    confirmed_at DATETIME,
    PRIMARY KEY (request_id, response_id, chunk_id)
);

-- Assignment rules: keyword → assignee, per deal
CREATE TABLE IF NOT EXISTS assignment_rules (
    id           TEXT PRIMARY KEY,
    deal_id      TEXT NOT NULL,
    keyword      TEXT NOT NULL,         -- e.g. "Legal", "Tax", "HR"
    assignee_id  TEXT NOT NULL,         -- profile ID
    created_at   DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (deal_id) REFERENCES deals(id)
);

Additive migrations (append to additiveMigrationStmts in migrate.go)

`ALTER TABLE diligence_requests ADD COLUMN assignee_id TEXT DEFAULT ''`,
`ALTER TABLE diligence_requests ADD COLUMN status TEXT DEFAULT 'open'`,
`ALTER TABLE files ADD COLUMN response_id TEXT DEFAULT ''`,

Note: status CHECK constraint can't be added via ALTER TABLE in SQLite — enforce in handler.
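Since the CHECK constraint can't ride along with the ALTER TABLE, the handler can gate writes with a small allow-list. A minimal sketch; the status set mirrors the pills described in the templates section:

```go
package main

// validRequestStatus enforces in Go what the missing CHECK constraint
// would have enforced in SQLite. The values match the status pills in
// the templates section (open/in_progress/answered/not_applicable).
var validRequestStatus = map[string]bool{
	"open":           true,
	"in_progress":    true,
	"answered":       true,
	"not_applicable": true,
}
```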


2. Fireworks Client

Create internal/fireworks/client.go:

Package: fireworks

Fireworks API key: fw_RVcDe4c6mN4utKLsgA7hTm (load via config or environment in production rather than hard-coding)
Base URL: https://api.fireworks.ai/inference/v1

Functions needed:

1. ExtractToMarkdown(ctx, imageBase64 []string, filename string) (string, error)
   - Model: accounts/fireworks/models/llama-v3p2-90b-vision-instruct
   - System prompt: "You are a document extraction expert. Extract ALL content from this document into clean markdown. Preserve headings, tables, lists, and structure. Do not summarise — extract everything."
   - Send up to 10 images per call (multi-page docs: batch into 10-page chunks, concatenate results)
   - For XLSX files (no images): use a different path — just send the structured data as text
   - Return full markdown string

2. EmbedText(ctx, texts []string) ([][]float32, error)
   - Model: nomic-ai/nomic-embed-text-v1.5
   - POST /embeddings (OpenAI-compatible)
   - Batch up to 50 texts per call
   - Return [][]float32

3. CosineSimilarity(a, b []float32) float32
   - Pure Go dot product (normalised vectors)
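A sketch of CosineSimilarity. Unlike the note above (which assumes pre-normalised vectors and a bare dot product), this version normalises explicitly, which keeps it correct even if a future embedding model returns non-unit vectors:

```go
package main

import "math"

// cosineSimilarity computes the cosine of the angle between two
// equal-length vectors. It normalises explicitly rather than assuming
// unit vectors, so it is safe for raw embeddings too.
func cosineSimilarity(a, b []float32) float32 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		normA += float64(a[i]) * float64(a[i])
		normB += float64(b[i]) * float64(b[i])
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return float32(dot / (math.Sqrt(normA) * math.Sqrt(normB)))
}
```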

3. PDF-to-Images Conversion

Create internal/extract/pdf.go:

Use exec("pdftoppm") subprocess:
  pdftoppm -jpeg -r 150 input.pdf /tmp/prefix
  → produces /tmp/prefix-1.jpg, /tmp/prefix-2.jpg, ...

Read each JPEG → base64 encode → pass to fireworks.ExtractToMarkdown

For non-PDF files that are images (jpg/png): base64 encode directly, skip pdftoppm.
For XLSX: use excelize GetRows on all sheets → format as markdown table → skip vision model entirely.
For other binary types: attempt pdftoppm, fall back to filename+extension as minimal context.

Function signature:
  FileToImages(path string) ([]string, error)  // returns base64-encoded JPEG strings
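The subprocess call might look like the sketch below; pdftoppmArgs is a hypothetical helper split out so the argument list can be tested without the binary installed:

```go
package main

import (
	"context"
	"os/exec"
)

// pdftoppmArgs builds the argument list for the pdftoppm invocation
// described above: JPEG output at 150 DPI, pages written as
// prefix-1.jpg, prefix-2.jpg, ...
func pdftoppmArgs(pdfPath, prefix string) []string {
	return []string{"-jpeg", "-r", "150", pdfPath, prefix}
}

// renderPDFPages shells out to pdftoppm. FileToImages would then read
// each produced JPEG and base64-encode it for the vision model.
func renderPDFPages(ctx context.Context, pdfPath, prefix string) error {
	return exec.CommandContext(ctx, "pdftoppm", pdftoppmArgs(pdfPath, prefix)...).Run()
}
```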

4. Chunker

Create internal/extract/chunker.go:

ChunkMarkdown(text string) []string
  - Split on markdown headings (## or ###) first
  - If a section > 600 tokens (approx 2400 chars): split further at paragraph breaks (\n\n)
  - If a paragraph > 600 tokens: split at sentence boundary (". ")
  - Overlap: prepend last 80 chars of previous chunk to each chunk (context continuity)
  - Minimum chunk length: 50 chars (discard shorter)
  - Return []string of chunks

5. Extraction Worker

Create internal/worker/extractor.go:

type ExtractionJob struct {
    ResponseID string
    FilePath   string   // absolute path to uploaded file (or "" for statements)
    DealID     string
}

type Extractor struct {
    db       *sql.DB
    fw       *fireworks.Client
    jobs     chan ExtractionJob
}

func NewExtractor(db *sql.DB, fw *fireworks.Client) *Extractor
func (e *Extractor) Start()  // launch 2 worker goroutines
func (e *Extractor) Enqueue(job ExtractionJob)
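The worker-pool shape could be sketched as below. Unlike the struct above, process is injected here (a simplification) so the pool can be exercised without a DB or Fireworks client; the real Extractor would hold *sql.DB and *fireworks.Client and run the loop in the next section:

```go
package main

// ExtractionJob mirrors the job struct above.
type ExtractionJob struct {
	ResponseID string
	FilePath   string // absolute path to uploaded file (or "" for statements)
	DealID     string
}

// Extractor is a fixed-size worker pool fed by a buffered channel.
type Extractor struct {
	jobs    chan ExtractionJob
	process func(ExtractionJob)
}

func NewExtractor(process func(ExtractionJob)) *Extractor {
	return &Extractor{jobs: make(chan ExtractionJob, 64), process: process}
}

// Start launches 2 worker goroutines, per the concurrency constant.
func (e *Extractor) Start() {
	for i := 0; i < 2; i++ {
		go func() {
			for job := range e.jobs {
				e.process(job)
			}
		}()
	}
}

func (e *Extractor) Enqueue(job ExtractionJob) { e.jobs <- job }
```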

Worker loop:
1. Set responses.extraction_status = 'processing'
2. If file:
   a. Convert to images (extract.FileToImages)
   b. Call fw.ExtractToMarkdown → markdown body
   c. UPDATE responses SET body=?, extraction_status='done'
3. If statement (body already set, skip extraction):
   a. extraction_status → 'done' immediately
4. Chunk: extract.ChunkMarkdown(body)
5. Embed: fw.EmbedText(chunks) → [][]float32
6. Store each chunk: INSERT INTO response_chunks (id, response_id, chunk_index, text, vector)
   - Serialise []float32 as little-endian bytes: each float32 = 4 bytes
7. Match against all open requests in this deal:
   a. Load all diligence_requests for deal_id
   b. Embed their descriptions each run (no persistent request-embedding cache yet — re-embedding per run is acceptable at current volumes)
   c. For each (chunk, request) pair: compute cosine similarity
   d. If similarity >= 0.72: INSERT OR IGNORE INTO request_links (request_id, response_id, chunk_id, confidence, auto_linked=1, confirmed=0)
8. Log summary: "Response {id}: {N} chunks, {M} request links auto-created"
On error: SET extraction_status = 'failed', log error
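The little-endian serialisation in step 6 can be sketched with encoding/binary and math.Float32bits:

```go
package main

import (
	"encoding/binary"
	"math"
)

// encodeVector serialises a []float32 as little-endian bytes (4 bytes
// per value) for the response_chunks.vector BLOB column.
func encodeVector(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[i*4:], math.Float32bits(f))
	}
	return buf
}

// decodeVector is the inverse, used when loading chunks for matching.
func decodeVector(b []byte) []float32 {
	v := make([]float32, len(b)/4)
	for i := range v {
		v[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[i*4:]))
	}
	return v
}
```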

6. Handler: Responses & Assignment Rules

Create internal/handler/responses.go:

Handlers:

POST /deals/responses/statement
  - Fields: deal_id, title, body (markdown text)
  - Create responses row (type='statement', extraction_status='pending')
  - Enqueue extraction job (body already set, worker will chunk+embed+match)
  - Redirect to /deals/{deal_id}?tab=requests

POST /deals/responses/confirm
  - Fields: request_id, response_id, chunk_id
  - UPDATE request_links SET confirmed=1, confirmed_by=profile.ID, confirmed_at=now
  - Return 200 OK (HTMX partial or redirect)

POST /deals/responses/reject
  - Fields: request_id, response_id, chunk_id
  - DELETE FROM request_links WHERE ...
  - Return 200 OK

GET /deals/responses/pending/{dealID}
  - Returns all request_links WHERE confirmed=0 AND auto_linked=1
  - Joined with requests (description) and responses (title, type)
  - Returns JSON for HTMX partial

POST /deals/assignment-rules/save
  - Fields: deal_id, rules[] (keyword + assignee_id pairs, JSON array)
  - DELETE existing rules for deal, INSERT new set
  - On save: re-run auto-assignment for all unassigned requests in deal
  - Redirect back to deal settings

GET /deals/assignment-rules/{dealID}
  - Returns JSON array of {id, keyword, assignee_id, assignee_name}

Auto-assignment function (call on: rule save, request import):
  func autoAssignRequests(db, dealID):
    - Load all assignment_rules for deal_id
    - For each diligence_request WHERE assignee_id = '':
      - Check if section contains any rule keyword (case-insensitive)
      - If match: UPDATE diligence_requests SET assignee_id = rule.assignee_id
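The matching logic inside autoAssignRequests, minus the SQL, could look like this sketch (Rule is a hypothetical in-memory mirror of an assignment_rules row):

```go
package main

import "strings"

// Rule mirrors one assignment_rules row (keyword → assignee).
type Rule struct {
	Keyword    string
	AssigneeID string
}

// ruleMatches is the case-insensitive keyword check against a
// request's section, as described above.
func ruleMatches(section, keyword string) bool {
	return strings.Contains(strings.ToLower(section), strings.ToLower(keyword))
}

// firstAssignee returns the assignee of the first matching rule, or ""
// when no rule matches; autoAssignRequests would call this per
// unassigned request and issue the UPDATE for non-empty results.
func firstAssignee(section string, rules []Rule) string {
	for _, r := range rules {
		if ruleMatches(section, r.Keyword) {
			return r.AssigneeID
		}
	}
	return ""
}
```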

7. Wire Up in handler.go

Add to RegisterRoutes:

// Responses & AI matching
mux.HandleFunc("/deals/responses/statement", h.requireAuth(h.handleCreateStatement))
mux.HandleFunc("/deals/responses/confirm", h.requireAuth(h.handleConfirmLink))
mux.HandleFunc("/deals/responses/reject", h.requireAuth(h.handleRejectLink))
mux.HandleFunc("/deals/responses/pending/", h.requireAuth(h.handlePendingLinks))
mux.HandleFunc("/deals/assignment-rules/save", h.requireAuth(h.handleSaveAssignmentRules))
mux.HandleFunc("/deals/assignment-rules/", h.requireAuth(h.handleGetAssignmentRules))

In Handler struct, add:

extractor *worker.Extractor
fw        *fireworks.Client

In New(): initialise both, call extractor.Start().

In handleFileUpload (files.go): after saving file, create a responses row (type='document') and enqueue extraction job.


8. Template Changes

dealroom.templ — Requests tab

Current requests tab shows a list of requests. Add:

A) Per-request: assignee + status badge

  • Show assignee name (or "Unassigned" in gray) next to each request
  • Status pill: open (gray), in_progress (blue), answered (green), not_applicable (muted)
  • If confirmed link exists: show "✓ Answered" with link to the response
  • If pending auto-links exist: show "🤖 N AI matches — review" button (teal outline)

B) Pending AI matches panel (shown above request list if any pending)

  • Collapsible section: "🤖 X AI-suggested matches waiting for review"
  • Each row: Request description | → | Response title | Confidence % | [Confirm] [Reject]
  • Confirm/Reject use fetch() POST to /deals/responses/confirm or /reject, then reload

C) "Add Statement" button (in requests toolbar)

  • Opens a modal: Title + markdown textarea
  • Submits to POST /deals/responses/statement
  • After submit: shows in pending matches if AI matched any requests

D) Assignment rules (accessible via a gear icon or "Settings" in requests tab header)

  • Inline expandable panel or small modal
  • Table: Keyword | Assignee (dropdown of internal team members) | [Remove]
  • [Add Rule] row at bottom
  • Save button → POST /deals/assignment-rules/save

Keep it clean

  • Don't clutter the existing request rows — use progressive disclosure
  • The "N AI matches" prompt should be prominent but not alarming
  • Confidence shown as percentage (e.g. "87%"), not raw float

9. Files tab: extraction status

In the files table, add a small status indicator per file:

  • Extracting... (extraction_status = 'pending' or 'processing')
  • ✓ (extraction_status = 'done') — subtle, no noise
  • ⚠ (extraction_status = 'failed') — show tooltip with reason

Poll via a simple setInterval (every 5s) that reloads the file list while any file is still pending or processing; stop polling once all extractions are done.


10. Build & Deploy

After all code changes:

  1. Run: cd ~/dev/dealroom && PATH=$PATH:/home/johan/go/bin:/usr/local/go/bin make build
  2. Run: systemctl --user stop dealroom && cp bin/dealroom dealroom && systemctl --user start dealroom
  3. Verify: curl -s -o /dev/null -w "%{http_code}" http://localhost:9300/ (expect 303)
  4. Check logs: journalctl --user -u dealroom -n 30 --no-pager
  5. Run: cd ~/dev/dealroom && git add -A && git commit -m "feat: responses, AI matching, assignment rules" && git push origin main

Key Constants

  • Fireworks API key: fw_RVcDe4c6mN4utKLsgA7hTm
  • Extraction model: accounts/fireworks/models/llama-v3p2-90b-vision-instruct
  • Embedding model: nomic-ai/nomic-embed-text-v1.5
  • Match threshold: 0.72 cosine similarity
  • Chunk size: ~600 tokens / ~2400 chars max (per the chunker in section 4)
  • Chunk overlap: ~80 chars
  • Max images per vision call: 10
  • Worker concurrency: 2 goroutines

  • Uploads: data/uploads/ (relative to WorkingDirectory in the service)
  • DB path: data/db/dealroom.db