# Dealspace — AI Matching & Embedding Specification

**Version:** 0.1 — 2026-02-28
**Status:** Pre-implementation. Addresses SPEC-REVIEW.md section 3 (race conditions) and section 7.4 (magic threshold).

---

## 1. Embedding Model Selection

### 1.1 Candidates

| Model | Provider | Dimensions | Context | Latency | Cost | Retention |
|-------|----------|------------|---------|---------|------|-----------|
| `nomic-embed-text-v1.5` | Fireworks | 768 | 8192 | ~50ms | $0.008/1M | Zero |
| `voyage-finance-2` | Voyage AI | 1024 | 16000 | ~80ms | $0.12/1M | 30 days |

### 1.2 Domain Analysis

M&A requests contain:

- **Financial terminology:** EBITDA, working capital adjustments, earnout provisions, rep & warranty
- **Legal terminology:** indemnification, IP assignments, material adverse change, disclosure schedules
- **Industry-specific terms:** varies by deal (tech: ARR, churn; healthcare: HIPAA, 340B; manufacturing: capex, inventory turns)

**Voyage-finance-2** was trained specifically on financial documents (10-Ks, credit agreements, M&A filings). It shows ~8% improvement on financial similarity benchmarks vs. general-purpose models.

**Nomic-embed-text-v1.5** is general-purpose but performs well on semantic matching. Zero retention is critical for M&A confidentiality.

### 1.3 Recommendation

**Use Fireworks nomic-embed-text-v1.5** for MVP:

1. **Zero retention** — Voyage's 30-day retention is unacceptable for M&A data
2. **15x cheaper** — allows generous matching without cost concerns
3. **Proven stack** — same infrastructure as inou (known good)
4. **Good enough** — general semantic similarity works for request matching; we're matching "audited financials" to "FY2024 audit report," not parsing covenant calculations

**Revisit voyage-finance-2 when:**

- Voyage offers a zero-retention API option
- Match quality metrics show <85% human confirmation rate
- We expand to quantitative matching (finding answers by numerical similarity)

### 1.4 Fireworks Configuration

```go
const (
	EmbedModel     = "nomic-embed-text-v1.5"
	EmbedEndpoint  = "https://api.fireworks.ai/inference/v1/embeddings"
	EmbedDimension = 768
	EmbedMaxTokens = 8192 // model context limit
)
```

API key stored in environment: `FIREWORKS_API_KEY`

---

## 2. What to Embed

### 2.1 Request Embedding

Embed the **semantic intent**, not raw fields.

```go
func BuildRequestEmbedText(r *Request, ws *Workstream) string {
	var b strings.Builder

	// Workstream context (aids cross-workstream relevance)
	b.WriteString("Workstream: ")
	b.WriteString(ws.Name)
	b.WriteString("\n\n")

	// Title is high-signal
	b.WriteString("Request: ")
	b.WriteString(r.Title)
	b.WriteString("\n\n")

	// Body provides detail
	if r.Body != "" {
		b.WriteString("Details: ")
		b.WriteString(r.Body)
	}
	return strings.TrimSpace(b.String())
}
```

**Do NOT embed:**

- Request ID, ref numbers (non-semantic)
- Due dates, priority (operational, not semantic)
- Assignee names (PII, not relevant to matching)
- Status (changes frequently; an embedding is point-in-time)

### 2.2 Answer Embedding

Answers may be long (multi-document explanations). Use chunking with overlap.
```go
const (
	AnswerChunkSize    = 1500 // tokens (~6000 chars)
	AnswerChunkOverlap = 200  // tokens of overlap for context continuity
)

func BuildAnswerEmbedTexts(a *Answer, ws *Workstream) []string {
	var chunks []string

	// Prefix every chunk with context
	prefix := fmt.Sprintf("Workstream: %s\nAnswer: %s\n\n", ws.Name, a.Title)

	body := a.Body
	if len(body) <= AnswerChunkSize*4 { // ~6000 chars = 1 chunk
		return []string{prefix + body}
	}

	// Chunk long answers
	for start := 0; start < len(body); {
		end := start + AnswerChunkSize*4
		if end > len(body) {
			end = len(body)
		}
		// Find a sentence boundary near the end
		if end < len(body) {
			if idx := strings.LastIndex(body[start:end], ". "); idx > 0 {
				end = start + idx + 1
			}
		}
		chunks = append(chunks, prefix+body[start:end])

		// Step back by the overlap, but always make forward progress —
		// an early sentence boundary must never move start backwards,
		// or the loop would never terminate.
		if next := end - AnswerChunkOverlap*4; next > start {
			start = next
		} else {
			start = end
		}
	}
	return chunks
}
```

**Do NOT embed:**

- File contents (privacy: see section 10)
- File names (may contain sensitive identifiers)
- Rejection reasons (operational, not semantic)
- Internal comments (IB-only context)

### 2.3 Embedding Storage

```sql
CREATE TABLE embeddings (
    id         TEXT PRIMARY KEY,            -- UUID
    entry_id   TEXT NOT NULL,               -- request or answer entry_id
    chunk_idx  INTEGER NOT NULL DEFAULT 0,  -- 0 for single-chunk, 0..N for multi-chunk
    vector     BLOB NOT NULL,               -- 768 float32 = 3072 bytes
    text_hash  TEXT NOT NULL,               -- SHA-256 of embedded text (dedup check)
    model      TEXT NOT NULL,               -- "nomic-embed-text-v1.5"
    created_at INTEGER NOT NULL,
    UNIQUE(entry_id, chunk_idx)
);

CREATE INDEX idx_embeddings_entry ON embeddings(entry_id);
```

**Note:** Vectors are stored as raw `float32` bytes (little-endian). Cosine similarity is computed in Go, not SQL.

---

## 3. Retroactive Matching (Bidirectional)

### 3.1 The Problem (from SPEC-REVIEW §3.1)

The original spec only describes matching when a buyer submits a request. But:

- A **new request** should search existing published answers
- A **new answer** (when published) should search open requests

Both directions are required for complete coverage.
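Both trigger paths can share a single similarity core — only the candidate set and the trigger differ. A minimal, self-contained sketch of that core (the `candidate` type, `topMatches` helper, and the 0.72 literal here are illustrative, not the spec'd API):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// candidate pairs an entry ID with its stored embedding vector.
type candidate struct {
	ID     string
	Vector []float32
}

// topMatches returns candidate IDs whose cosine similarity to query
// meets or exceeds threshold, sorted by descending score. Both
// directions (request→answers and answer→requests) can call this with
// different candidate sets.
func topMatches(query []float32, cands []candidate, threshold float32) []string {
	type scored struct {
		id    string
		score float32
	}
	var hits []scored
	for _, c := range cands {
		if s := cosine(query, c.Vector); s >= threshold {
			hits = append(hits, scored{c.ID, s})
		}
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].score > hits[j].score })
	ids := make([]string, len(hits))
	for i, h := range hits {
		ids[i] = h.id
	}
	return ids
}

// cosine is the same computation as CosineSimilarity in section 9.1.
func cosine(a, b []float32) float32 {
	var dot, na, nb float32
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (float32(math.Sqrt(float64(na))) * float32(math.Sqrt(float64(nb))))
}

func main() {
	query := []float32{1, 0, 0}
	cands := []candidate{
		{"a1", []float32{1, 0.1, 0}},   // near-identical
		{"a2", []float32{0, 1, 0}},     // orthogonal — filtered out
		{"a3", []float32{0.9, 0.2, 0}}, // close
	}
	fmt.Println(topMatches(query, cands, 0.72)) // → [a1 a3]
}
```

For multi-chunk answers, the per-chunk scores would be collapsed to the max per entry before thresholding, as section 3.3 describes.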
### 3.2 Matching Directions

```
Direction 1: Request → Answers
  Trigger: Request created OR request text updated
  Search:  All published answers in accessible workstreams
  Output:  Suggested answer_links with ai_score

Direction 2: Answer → Requests
  Trigger: Answer published (stage = dataroom)
  Search:  All open requests in accessible workstreams
  Output:  Suggested answer_links with ai_score
```

### 3.3 Implementation

```go
// Called when a request is created or its body/title changes
func MatchRequestToAnswers(ctx context.Context, requestID string) ([]AnswerMatch, error) {
	// 1. Get request embedding (create if missing)
	// 2. Load all published answer embeddings in same workstream
	// 3. Cosine similarity against each
	// 4. Return matches above threshold
}

// Called when an answer is published
func MatchAnswerToRequests(ctx context.Context, answerID string) ([]RequestMatch, error) {
	// 1. Get answer embedding(s) (multi-chunk: use max score across chunks)
	// 2. Load all open request embeddings in same workstream
	// 3. Cosine similarity against each
	// 4. Return matches above threshold
}
```

### 3.4 Matching on Update

If a request body is edited:

1. Recompute the embedding (check `text_hash` — skip if unchanged)
2. Re-run matching
3. New suggestions appear; existing confirmed links are preserved

If an answer body is edited before publish:

- No action (draft state)

If an answer is re-published (correction):

- Re-run matching
- Flag for human review if new requests match

---

## 4. Broadcast Idempotency

### 4.1 The Problem (from SPEC-REVIEW §3.2)

Multiple requests can link to the same answer. Without idempotency:

- Confirming R1↔A1 sends a broadcast to Buyer A
- Confirming R2↔A1 sends another broadcast to Buyer B
- If Buyer A also asked R2... they get two notifications

### 4.2 Broadcasts Table

```sql
CREATE TABLE broadcasts (
    id           TEXT PRIMARY KEY,
    answer_id    TEXT NOT NULL REFERENCES entries(entry_id),
    request_id   TEXT NOT NULL REFERENCES entries(entry_id),
    recipient_id TEXT NOT NULL REFERENCES users(id),
    channel      TEXT NOT NULL,  -- "web" | "email" | "slack" | "teams"
    sent_at      INTEGER NOT NULL,
    UNIQUE(answer_id, request_id, recipient_id, channel)
);

CREATE INDEX idx_broadcasts_answer    ON broadcasts(answer_id);
CREATE INDEX idx_broadcasts_recipient ON broadcasts(recipient_id);
```

### 4.3 Broadcast Logic

```go
func BroadcastAnswer(ctx context.Context, tx *sql.Tx, answerID string) error {
	// 1. Get all confirmed answer_links for this answer
	links, err := getConfirmedLinks(tx, answerID)
	if err != nil {
		return err
	}

	// 2. For each link, get the request's origin_id (ultimate requester)
	recipients := make(map[string][]string) // user_id -> []request_id
	for _, link := range links {
		req, err := getRequest(tx, link.RequestID)
		if err != nil {
			return err
		}
		recipients[req.OriginID] = append(recipients[req.OriginID], link.RequestID)
	}

	// 3. For each recipient, check idempotency and send
	for userID, requestIDs := range recipients {
		for _, reqID := range requestIDs {
			// Check if already sent
			exists, err := broadcastExists(tx, answerID, reqID, userID, "web")
			if err != nil {
				return err
			}
			if exists {
				continue // idempotent skip
			}

			// Record broadcast
			err = insertBroadcast(tx, answerID, reqID, userID, "web", time.Now().UnixMilli())
			if err != nil {
				return err
			}

			// Queue notification (delivery happens outside the transaction)
			NotifyQueue.Push(Notification{
				UserID:    userID,
				AnswerID:  answerID,
				RequestID: reqID,
			})
		}
	}
	return nil
}
```

### 4.4 Notification Deduplication

Even with idempotency per (answer, request, recipient), a user might get multiple notifications if they asked 5 equivalent questions.

**User-facing behavior:** Collapse into a single notification:

```
"Answer published: [Title] — resolves 5 of your requests"
```

This is a presentation concern, not a data model change.
The `broadcasts` table tracks each link; the notification renderer collapses them.

---

## 5. Configurable Similarity Threshold

### 5.1 The Problem (from SPEC-REVIEW §7.4)

The hardcoded 0.72 is a magic number that:

- May be too strict for some workstreams (legal requests are verbose)
- May be too loose for others (financial requests are terse)
- Cannot be tuned without code changes

### 5.2 Per-Workstream Configuration

Add to the workstream entry's Data:

```json
{
  "name": "Finance",
  "match_config": {
    "threshold": 0.72,
    "auto_confirm_threshold": 0.95,
    "cross_workstream": ["Legal"]
  }
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `threshold` | float | 0.72 | Minimum score to suggest a match |
| `auto_confirm_threshold` | float | null | If set, scores above this auto-confirm (no human review) |
| `cross_workstream` | []string | [] | Workstream slugs to include in matching (see section 6) |

### 5.3 Threshold Tuning Guidance

| Workstream Type | Recommended Threshold | Rationale |
|-----------------|----------------------|-----------|
| Finance | 0.72 | Standard M&A requests, well-defined terminology |
| Legal | 0.68 | Verbose requests with boilerplate; the semantic core is smaller |
| IT | 0.75 | Technical specificity matters; false positives are costly |
| HR | 0.70 | Mix of standard and org-specific terms |
| Operations | 0.72 | General business terminology |

### 5.4 Calibration Process

After initial deal data:

1. Export all (request, answer, human_confirmed) tuples
2. Compute score distributions for confirmed vs. rejected matches
3. Adjust the threshold to maximize F1 score per workstream
4. Log threshold changes to the audit trail for compliance

---

## 6. Cross-Workstream Matching

### 6.1 Use Case

An IT request ("Describe cybersecurity insurance coverage") may be answered by a Legal answer (cyber liability policy document). Without cross-workstream matching, the IT buyer never sees the Legal answer.
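The scope a request may search follows directly from the `match_config.cross_workstream` setting introduced in section 5.2. A minimal sketch of that resolution, assuming a simple slug→config map (`searchScope` and `crossConfig` are illustrative names, not the spec'd API):

```go
package main

import "fmt"

// searchScope returns the workstreams a request in ws may search: the
// workstream itself plus any cross_workstream opt-ins. The relation is
// directional — an IT→Legal opt-in does not let Legal requests search
// IT answers unless Legal opts in separately.
func searchScope(ws string, crossConfig map[string][]string) []string {
	scope := []string{ws}
	return append(scope, crossConfig[ws]...)
}

func main() {
	cfg := map[string][]string{
		"IT":    {"Legal"},         // IT requests may search Legal answers
		"Legal": {"IT", "Finance"}, // Legal requests search IT and Finance
	}
	fmt.Println(searchScope("IT", cfg))      // → [IT Legal]
	fmt.Println(searchScope("Finance", cfg)) // → [Finance] (no opt-ins)
}
```

The resulting list would still be filtered by RBAC before any answers are returned, as section 6.3 specifies.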
### 6.2 Opt-In Per Workstream Pair

Configured in each workstream's `match_config.cross_workstream`:

```json
// IT workstream
{
  "match_config": {
    "cross_workstream": ["Legal"]  // IT requests search Legal answers
  }
}

// Legal workstream
{
  "match_config": {
    "cross_workstream": ["IT", "Finance"]  // Legal requests search IT and Finance
  }
}
```

**The relationship is directional:** IT searching Legal doesn't imply Legal searches IT.

### 6.3 RBAC Interaction

Cross-workstream matching only returns answers the requester can access:

```go
func GetMatchableAnswers(ctx context.Context, actorID, requestWorkstreamID string) ([]Answer, error) {
	// 1. Get workstream config
	ws, err := getWorkstream(requestWorkstreamID)
	if err != nil {
		return nil, err
	}

	// 2. Build workstream list (self + cross; config slugs assumed resolved to IDs)
	workstreams := []string{ws.ID}
	workstreams = append(workstreams, ws.MatchConfig.CrossWorkstream...)

	// 3. Filter by access (RBAC)
	var accessible []string
	for _, wsID := range workstreams {
		if hasAccess(actorID, wsID, "read") {
			accessible = append(accessible, wsID)
		}
	}

	// 4. Get published answers from accessible workstreams
	return getPublishedAnswers(accessible)
}
```

---

## 7. Request Deduplication (Auto-Suggest Existing Answers)

### 7.1 The Problem

Buyer B asks the same question Buyer A already got answered. Without dedup:

- The seller does duplicate work
- The IB reviews a duplicate request
- Buyer B waits when the answer already exists

### 7.2 Dedup Flow

```
Buyer B submits request
→ Embed request
→ Search published answers (same logic as section 3)
→ If match score ≥ threshold:
   → Show Buyer B: "Similar answer already exists — view it?"
   → If Buyer B accepts: link request to existing answer, mark resolved
   → If Buyer B declines: proceed with normal request flow
```

### 7.3 UX Considerations

**Don't block submission.** Show the suggestion after submit, not as a gate:

```
┌─────────────────────────────────────────────────────────────┐
│ Your request has been submitted.                            │
│                                                             │
│ 💡 We found a similar published answer that may help:       │
│                                                             │
│    "FY2024 Audited Financial Statements"                    │
│    Published: 2026-02-15 | Similarity: 89%                  │
│                                                             │
│    [View Answer]   [This doesn't answer my question]        │
└─────────────────────────────────────────────────────────────┘
```

### 7.4 Data Model

When the buyer accepts the suggestion:

```sql
INSERT INTO answer_links (answer_id, request_id, linked_by, linked_at, status, ai_score)
VALUES (?, ?, ?, ?, 'self_confirmed', ?);
-- status = 'self_confirmed' means the buyer accepted the AI suggestion
-- no IB review required
```

When the buyer declines:

```sql
INSERT INTO answer_links (answer_id, request_id, linked_by, linked_at, status, ai_score)
VALUES (?, ?, ?, ?, 'rejected_by_requester', ?);
-- Prevents suggesting this answer again for this request
-- Request proceeds to the normal IB/Seller workflow
```

---

## 8. Race Condition Fixes (DB Transactions)

### 8.1 The Problem (from SPEC-REVIEW §3)

Without transactions:

1. IB confirms match R1↔A1
2. Concurrent: IB publishes A1
3. Broadcast fires during confirm
4. Confirm completes, tries to broadcast again
5. Duplicate notifications — or worse, inconsistent state

### 8.2 Transaction Boundaries

**Atomic operation 1: Confirm Match**

```go
func ConfirmMatch(ctx context.Context, answerID, requestID, actorID string) error {
	return db.Transaction(func(tx *sql.Tx) error {
		// 1. Verify the answer exists and is published
		answer, err := getAnswer(tx, answerID)
		if err != nil || answer.Status != "published" {
			return ErrAnswerNotPublished
		}

		// 2. Update answer_link status
		err = updateAnswerLink(tx, answerID, requestID, "confirmed", actorID)
		if err != nil {
			return err
		}

		// 3. Broadcast (idempotent)
		return BroadcastAnswer(ctx, tx, answerID)
	})
}
```

**Atomic operation 2: Publish Answer**

```go
func PublishAnswer(ctx context.Context, answerID, actorID string) error {
	return db.Transaction(func(tx *sql.Tx) error {
		// 1. Update answer status
		err := updateAnswerStatus(tx, answerID, "published", actorID)
		if err != nil {
			return err
		}

		// 2. Update entry stage to dataroom
		err = updateEntryStage(tx, answerID, "dataroom")
		if err != nil {
			return err
		}

		// 3. Run retroactive matching (creates pending answer_links)
		matches, err := MatchAnswerToRequests(ctx, answerID)
		if err != nil {
			return err
		}
		for _, m := range matches {
			err = insertAnswerLink(tx, answerID, m.RequestID, "pending", m.Score)
			if err != nil {
				return err
			}
		}

		// 4. Broadcast already-confirmed links (if any pre-existed)
		return BroadcastAnswer(ctx, tx, answerID)
	})
}
```

### 8.3 Optimistic Locking

Add a version column to prevent concurrent modifications:

```sql
ALTER TABLE entries ADD COLUMN version INTEGER NOT NULL DEFAULT 1;
```

```go
// Named distinctly from the updateAnswerStatus helper used in §8.2,
// which takes an actorID rather than an expected version.
func updateAnswerStatusVersioned(tx *sql.Tx, answerID, status string, expectedVersion int) (int, error) {
	result, err := tx.Exec(`
		UPDATE entries
		SET data = json_set(data, '$.status', ?),
		    version = version + 1,
		    updated_at = ?
		WHERE entry_id = ? AND version = ?
	`, status, time.Now().UnixMilli(), answerID, expectedVersion)
	if err != nil {
		return 0, err
	}
	rows, _ := result.RowsAffected()
	if rows == 0 {
		return 0, ErrConcurrentModification
	}
	return expectedVersion + 1, nil
}
```

---

## 9. SQLite Cosine Similarity & Qdrant Migration

### 9.1 Pure Go Cosine Similarity

SQLite has no native vector operations. Compute in Go:

```go
// CosineSimilarity computes the similarity between two vectors.
// Vectors must be the same length. Returns a value in [-1, 1].
func CosineSimilarity(a, b []float32) float32 {
	if len(a) != len(b) {
		panic("vector length mismatch")
	}
	var dotProduct, normA, normB float32
	for i := range a {
		dotProduct += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dotProduct / (float32(math.Sqrt(float64(normA))) * float32(math.Sqrt(float64(normB))))
}

// BatchCosineSimilarity computes query vs. all candidates.
// The inner loop is simple and cache-friendly; the Go compiler
// optimizes it well without explicit SIMD.
func BatchCosineSimilarity(query []float32, candidates [][]float32) []float32 {
	scores := make([]float32, len(candidates))

	// Pre-compute the query norm
	var queryNorm float32
	for _, v := range query {
		queryNorm += v * v
	}
	queryNorm = float32(math.Sqrt(float64(queryNorm)))

	for i, candidate := range candidates {
		var dot, candNorm float32
		for j := range query {
			dot += query[j] * candidate[j]
			candNorm += candidate[j] * candidate[j]
		}
		candNorm = float32(math.Sqrt(float64(candNorm)))
		if queryNorm == 0 || candNorm == 0 {
			scores[i] = 0
		} else {
			scores[i] = dot / (queryNorm * candNorm)
		}
	}
	return scores
}
```

### 9.2 Performance Characteristics (SQLite + Go)

| Embeddings | Load Time | Search Time | Memory |
|------------|-----------|-------------|--------|
| 1,000 | 50ms | 2ms | 3 MB |
| 10,000 | 500ms | 20ms | 30 MB |
| 100,000 | 5s | 200ms | 300 MB |
| 1,000,000 | 50s | 2s | 3 GB |

**Acceptable for MVP:** Most deals have <10,000 documents. Search under 100ms is fine.

### 9.3 Qdrant Migration Threshold

Migrate to Qdrant when:

1. **Embedding count > 100,000** — search latency exceeds 200ms
2. **Memory pressure** — embeddings consume >500MB RAM
3. **Multi-tenancy** — need isolated collections per client (compliance)

### 9.4 Qdrant Integration (Future)

```go
type VectorStore interface {
	Upsert(id string, vector []float32, metadata map[string]any) error
	Search(query []float32, filter map[string]any, limit int) ([]SearchResult, error)
	Delete(id string) error
}

// SQLiteVectorStore implements VectorStore using the embeddings table
type SQLiteVectorStore struct{ ... }

// QdrantVectorStore implements VectorStore using the Qdrant API
type QdrantVectorStore struct{ ... }
```

Abstract behind the interface now; swap the implementation later without caller changes.

### 9.5 Hybrid Mode (Transition)

During migration:

1. Write to both SQLite and Qdrant
2. Read from Qdrant (with SQLite fallback)
3. Validate that results match for the first 1,000 queries
4. Drop the SQLite embeddings table after validation

---

## 10. Privacy: Plaintext Only, Never Files

### 10.1 Embedding Content Policy

**ALLOWED to embed:**

- Request title
- Request body text
- Answer title
- Answer body text (the explanation, not file contents)
- Workstream name (context)

**NEVER embed:**

- File contents (PDF, DOCX, XLSX, images)
- File names (may contain deal names, party names)
- Internal comments
- Routing/assignment metadata
- User names or email addresses

### 10.2 Why No File Embedding?

1. **Privacy:** M&A documents contain material non-public information. Sending it to ANY external API (even zero-retention) creates compliance risk.
2. **Size:** A single PDF may be 100+ pages. Embedding would require chunking, storage, and search across potentially millions of chunks — overkill for request matching.
3. **Semantic mismatch:** The request asks "audited financials for FY2024." The answer body says "Please find attached the FY2024 audited financial statements." The body text + title is sufficient for matching — we don't need to embed page 47 of the PDF.

### 10.3 Future: On-Premise OCR + Embedding

If file-level search becomes required:

1. Run OCR on-premise (GLM-OCR on forge, not an external API)
2. Store the extracted text in `entry.data` (encrypted at rest)
3. Embed the extracted text (it still goes to Fireworks, but it's our extracted text, not the raw file)

This is out of scope for MVP.

### 10.4 Audit Trail

Log every embedding request for compliance:

```sql
CREATE TABLE embed_audit (
    id           TEXT PRIMARY KEY,
    entry_id     TEXT NOT NULL,
    text_hash    TEXT NOT NULL,     -- SHA-256 of text sent
    text_len     INTEGER NOT NULL,  -- character count
    model        TEXT NOT NULL,
    requested_at INTEGER NOT NULL,
    latency_ms   INTEGER,
    success      INTEGER NOT NULL
);
```

**Do NOT log the actual text** — that defeats the privacy purpose. Log the hash for correlation if needed.

---

## 11. lib/embed.go — Function Signatures

### 11.1 Public API

```go
package lib

import (
	"context"
	"database/sql"
	"time"
)

// EmbedConfig holds embedding service configuration.
type EmbedConfig struct {
	APIKey     string // FIREWORKS_API_KEY
	Endpoint   string // defaults to Fireworks endpoint
	Model      string // defaults to nomic-embed-text-v1.5
	Timeout    time.Duration
	MaxRetries int
}

// EmbedResult contains the embedding and metadata.
type EmbedResult struct {
	Vector     []float32
	TextHash   string // SHA-256 of input text
	Model      string
	TokenCount int
	LatencyMs  int64
}

// MatchResult represents a potential match with its score.
type MatchResult struct {
	EntryID   string
	ChunkIdx  int
	Score     float32
	EntryType string // "request" | "answer"
}

// Embed generates an embedding for the given text.
// Returns ErrTextTooLong if the text exceeds the model context.
// Returns ErrEmptyText if the text is empty or whitespace only.
func Embed(ctx context.Context, cfg *EmbedConfig, text string) (*EmbedResult, error)

// EmbedBatch generates embeddings for multiple texts.
// More efficient than calling Embed in a loop (single API call).
// Max 100 texts per batch.
func EmbedBatch(ctx context.Context, cfg *EmbedConfig, texts []string) ([]*EmbedResult, error)

// EmbedRequest creates and stores the embedding for a request entry.
// Idempotent: skips if an embedding exists and text_hash matches.
func EmbedRequest(ctx context.Context, db *sql.DB, cfg *EmbedConfig, requestID string) error

// EmbedAnswer creates and stores embedding(s) for an answer entry.
// May produce multiple chunks for long answers.
// Idempotent: skips chunks where text_hash matches.
func EmbedAnswer(ctx context.Context, db *sql.DB, cfg *EmbedConfig, answerID string) error

// MatchRequestToAnswers finds published answers matching the request.
// Returns matches above the workstream's configured threshold.
// Respects cross-workstream config and RBAC.
func MatchRequestToAnswers(ctx context.Context, db *sql.DB, actorID, requestID string) ([]MatchResult, error)

// MatchAnswerToRequests finds open requests matching the answer.
// Returns matches above the workstream's configured threshold.
// Respects cross-workstream config and RBAC.
func MatchAnswerToRequests(ctx context.Context, db *sql.DB, actorID, answerID string) ([]MatchResult, error)

// FindDuplicateRequests finds existing requests similar to the given text.
// Used for deduplication suggestions before/after submission.
func FindDuplicateRequests(ctx context.Context, db *sql.DB, actorID, workstreamID, text string) ([]MatchResult, error)

// CosineSimilarity computes the similarity between two vectors.
func CosineSimilarity(a, b []float32) float32

// DeleteEmbeddings removes all embeddings for an entry.
// Called when the entry is deleted.
func DeleteEmbeddings(ctx context.Context, db *sql.DB, entryID string) error

// RefreshEmbedding re-embeds an entry if its content changed.
// Compares text_hash to detect changes.
// Returns true if the embedding was updated.
func RefreshEmbedding(ctx context.Context, db *sql.DB, cfg *EmbedConfig, entryID string) (bool, error)
```

### 11.2 Async Embedding on Publish

Embedding should not block the user action. Use async processing:

```go
// EmbedQueue is a background worker that processes embedding requests.
type EmbedQueue struct {
	cfg   *EmbedConfig
	db    *sql.DB
	queue chan embedJob
	wg    sync.WaitGroup
}

type embedJob struct {
	EntryID   string
	EntryType string // "request" | "answer"
	Priority  int    // 0 = normal, 1 = high (new request needs matching)
}

// Start begins processing the embedding queue.
// Workers defaults to 2 (Fireworks rate-limit friendly).
func (q *EmbedQueue) Start(workers int)

// Stop gracefully shuts down the queue.
func (q *EmbedQueue) Stop()

// Enqueue adds an entry for embedding.
// Non-blocking; returns immediately.
func (q *EmbedQueue) Enqueue(entryID, entryType string, priority int)
```

### 11.3 Integration Points

**On Request Create:**

```go
func HandleCreateRequest(w http.ResponseWriter, r *http.Request) {
	// ... validation, RBAC, insert entry ...

	// Queue embedding (non-blocking)
	embedQueue.Enqueue(request.ID, "request", 1) // high priority

	// Return success immediately
	respondJSON(w, request)
}
```

**On Answer Publish:**

```go
func HandlePublishAnswer(w http.ResponseWriter, r *http.Request) {
	err := db.Transaction(func(tx *sql.Tx) error {
		// ... update status, stage ...

		// Embedding happens inline for matching (within the transaction
		// timeout). Note: this implies a tx-scoped variant of the
		// EmbedAnswer signature from §11.1, which takes *sql.DB.
		err := EmbedAnswer(ctx, tx, cfg, answer.ID)
		if err != nil {
			// Log but don't fail — matching can happen later
			log.Warn("embedding failed, will retry", "error", err)
		}

		// Match and create answer_links
		matches, err := MatchAnswerToRequests(ctx, tx, actorID, answer.ID)
		if err != nil {
			// Log but don't fail — retroactive matching can re-run later
			log.Warn("matching failed, will retry", "error", err)
		}
		for _, m := range matches {
			insertAnswerLink(tx, answer.ID, m.EntryID, "pending", m.Score)
		}

		// Broadcast confirmed links
		return BroadcastAnswer(ctx, tx, answer.ID)
	})
	if err != nil {
		http.Error(w, "publish failed", http.StatusInternalServerError)
		return
	}
	respondJSON(w, answer)
}
```

### 11.4 Error Handling

```go
var (
	ErrTextTooLong      = errors.New("text exceeds model context limit")
	ErrEmptyText        = errors.New("text is empty or whitespace only")
	ErrEmbeddingFailed  = errors.New("embedding API call failed")
	ErrRateLimited      = errors.New("embedding API rate limited")
	ErrNoEmbedding      = errors.New("entry has no embedding")
	ErrWorkstreamConfig = errors.New("workstream missing match configuration")
)
```

Retry policy for transient errors:

- `ErrRateLimited`: exponential backoff, max 3 retries
- `ErrEmbeddingFailed`: retry once after 1s
- All others: fail immediately

---

## 12. answer_links Table (Updated)

Incorporates SPEC-REVIEW feedback on rejection tracking:

```sql
CREATE TABLE answer_links (
    answer_id  TEXT NOT NULL REFERENCES entries(entry_id),
    request_id TEXT NOT NULL REFERENCES entries(entry_id),

    -- Who created the link
    linked_by  TEXT NOT NULL,
    linked_at  INTEGER NOT NULL,

    -- AI matching metadata
    ai_score   REAL,  -- cosine similarity at time of match
    ai_model   TEXT,  -- model used for matching

    -- Review status
    status     TEXT NOT NULL DEFAULT 'pending',
    -- 'pending':               AI suggested, awaiting human review
    -- 'confirmed':             IB confirmed the match
    -- 'rejected':              IB rejected the match
    -- 'self_confirmed':        requester accepted the dedup suggestion
    -- 'rejected_by_requester': requester declined the dedup suggestion

    reviewed_by   TEXT,     -- who reviewed (if status != pending)
    reviewed_at   INTEGER,  -- when reviewed
    reject_reason TEXT,     -- why rejected (if status = rejected)

    PRIMARY KEY (answer_id, request_id)
);

CREATE INDEX idx_links_answer  ON answer_links(answer_id);
CREATE INDEX idx_links_request ON answer_links(request_id);
CREATE INDEX idx_links_status  ON answer_links(status);
```

---

## 13. Summary: What Gets Built

| Component | Location | Purpose |
|-----------|----------|---------|
| `lib/embed.go` | Core embedding logic | API calls, similarity, storage |
| `embeddings` table | Schema | Vector storage |
| `broadcasts` table | Schema | Idempotency |
| `answer_links` | Schema update | Status + rejection tracking |
| `embed_audit` table | Schema | Compliance logging |
| `EmbedQueue` | Background worker | Async processing |
| Workstream config | Entry.Data | Per-workstream thresholds |

**Not built (future):**

- Qdrant integration (interface defined, impl deferred)
- File content embedding (privacy: out of scope)
- Auto-confirm (threshold defined, feature disabled for MVP)

---

*This document extends SPEC.md. If conflicts exist, discuss before implementing.*