# Prima Integration — Technical Specification
**inou Health × Prima Brain MRI VLM**
**Version:** 1.0 · **Date:** 2026-02-14 · **Author:** James (AI Architect) · **Reviewer:** Johan Jongsma
---
## Executive Summary
This spec defines how inou integrates [Prima](https://github.com/MLNeurosurg/Prima), the University of Michigan's brain MRI Vision Language Model, as an on-demand diagnostic AI service. Prima achieves 90.1% mean AUC across 52 neurological diagnoses and provides differential diagnosis, worklist prioritization, and specialist referral recommendations.
The design prioritizes **cost efficiency** (serverless GPU, intelligent series selection), **security** (zero PHI at rest on cloud GPU, AES-256-GCM encrypted dossier storage), and **seamless UX** (upload → automatic analysis → results in viewer).
**Key numbers:**
- Per-study cost with intelligent selection: **$0.04–$0.13** (vs $0.22–$0.66 naive)
- Cold start to results: **60–120 seconds**
- Zero always-on GPU cost
---
## Table of Contents
1. [System Architecture](#1-system-architecture)
2. [Intelligent Series Selection](#2-intelligent-series-selection)
3. [RunPod Serverless Worker](#3-runpod-serverless-worker)
4. [API Design](#4-api-design)
5. [Docker Image](#5-docker-image)
6. [Data Flow & Security](#6-data-flow--security)
7. [Cost Analysis](#7-cost-analysis)
8. [User Experience](#8-user-experience)
9. [Error Handling](#9-error-handling)
10. [Implementation Plan](#10-implementation-plan)
11. [Future Work](#11-future-work)
---
## 1. System Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ inou Frontend │
│ ┌──────────┐ ┌──────────────┐ ┌───────────────────────────────┐ │
│ │ DICOM │ │ DICOM │ │ AI Results Panel │ │
│ │ Upload │──│ Viewer │ │ • Diagnosis probabilities │ │
│ │ │ │ │ │ • Urgency / priority │ │
│ └────┬─────┘ └──────────────┘ │ • Specialist referral │ │
│ │ │ • Series-level annotations │ │
│ │ └──────────┬────────────────────┘ │
└───────┼─────────────────────────────────────┼─────────────────────┘
│ POST /api/studies │ GET /api/studies/:id/analysis
▼ ▲
┌───────────────────────────────────────────────────────────────────┐
│ inou Backend (Go) │
│ │
│ ┌──────────┐ ┌─────────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ DICOM │ │ Series │ │ Job │ │ Dossier │ │
│ │ Ingest │─▶│ Selector │─▶│ Queue │ │ Store │ │
│ │ + Parse │ │ (LLM/Rules)│ │ │ │ (SQLite+AES)│ │
│ └──────────┘ └─────────────┘ └────┬─────┘ └──────▲───────┘ │
│ │ │ │
└──────────────────────────────────────┼────────────────┼──────────┘
│ │
Job dispatch Results callback
(HTTPS POST) (HTTPS POST + HMAC)
│ │
▼ │
┌──────────────────────────────────────────┐
│ RunPod Serverless │
│ │
│ ┌────────────────────────────────────┐ │
│ │ Prima Worker (L40S 48GB) │ │
│ │ │ │
│ │ 1. Download DICOM from signed URL │ │
│ │ 2. Load VQ-VAE tokenizer │ │
│ │ 3. Tokenize selected series │ │
│ │ 4. Free VQ-VAE from VRAM │ │
│ │ 5. Load Prima VLM │ │
│ │ 6. Run inference (fp16) │ │
│ │ 7. POST results back to inou │ │
│ │ 8. Purge all DICOM data │ │
│ │ │ │
│ └────────────────────────────────────┘ │
│ │
│ GPU: L40S 48GB · Scales 0→N · Pay/sec │
└──────────────────────────────────────────┘
```
### Component Responsibilities
| Component | Technology | Responsibility |
|-----------|-----------|----------------|
| **DICOM Ingest** | Go | Parse uploaded DICOM, extract metadata, store encrypted |
| **Series Selector** | Go + LLM (Claude Haiku) | Analyze DICOM metadata, select diagnostically relevant series |
| **Job Queue** | Go (in-process) | Manage analysis jobs, retry logic, status tracking |
| **RunPod Worker** | Python 3.11, PyTorch 2.6 | Run Prima inference on selected series |
| **Dossier Store** | SQLite + AES-256-GCM | Store results encrypted at rest |
| **AI Results Panel** | Frontend (existing viewer) | Display diagnosis, urgency, referral |
---
## 2. Intelligent Series Selection
### The Problem
A typical brain MRI study contains **8–15 series** with **100–500 slices each**, potentially 10,000+ total slices. Running every series through Prima wastes GPU time and money; most clinical questions need only 2–4 specific sequence types.
### Selection Architecture
```
DICOM Study Metadata
┌───────────────────────────────────────────┐
│ Series Selection Pipeline │
│ │
│ Step 1: Extract metadata per series │
│ • SeriesDescription (0008,103E) │
│ • SequenceName (0018,0024) │
│ • MRAcquisitionType (0018,0023) │
│ • ScanningSequence (0018,0020) │
│ • ContrastBolusAgent (0018,0010) │
│ • SliceThickness (0018,0050) │
│ • NumberOfSlices │
│ • ImageType (0008,0008) │
│ • Modality (0008,0060) │
│ │
│ Step 2: Rule-based classification │
│ Map each series → sequence type: │
│ T1, T1+C, T2, T2-FLAIR, DWI, ADC, │
│ SWI, MRA, SCOUT, LOCALIZER, DERIVED │
│ │
│ Step 3: Clinical protocol matching │
│ Apply selection rules (see table) │
│ │
│ Step 4: LLM fallback for ambiguous cases │
│ If rule-based classification fails, │
│ send metadata to Claude Haiku │
│ │
│ Output: ordered list of series to analyze │
└───────────────────────────────────────────┘
```
### Rule-Based Classification
Series descriptions are notoriously inconsistent across scanners (Siemens, GE, Philips all use different naming). The classifier uses a multi-signal approach:
```go
type SeriesClassification struct {
	SeriesUID    string
	SequenceType string  // T1, T2, FLAIR, T1C, DWI, ADC, SWI, MRA, OTHER
	HasContrast  bool
	IsLocalizer  bool
	IsDerived    bool
	SliceCount   int
	Confidence   float64 // 0.0-1.0, below 0.7 triggers LLM fallback
}
```
**Classification rules (ordered by priority):**
| Signal | Tag | Example Values | Maps To |
|--------|-----|---------------|---------|
| ImageType contains "LOCALIZER" | (0008,0008) | `ORIGINAL\PRIMARY\LOCALIZER` | SKIP |
| ImageType contains "DERIVED" | (0008,0008) | `DERIVED\SECONDARY` | SKIP (usually) |
| SliceCount < 10 | computed | | SKIP (scout/cal) |
| ContrastBolusAgent present | (0018,0010) | `Gadavist` | +contrast flag |
| Description contains "flair" | (0008,103E) | `AX T2 FLAIR`, `FLAIR_DARK-FLUID` | T2-FLAIR |
| Description contains "t2" (no flair) | (0008,103E) | `AX T2`, `T2_TSE_TRA` | T2 |
| Description contains "t1" | (0008,103E) | `SAG T1`, `T1_MPRAGE` | T1 or T1+C |
| Description contains "dwi"/"diffusion" | (0008,103E) | `DWI_b1000`, `EP2D_DIFF` | DWI |
| Description contains "adc" | (0008,103E) | `ADC_MAP` | ADC |
| Description contains "swi"/"suscept" | (0008,103E) | `SWI_mIP`, `SUSCEPTIBILITY` | SWI |
| ScanningSequence = "EP" + b-value tag | (0018,0020) | | DWI |
| No match, confidence < 0.7 | | | LLM fallback |
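The priority-ordered rules above can be sketched as a single classification function. This is a simplified sketch: DICOM tag extraction, the EP/b-value rule, and vendor-specific aliases are elided, and the struct from earlier is repeated for completeness.

```go
package main

import "strings"

// SeriesClassification repeats the struct defined earlier, for completeness.
type SeriesClassification struct {
	SeriesUID    string
	SequenceType string
	HasContrast  bool
	IsLocalizer  bool
	IsDerived    bool
	SliceCount   int
	Confidence   float64
}

// ClassifySeries applies the rule table, highest-priority rule first.
func ClassifySeries(desc, imageType, contrastAgent string, sliceCount int) SeriesClassification {
	d := strings.ToLower(desc)
	c := SeriesClassification{SliceCount: sliceCount, Confidence: 0.9}
	switch {
	case strings.Contains(imageType, "LOCALIZER"):
		c.IsLocalizer, c.SequenceType = true, "SCOUT"
	case strings.Contains(imageType, "DERIVED"):
		c.IsDerived, c.SequenceType = true, "OTHER"
	case sliceCount < 10:
		c.IsLocalizer, c.SequenceType = true, "SCOUT" // scout / calibration
	case strings.Contains(d, "flair"):
		c.SequenceType = "T2-FLAIR" // checked before plain "t2"
	case strings.Contains(d, "t2"):
		c.SequenceType = "T2"
	case strings.Contains(d, "t1"):
		c.SequenceType = "T1"
	case strings.Contains(d, "dwi"), strings.Contains(d, "diffusion"):
		c.SequenceType = "DWI"
	case strings.Contains(d, "adc"):
		c.SequenceType = "ADC"
	case strings.Contains(d, "swi"), strings.Contains(d, "suscept"):
		c.SequenceType = "SWI"
	default:
		c.SequenceType = "OTHER"
		c.Confidence = 0.3 // below 0.7 → LLM fallback
	}
	if contrastAgent != "" {
		c.HasContrast = true
		if c.SequenceType == "T1" {
			c.SequenceType = "T1+C"
		}
	}
	return c
}
```

Note that "flair" must be tested before "t2", since descriptions like `AX T2 FLAIR` contain both substrings.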
### Clinical Protocol Selection
Given the classified series, select based on the clinical question (or use a general protocol if none specified):
| Clinical Question | Required Series | Optional Series | Expected Count |
|-------------------|-----------------|-----------------|----------------|
| **General screening** | T1, T2, T2-FLAIR | DWI, T1+C | 3–5 |
| **Hydrocephalus** | T2-FLAIR, T2 | T1, DWI | 2–3 |
| **Tumor / mass** | T1+C, T2-FLAIR, T1 (pre-contrast) | DWI, SWI | 3–4 |
| **Stroke / acute** | DWI, ADC, T2-FLAIR | MRA, SWI | 3–4 |
| **MS / demyelination** | T2-FLAIR, T1+C, T2 | DWI | 3 |
| **Infection** | T1+C, DWI, T2-FLAIR | T2 | 3 |
**Default behavior (no clinical question):** Use the "General screening" protocol: select T1, T2, T2-FLAIR, and any contrast-enhanced series. Cap at 5 series maximum.
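The protocol table translates directly into data. A minimal sketch (the map keys and `Protocol` type are illustrative names, not a fixed API):

```go
package main

// Protocol mirrors one row of the clinical-protocol table.
type Protocol struct {
	Required  []string
	Optional  []string
	MaxSeries int
}

var protocols = map[string]Protocol{
	"general":       {[]string{"T1", "T2", "T2-FLAIR"}, []string{"DWI", "T1+C"}, 5},
	"hydrocephalus": {[]string{"T2-FLAIR", "T2"}, []string{"T1", "DWI"}, 3},
	"tumor":         {[]string{"T1+C", "T2-FLAIR", "T1"}, []string{"DWI", "SWI"}, 4},
	"stroke":        {[]string{"DWI", "ADC", "T2-FLAIR"}, []string{"MRA", "SWI"}, 4},
	"ms":            {[]string{"T2-FLAIR", "T1+C", "T2"}, []string{"DWI"}, 3},
	"infection":     {[]string{"T1+C", "DWI", "T2-FLAIR"}, []string{"T2"}, 3},
}

// SelectSeries returns series UIDs for a protocol: required sequence
// types first, then optional, capped at MaxSeries. byType maps a
// classified sequence type to the UIDs of series with that type.
func SelectSeries(p Protocol, byType map[string][]string) []string {
	var out []string
	for _, group := range [][]string{p.Required, p.Optional} {
		for _, t := range group {
			for _, uid := range byType[t] {
				if len(out) < p.MaxSeries {
					out = append(out, uid)
				}
			}
		}
	}
	return out
}
```

Required-before-optional ordering ensures the cap never crowds out a primary sequence.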
### LLM Fallback
When rule-based classification confidence is below 0.7 for any series:
```
Prompt to Claude Haiku:
You are a neuroradiology MRI series classifier. Given the following DICOM
metadata for a single MRI series, classify it.
SeriesDescription: {desc}
SequenceName: {seq_name}
ScanningSequence: {scan_seq}
MRAcquisitionType: {acq_type}
ImageType: {img_type}
ContrastBolusAgent: {contrast}
SliceCount: {count}
Manufacturer: {mfr}
ManufacturerModelName: {model}
Respond with JSON:
{
"sequence_type": "T1|T2|T2-FLAIR|T1+C|DWI|ADC|SWI|MRA|SCOUT|OTHER",
"has_contrast": true|false,
"is_diagnostically_relevant": true|false,
"confidence": 0.0-1.0,
"reasoning": "brief explanation"
}
```
**Cost of LLM fallback:** ~$0.001 per series (Haiku), triggered for perhaps 1–2 series per study. Negligible.
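On the backend side, the fallback reply maps onto a small struct. A sketch of the response handling only (the Claude API call itself is elided; treating a still-low LLM confidence as unclassified is an assumption of this sketch):

```go
package main

import "encoding/json"

// LLMClassification mirrors the JSON schema requested in the prompt.
type LLMClassification struct {
	SequenceType             string  `json:"sequence_type"`
	HasContrast              bool    `json:"has_contrast"`
	IsDiagnosticallyRelevant bool    `json:"is_diagnostically_relevant"`
	Confidence               float64 `json:"confidence"`
	Reasoning                string  `json:"reasoning"`
}

// parseLLMClassification decodes the model's JSON reply; the boolean is
// false on malformed JSON or when confidence stays below the 0.7 threshold.
func parseLLMClassification(body []byte) (LLMClassification, bool) {
	var c LLMClassification
	if err := json.Unmarshal(body, &c); err != nil {
		return c, false
	}
	return c, c.Confidence >= 0.7
}
```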
---
## 3. RunPod Serverless Worker
### Worker Architecture
The RunPod serverless worker wraps Prima's `pipeline.py` with an HTTP handler:
```python
# handler.py — RunPod serverless handler
import runpod
import torch
import tempfile
import requests
import shutil
import json
import time
import gc
from pathlib import Path
from pipeline import Pipeline # Prima's pipeline
def handler(job):
    """
    Input schema:
    {
      "input": {
        "dicom_url": "https://inou.../signed-download/...",
        "series_uids": ["1.2.840...", "1.2.840..."],
        "callback_url": "https://inou.../api/analysis/callback",
        "callback_hmac_key": "hex-encoded-key",
        "job_id": "uuid",
        "study_id": "uuid"
      }
    }
    """
    input_data = job["input"]
    work_dir = Path(tempfile.mkdtemp())
    try:
        # 1. Download DICOM (only selected series)
        t0 = time.time()
        dicom_path = download_dicom(input_data["dicom_url"], work_dir)

        # 2. Filter to selected series only
        filter_series(dicom_path, input_data["series_uids"])

        # 3. Run Prima pipeline
        config = build_config(dicom_path, work_dir / "output")
        pipeline = Pipeline(config)

        # Load study
        pipeline.load_mri_study()

        # Tokenize (VQ-VAE) — loads tokenizer, runs, then frees
        pipeline.load_tokenizer_model()
        tokens = pipeline.tokenize_series()
        pipeline.free_tokenizer()  # Free VRAM

        # Run Prima VLM
        pipeline.load_prima_model()
        results = pipeline.run_inference(tokens)
        pipeline.free_prima_model()

        elapsed = time.time() - t0

        # 4. Build response
        response = {
            "job_id": input_data["job_id"],
            "study_id": input_data["study_id"],
            "elapsed_seconds": elapsed,
            "series_analyzed": len(input_data["series_uids"]),
            "results": {
                "diagnoses": results["diagnoses"],        # list of {name, probability}
                "priority": results["priority"],          # STAT / URGENT / ROUTINE
                "referral": results["referral"],          # specialist recommendation
                "differential": results["differential"],  # ranked differential diagnosis
            },
        }

        # 5. POST results back to inou
        post_callback(input_data["callback_url"], response,
                      input_data["callback_hmac_key"])
        return response
    finally:
        # 6. ALWAYS purge DICOM data — no PHI left on worker
        shutil.rmtree(work_dir, ignore_errors=True)
        torch.cuda.empty_cache()
        gc.collect()


runpod.serverless.start({"handler": handler})
```
### Worker Lifecycle
```
RunPod Serverless
─────────────────
Idle (no GPU) ──────────────────────── $0.00/sec
Job arrives
Cold Start (~15-25s) ────────────────── $0.00073/sec (L40S)
• Container starts; billing begins
• Model weights loaded from
network volume (NFS, ~10s)
• GPU warm
Inference (~30-60s) ─────────────────── $0.00073/sec
• VQ-VAE tokenization (~10-20s)
• Free VQ-VAE
• Prima VLM inference (~20-40s)
• Results callback
Idle timeout (5s) ───────────────────── $0.00073/sec (configurable)
Scale to 0 ──────────────────────────── $0.00/sec
```
### RunPod Configuration
```json
{
"name": "inou-prima-worker",
"gpu": "NVIDIA L40S",
"gpuCount": 1,
"volumeId": "vol_prima_weights",
"volumeMountPath": "/models",
"dockerImage": "inou/prima-worker:latest",
"env": {
"MODEL_DIR": "/models",
"PRIMA_CKPT": "/models/primafullmodel107.pt",
"VQVAE_CKPT": "/models/vqvae_model_step16799.pth"
},
"scalerType": "QUEUE_DELAY",
"scalerValue": 4,
"workersMin": 0,
"workersMax": 3,
"idleTimeout": 5,
"executionTimeout": 300,
"flashboot": true
}
```
**Network Volume:** Model weights (~4GB total) are stored on a RunPod network volume, which persists across cold starts and mounts as NFS; no download delay after the first pull.
---
## 4. API Design
### inou Backend Endpoints
#### 4.1 Trigger Analysis
```
POST /api/studies/{studyID}/analyze
Authorization: Bearer <token>
Request Body (optional):
{
"clinical_question": "evaluate for hydrocephalus", // optional
"priority": "routine", // routine | urgent | stat
"force_series": ["1.2.840..."] // optional: override selection
}
Response 202:
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "queued",
"selected_series": [
{
"series_uid": "1.2.840.113619...",
"description": "AX T2 FLAIR",
"sequence_type": "T2-FLAIR",
"slice_count": 28,
"selection_reason": "Primary sequence for hydrocephalus evaluation"
},
{
"series_uid": "1.2.840.113619...",
"description": "AX T2",
"sequence_type": "T2",
"slice_count": 24,
"selection_reason": "Complementary T2-weighted for ventricular assessment"
}
],
"estimated_cost_usd": 0.066,
"estimated_duration_seconds": 90
}
```
#### 4.2 Analysis Status
```
GET /api/studies/{studyID}/analysis
Authorization: Bearer <token>
Response 200:
{
"job_id": "550e8400-...",
"status": "completed", // queued | processing | completed | failed
"created_at": "2026-02-14T20:15:00Z",
"completed_at": "2026-02-14T20:16:32Z",
"duration_seconds": 92,
"cost_usd": 0.067,
"series_analyzed": 2,
"results": { ... } // see 4.3
}
```
#### 4.3 Analysis Results
```
GET /api/studies/{studyID}/analysis/results
Authorization: Bearer <token>
Response 200:
{
"study_id": "...",
"model": "prima-v1.07",
"model_version": "primafullmodel107",
"analysis_timestamp": "2026-02-14T20:16:32Z",
"diagnoses": [
{
"condition": "Normal pressure hydrocephalus",
"icd10": "G91.2",
"probability": 0.87,
"category": "developmental"
},
{
"condition": "Cerebral atrophy",
"icd10": "G31.9",
"probability": 0.34,
"category": "degenerative"
},
{
"condition": "Periventricular white matter changes",
"icd10": "I67.3",
"probability": 0.28,
"category": "vascular"
}
],
"priority": {
"level": "URGENT",
"reasoning": "High probability hydrocephalus requiring neurosurgical evaluation"
},
"referral": {
"specialty": "Neurosurgery",
"urgency": "within_1_week",
"reasoning": "NPH with high probability — consider VP shunt evaluation"
},
"differential": [
"Normal pressure hydrocephalus",
"Communicating hydrocephalus",
"Cerebral atrophy (ex vacuo ventriculomegaly)"
],
"series_results": [
{
"series_uid": "1.2.840...",
"description": "AX T2 FLAIR",
"findings": "Disproportionate ventriculomegaly relative to sulcal enlargement. Periventricular signal abnormality consistent with transependymal CSF flow."
}
],
"disclaimer": "AI-generated analysis for clinical decision support only. Not a substitute for radiologist interpretation."
}
```
#### 4.4 Callback Endpoint (Worker → inou)
```
POST /api/analysis/callback
X-HMAC-Signature: sha256=<hex>
Content-Type: application/json
{
"job_id": "...",
"study_id": "...",
"status": "completed",
"elapsed_seconds": 47.3,
"series_analyzed": 2,
"results": { ... }
}
```
The HMAC signature is computed over the raw JSON body using a per-job secret, preventing spoofed callbacks.
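Verification on the inou side can be sketched as follows (function name is illustrative; `hmac.Equal` gives a constant-time comparison):

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// VerifyCallbackSignature checks the X-HMAC-Signature header
// ("sha256=<hex>") against HMAC-SHA256 of the raw request body
// using the per-job secret.
func VerifyCallbackSignature(body []byte, header string, key []byte) bool {
	mac := hmac.New(sha256.New, key)
	mac.Write(body)
	want := "sha256=" + hex.EncodeToString(mac.Sum(nil))
	return hmac.Equal([]byte(want), []byte(header))
}
```

The comparison runs over the signed header string rather than decoded bytes, so malformed hex simply fails to match.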
#### 4.5 DICOM Download (Worker → inou)
```
GET /api/internal/dicom/{studyID}/download?token={signed_token}&series={uid1,uid2}
→ 200 application/zip (streaming)
Token: time-limited (5 min), signed with HMAC, includes study ID and allowed series UIDs.
Only selected series are included in the download — minimizes data transfer.
```
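A sketch of the signed-token scheme described above. The `|`-separated payload encoding and function names are illustrative; the essentials are the HMAC over study ID, allowed series UIDs, and expiry:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// SignDownloadToken builds a time-limited token: base64(payload).hexsig,
// where payload = studyID|uid1,uid2|expiryUnix.
func SignDownloadToken(key []byte, studyID string, series []string, ttl time.Duration) string {
	payload := fmt.Sprintf("%s|%s|%d",
		studyID, strings.Join(series, ","), time.Now().Add(ttl).Unix())
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(payload))
	return base64.RawURLEncoding.EncodeToString([]byte(payload)) +
		"." + hex.EncodeToString(mac.Sum(nil))
}

// VerifyDownloadToken checks the signature and expiry, returning the
// study ID and allowed series UIDs on success.
func VerifyDownloadToken(key []byte, token string) (string, []string, bool) {
	parts := strings.SplitN(token, ".", 2)
	if len(parts) != 2 {
		return "", nil, false
	}
	raw, err := base64.RawURLEncoding.DecodeString(parts[0])
	if err != nil {
		return "", nil, false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(raw)
	if !hmac.Equal([]byte(hex.EncodeToString(mac.Sum(nil))), []byte(parts[1])) {
		return "", nil, false
	}
	fields := strings.SplitN(string(raw), "|", 3)
	if len(fields) != 3 {
		return "", nil, false
	}
	exp, err := strconv.ParseInt(fields[2], 10, 64)
	if err != nil || time.Now().Unix() > exp {
		return "", nil, false
	}
	return fields[0], strings.Split(fields[1], ","), true
}
```

Embedding the allowed series UIDs in the signed payload lets the download endpoint enforce "selected series only" without a database lookup.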
---
## 5. Docker Image
### Dockerfile
```dockerfile
FROM nvidia/cuda:12.4.0-devel-ubuntu22.04
# System deps
RUN apt-get update && apt-get install -y \
    python3.11 python3.11-dev python3-pip \
    libgl1-mesa-glx libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*
RUN ln -s /usr/bin/python3.11 /usr/bin/python
# Python deps (cached layer)
COPY requirements.txt /tmp/
RUN pip install --no-cache-dir -r /tmp/requirements.txt
# flash-attn requires Ampere+ (sm_80+), build for L40S (sm_89)
RUN pip install flash-attn==2.7.4.post1 --no-build-isolation
# RunPod SDK
RUN pip install runpod==1.7.0
# Prima source code
COPY Prima/ /app/Prima/
COPY handler.py /app/handler.py
WORKDIR /app
# Model weights are on network volume, not baked into image
# /models/primafullmodel107.pt (~3.5GB)
# /models/vqvae_model_step16799.pth (~500MB)
CMD ["python", "handler.py"]
```
**Image size:** ~12GB (CUDA base + PyTorch + flash-attn + Prima)
### What's in the image vs network volume:
| Component | Location | Size | Rationale |
|-----------|----------|------|-----------|
| CUDA 12.4 + Ubuntu | Docker image | ~4GB | Cached, rarely changes |
| PyTorch 2.6 + deps | Docker image | ~6GB | Cached, rarely changes |
| flash-attn | Docker image | ~500MB | Must match CUDA version |
| Prima source | Docker image | ~50MB | Changes with updates |
| RunPod handler | Docker image | ~5KB | Our wrapper code |
| `primafullmodel107.pt` | Network volume | ~3.5GB | Too large for image, shared |
| `vqvae_model_step16799.pth` | Network volume | ~500MB | Shared across workers |
### Build & Deploy
```bash
# Build
docker build -t inou/prima-worker:latest .
# Push to RunPod's registry (or Docker Hub)
docker push inou/prima-worker:latest
# Network volume setup (one-time)
runpod volume create --name prima-weights --size 10
# Upload model weights to volume via RunPod console or SSH
```
---
## 6. Data Flow & Security
### Data Flow Diagram
```
User uploads inou stores Worker downloads Worker processes
DICOM study encrypted selected series and returns results
at rest (signed URL)
│ │ │ │
▼ ▼ ▼ ▼
┌─────────┐ ┌───────────┐ ┌──────────┐ ┌──────────┐
│ Browser │────▶│ inou │─────────▶│ RunPod │────────▶│ inou │
│ Upload │ │ Backend │ signed │ Worker │ HMAC │ Backend │
│ (TLS) │ │ (AES-GCM) │ URL │ (tmpfs) │ callback│ (store) │
└─────────┘ └───────────┘ └──────────┘ └──────────┘
On completion:
shutil.rmtree()
No PHI retained
```
### Security Controls
| Concern | Control |
|---------|---------|
| **DICOM at rest (inou)** | AES-256-GCM encryption in SQLite (existing) |
| **DICOM in transit to worker** | TLS 1.3 + time-limited signed URL (5 min TTL) |
| **DICOM on worker** | Stored in tmpfs; `shutil.rmtree()` in `finally` block; container destroyed after job |
| **Results in transit** | TLS 1.3 + HMAC-SHA256 callback verification |
| **Results at rest** | Encrypted in dossier (existing AES-256-GCM) |
| **PHI in logs** | No DICOM pixel data or patient identifiers logged. Only series UIDs and job metadata |
| **RunPod access** | API key stored as inou backend env var, never exposed to frontend |
| **Worker isolation** | Each job runs in isolated container; no shared filesystem between jobs |
### HIPAA Considerations
| HIPAA Requirement | Implementation |
|-------------------|----------------|
| **Access controls** | inou auth (existing); RunPod API key; signed URLs |
| **Encryption** | AES-256-GCM at rest; TLS 1.3 in transit |
| **Audit trail** | All analysis requests logged with timestamp, user, study ID |
| **Minimum necessary** | Only selected series transmitted; no patient demographics sent to worker |
| **BAA** | RunPod offers a BAA for serverless; **must be executed before production** |
| **Data retention** | Zero retention on worker; configurable retention on inou |
| **De-identification** | DICOM sent to the worker is stripped of patient name/DOB (tags 0010,0010 / 0010,0030); the Series UID is sufficient for processing |
### PHI Minimization Pipeline
Before sending DICOM to RunPod, inou strips the following tags:
```go
var phiTagsToStrip = []dicom.Tag{
	dicom.PatientName,        // (0010,0010)
	dicom.PatientID,          // (0010,0020)
	dicom.PatientBirthDate,   // (0010,0030)
	dicom.PatientSex,         // (0010,0040) — keep if Prima needs it
	dicom.PatientAddress,     // (0010,1040)
	dicom.ReferringPhysician, // (0008,0090)
	dicom.InstitutionName,    // (0008,0080)
	dicom.InstitutionAddress, // (0008,0081)
	dicom.AccessionNumber,    // (0008,0050)
}
// Pixel data and series-level imaging tags are preserved — Prima needs those
```
---
## 7. Cost Analysis
### RunPod Pricing (L40S Serverless)
| Metric | Value |
|--------|-------|
| Per-second billing | $0.00073/sec |
| Per-minute | $0.0438/min |
| Per-hour | $2.628/hr |
| Minimum charge per job | None (per-second) |
| Network volume (10GB) | ~$1.00/month |
| Idle workers | $0.00 (scale to 0) |
### Per-Series Timing Breakdown
| Phase | Duration | Notes |
|-------|----------|-------|
| Cold start (first job) | 15–25s | Container + model load from volume |
| Warm start (subsequent) | 0–2s | Worker already running |
| DICOM download | 2–5s | Depends on series size (~50–200MB) |
| VQ-VAE tokenization | 10–20s | Per series, depends on slice count |
| VQ-VAE → Prima model swap | 2–3s | Free VQ-VAE, load Prima |
| Prima inference | 15–30s | Depends on token count |
| Results callback | <1s | Small JSON payload |
| **Total per series** | **30–60s** | **Warm: 30–40s typical** |
### Cost Per Study: With vs Without Intelligent Selection
#### Scenario A: Routine Brain MRI (12 series, ~3000 slices)
| Approach | Series Processed | Est. Duration | GPU Cost | LLM Selection Cost | Total |
|----------|-----------------|---------------|----------|-------------------|-------|
| **Naive (all series)** | 12 | 360–720s | $0.26–$0.53 | $0.00 | **$0.26–$0.53** |
| **Intelligent (selected)** | 3 | 90–180s | $0.066–$0.13 | ~$0.003 | **$0.07–$0.13** |
| **Savings** | | | | | **73–75%** |
#### Scenario B: Brain MRI with Contrast (15 series, ~5000 slices)
| Approach | Series Processed | Est. Duration | GPU Cost | LLM Selection Cost | Total |
|----------|-----------------|---------------|----------|-------------------|-------|
| **Naive** | 15 | 450–900s | $0.33–$0.66 | $0.00 | **$0.33–$0.66** |
| **Intelligent** | 4 | 120–240s | $0.088–$0.18 | ~$0.004 | **$0.09–$0.18** |
| **Savings** | | | | | **73%** |
#### Scenario C: Focused Study (6 series, ~1500 slices, e.g., stroke protocol)
| Approach | Series Processed | Est. Duration | GPU Cost | Total |
|----------|-----------------|---------------|----------|-------|
| **Naive** | 6 | 180–360s | $0.13–$0.26 | **$0.13–$0.26** |
| **Intelligent** | 3 | 90–180s | $0.066–$0.13 | **$0.07–$0.13** |
### Monthly Volume Projections
| Volume | Naive Cost | Intelligent Cost | Monthly Savings |
|--------|-----------|-----------------|-----------------|
| 10 studies/month | $3.30–$6.60 | $0.90–$1.80 | $2.40–$4.80 |
| 50 studies/month | $16.50–$33.00 | $4.50–$9.00 | $12.00–$24.00 |
| 200 studies/month | $66.00–$132.00 | $18.00–$36.00 | $48.00–$96.00 |
| 1000 studies/month | $330–$660 | $90–$180 | $240–$480 |
**Fixed costs:** RunPod network volume ~$1/month. No other infrastructure costs; workers scale to zero.
### Break-Even vs Always-On
An always-on L40S instance costs ~$0.69/hr = **$500/month**.
Break-even point: $500 ÷ $0.13/study = **~3,850 studies/month** before always-on becomes cheaper.
For inou's expected volumes (10–200 studies/month), **serverless is 50–500× cheaper**.
---
## 8. User Experience
### Upload → Results Flow
```
┌─────────────────────────────────────────────────────────────┐
│ 1. UPLOAD │
│ │
│ User uploads DICOM study (drag & drop or folder select) │
│ inou parses metadata, displays series list in viewer │
│ "AI Analysis Available" badge appears on brain MRI studies │
│ │
├─────────────────────────────────────────────────────────────┤
│ 2. ANALYSIS INITIATED │
│ │
│ [Analyze with AI] button in viewer toolbar │
│ Optional: clinical question text field │
│ "Analyzing 3 of 12 series... Est. ~90 seconds" │
│ Progress indicator with phases: │
│ ◉ Series selected (3 of 12) │
│ ◉ Uploading to analysis engine... │
│ ○ AI processing... │
│ ○ Results ready │
│ │
├─────────────────────────────────────────────────────────────┤
│ 3. RESULTS DISPLAYED │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ AI Analysis Results Prima v1.07 │ │
│ │ │ │
│ │ ⚠️ URGENT — Neurosurgery referral recommended │ │
│ │ │ │
│ │ Diagnoses: │ │
│ │ ████████████████████░░ 87% NPH │ │
│ │ ██████░░░░░░░░░░░░░░░ 34% Cerebral atrophy │ │
│ │ █████░░░░░░░░░░░░░░░░ 28% WM changes │ │
│ │ │ │
│ │ Referral: Neurosurgery (within 1 week) │ │
│ │ "NPH with high probability — VP shunt evaluation" │ │
│ │ │ │
│ │ Series analyzed: T2 FLAIR, T2 (2 of 12) │ │
│ │ Analysis time: 47s · Cost: $0.03 │ │
│ │ │ │
│ │ ⚕️ AI-assisted analysis — not a radiologist report │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ Results panel integrated into existing DICOM viewer │
│ Clicking a diagnosis highlights the relevant series │
│ Results persisted in patient dossier │
└─────────────────────────────────────────────────────────────┘
```
### Automatic vs Manual Analysis
**Option A (default): Manual trigger.** The user clicks the "Analyze with AI" button. Best for initial launch; gives the user control.
**Option B (future): Auto-trigger.** Analysis starts automatically on upload for brain MRI studies. Configurable in settings.
### Viewer Integration
The AI results panel is a new sidebar component in the existing DICOM viewer:
```
┌──────────────────────────────────────────────────────────────┐
│ inou Viewer [≡] │
├────────────────────────────────────┬─────────────────────────┤
│ │ 📋 Study Info │
│ │ Patient: [redacted] │
│ │ Date: 2026-02-14 │
│ DICOM Image Display │ Series: 12 │
│ │ │
│ (existing viewer canvas) ├─────────────────────────┤
│ │ 🤖 AI Analysis │
│ │ │
│ │ Status: ✅ Complete │
│ │ [Results panel above] │
│ │ │
│ │ [Re-analyze ▼] │
│ │ [Export Report] │
├────────────────────────────────────┴─────────────────────────┤
│ Series: AX T2 FLAIR | Slice 14/28 | W:1500 L:450 │
└──────────────────────────────────────────────────────────────┘
```
---
## 9. Error Handling
### Error Categories & Recovery
| Error | Detection | Recovery | User Impact |
|-------|-----------|----------|-------------|
| **RunPod cold start timeout** | No response in 60s | Retry once; if that fails, queue for retry in 5 min | "Analysis queued; GPU starting up" |
| **Worker execution timeout** | RunPod 300s timeout | Job marked failed; auto-retry with reduced series count | "Analysis timed out; retrying with fewer series" |
| **Model inference error** | Exception in handler | Return error in callback; log stack trace | "Analysis failed; try again or contact support" |
| **DICOM download failure** | HTTP error / timeout | Retry download 3× with exponential backoff | Transparent to user |
| **Callback delivery failure** | HTTP error | Worker retries 3×; inou polls RunPod status API as fallback | Transparent; results arrive via polling |
| **Invalid DICOM study** | Parse errors in ingest | Reject at upload time with specific error | "Unable to parse series; unsupported format" |
| **No relevant series found** | Series selector returns empty | Skip analysis; inform user | "No brain MRI sequences detected in this study" |
| **RunPod quota / billing** | 402/429 errors | Alert admin; queue jobs for later | "Analysis temporarily unavailable" |
| **Network volume unavailable** | Model load failure | Worker retries; if persistent, alert | "Analysis service under maintenance" |
### Retry Strategy
```go
type RetryPolicy struct {
	MaxAttempts     int           // 3
	InitialBackoff  time.Duration // 10s
	MaxBackoff      time.Duration // 5m
	BackoffFactor   float64       // 2.0
	RetryableErrors []string      // timeout, 5xx, connection_refused
}
```
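A minimal sketch of how the policy translates into a delay per attempt (method name is illustrative; the struct is repeated for completeness):

```go
package main

import "time"

// RetryPolicy repeats the struct above for completeness.
type RetryPolicy struct {
	MaxAttempts     int
	InitialBackoff  time.Duration
	MaxBackoff      time.Duration
	BackoffFactor   float64
	RetryableErrors []string
}

// NextBackoff returns the delay before retry attempt n (1-based):
// InitialBackoff multiplied by BackoffFactor per prior attempt,
// capped at MaxBackoff.
func (p RetryPolicy) NextBackoff(attempt int) time.Duration {
	d := p.InitialBackoff
	for i := 1; i < attempt; i++ {
		d = time.Duration(float64(d) * p.BackoffFactor)
		if d >= p.MaxBackoff {
			return p.MaxBackoff
		}
	}
	return d
}
```

With the defaults shown (10s initial, factor 2.0, 5m cap) the sequence is 10s, 20s, 40s, ..., clamped at 5 minutes.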
### Monitoring & Alerting
| Metric | Alert Threshold | Channel |
|--------|----------------|---------|
| Job failure rate | >10% in 1 hour | Slack / Signal |
| Median latency | >180s | Dashboard |
| RunPod spend | >$50/day | Email |
| Cold start frequency | >50% of jobs | Dashboard (indicates insufficient idle timeout) |
| Callback failures | Any | Immediate alert |
---
## 10. Implementation Plan
### Phase 1: Foundation (2 weeks)
- [ ] Fork Prima repo, create `inou-prima-worker` repo
- [ ] Write RunPod handler (`handler.py`)
- [ ] Build Docker image, test locally with `nvidia-docker`
- [ ] Set up RunPod account, network volume, deploy worker
- [ ] Test end-to-end with a sample DICOM study
- [ ] Verify model outputs match Prima's reference results
### Phase 2: Series Selection (1 week)
- [ ] Implement rule-based series classifier in Go
- [ ] Build clinical protocol selection logic
- [ ] Add LLM fallback (Claude Haiku integration)
- [ ] Test with diverse DICOM studies (Siemens, GE, Philips naming conventions)
- [ ] Validate selection accuracy: should match radiologist series choice >90%
### Phase 3: Backend Integration (2 weeks)
- [ ] Add analysis endpoints to inou backend
- [ ] Implement job queue with retry logic
- [ ] Build DICOM download endpoint with signed URLs
- [ ] Implement callback handler with HMAC verification
- [ ] Add PHI stripping to DICOM export pipeline
- [ ] Store results in dossier (encrypted)
- [ ] Write integration tests
### Phase 4: Frontend Integration (1 week)
- [ ] Add "Analyze with AI" button to viewer
- [ ] Build AI results panel component
- [ ] Implement progress indicator (WebSocket or polling)
- [ ] Add results to dossier view
- [ ] Export report functionality
### Phase 5: Hardening (1 week)
- [ ] Load testing (concurrent jobs, cold starts)
- [ ] Error handling for all failure modes
- [ ] Monitoring dashboard
- [ ] Cost tracking and alerting
- [ ] Security review (PHI flow, access controls)
- [ ] Documentation
**Total: ~7 weeks** to production-ready MVP.
---
## 11. Future Work
### Near-Term (3–6 months)
- **Auto-trigger analysis** on brain MRI upload (configurable)
- **Batch processing** — analyze multiple studies in parallel
- **Result caching** — skip re-analysis if study hasn't changed
- **Comparison mode** — compare current vs prior study results
### Medium-Term (6–12 months)
- **Fine-tuning on specific conditions** — Use inou's accumulated (de-identified) data to fine-tune Prima for conditions most relevant to inou's user base (e.g., pediatric hydrocephalus for Sophia's case)
- **Radiologist feedback loop** — Allow radiologists to confirm/reject AI findings, feeding back into training data
- **Multi-series reasoning** — Instead of per-series inference, develop cross-series reasoning (e.g., comparing pre- and post-contrast T1)
### Long-Term (12+ months)
- **Multi-organ expansion** — As VLMs for spine, cardiac, abdominal MRI become available, integrate them using the same serverless architecture
- **On-premise deployment** — For high-volume clients or regulatory requirements, offer Prima as an on-prem container (requires dedicated GPU)
- **Real-time inference** — As models get faster and GPUs cheaper, offer analysis during the MRI scan itself (PACS integration)
- **Clinical decision support** — Integrate Prima results with patient history, labs, and prior imaging for comprehensive clinical recommendations
- **FDA 510(k) pathway** — If inou pursues clinical deployment, Prima results would need FDA clearance as a Computer-Aided Detection (CADe) or Computer-Aided Diagnosis (CADx) device
### Architecture Extensibility
The serverless worker pattern is model-agnostic. To add a new model:
1. Build a new Docker image with the model
2. Deploy as a separate RunPod serverless endpoint
3. Add a new series selector profile
4. Route from inou backend based on study type / body part
```
inou Backend
┌──────────┼──────────┐
▼ ▼ ▼
Prima SpineVLM CardiacAI
(Brain) (Spine) (Heart)
L40S L40S A100
```
Same queue, same callback pattern, same dossier storage. Only the worker and selector change.
---
## Appendix A: Prima Model Details
| Property | Value |
|----------|-------|
| **Paper** | "Learning neuroimaging models from health system-scale data" (arXiv:2509.18638) |
| **Training data** | UM-220K: 220,000+ MRI studies, 5.6M 3D sequences, 362M 2D images |
| **Architecture** | Hierarchical VLM: VQ-VAE tokenizer → Perceiver → Transformer |
| **Diagnoses covered** | 52 radiologic diagnoses across neoplastic, inflammatory, infectious, developmental |
| **Mean AUC** | 90.1 ± 5.0% (prospective 30K study validation) |
| **License** | MIT |
| **GPU requirement** | Ampere+ (flash-attn), 48GB VRAM recommended (L40S, A100) |
| **Weights** | `primafullmodel107.pt` (~3.5GB) + `vqvae_model_step16799.pth` (~500MB) |
| **Dependencies** | PyTorch 2.6, flash-attn 2.7.4, MONAI 1.5.1, transformers 4.49 |
## Appendix B: DICOM Tags Reference
```
(0008,0008) ImageType — ORIGINAL\PRIMARY vs DERIVED\SECONDARY
(0008,0060) Modality — MR
(0008,103E) SeriesDescription — Free text, scanner-dependent
(0010,0010) PatientName — PHI — STRIP before sending to worker
(0010,0020) PatientID — PHI — STRIP
(0010,0030) PatientBirthDate — PHI — STRIP
(0018,0010) ContrastBolusAgent — Present = contrast-enhanced
(0018,0020) ScanningSequence — SE, GR, IR, EP (spin echo, gradient, inversion recovery, echo planar)
(0018,0023) MRAcquisitionType — 2D, 3D
(0018,0024) SequenceName — Scanner-specific pulse sequence name
(0018,0050) SliceThickness — mm
(0020,0011) SeriesNumber — Integer ordering
(0020,0013) InstanceNumber — Slice index within series
(0028,0010) Rows — Pixel rows
(0028,0011) Columns — Pixel columns
```
---
*This specification is a living document. Update as implementation progresses and requirements evolve.*