# Prima Integration — Technical Specification
**inou Health × Prima Brain MRI VLM**
**Version:** 1.0 · **Date:** 2026-02-14 · **Author:** James (AI Architect) · **Reviewer:** Johan Jongsma
---
## Executive Summary
This spec defines how inou integrates [Prima](https://github.com/MLNeurosurg/Prima), the University of Michigan's brain MRI Vision Language Model, as an on-demand diagnostic AI service. Prima achieves 90.1% mean AUC across 52 neurological diagnoses and provides differential diagnosis, worklist prioritization, and specialist referral recommendations.
The design prioritizes **cost efficiency** (serverless GPU, intelligent series selection), **security** (zero PHI at rest on cloud GPU, AES-256-GCM encrypted dossier storage), and **seamless UX** (upload → automatic analysis → results in viewer).
**Key numbers:**
- Per-study cost with intelligent selection: **$0.04–$0.13** (vs $0.22–$0.66 naive)
- Cold start to results: **60–120 seconds**
- Zero always-on GPU cost
---
## Table of Contents
1. [System Architecture](#1-system-architecture)
2. [Intelligent Series Selection](#2-intelligent-series-selection)
3. [RunPod Serverless Worker](#3-runpod-serverless-worker)
4. [API Design](#4-api-design)
5. [Docker Image](#5-docker-image)
6. [Data Flow & Security](#6-data-flow--security)
7. [Cost Analysis](#7-cost-analysis)
8. [User Experience](#8-user-experience)
9. [Error Handling](#9-error-handling)
10. [Implementation Plan](#10-implementation-plan)
11. [Future Work](#11-future-work)
---
## 1. System Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ inou Frontend │
│ ┌──────────┐ ┌──────────────┐ ┌───────────────────────────────┐ │
│ │ DICOM │ │ DICOM │ │ AI Results Panel │ │
│ │ Upload │──│ Viewer │ │ • Diagnosis probabilities │ │
│ │ │ │ │ │ • Urgency / priority │ │
│ └────┬─────┘ └──────────────┘ │ • Specialist referral │ │
│ │ │ • Series-level annotations │ │
│ │ └──────────┬────────────────────┘ │
└───────┼─────────────────────────────────────┼─────────────────────┘
│ POST /api/studies │ GET /api/studies/:id/analysis
▼ ▲
┌───────────────────────────────────────────────────────────────────┐
│ inou Backend (Go) │
│ │
│ ┌──────────┐ ┌─────────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ DICOM │ │ Series │ │ Job │ │ Dossier │ │
│ │ Ingest │─▶│ Selector │─▶│ Queue │ │ Store │ │
│ │ + Parse │ │ (LLM/Rules)│ │ │ │ (SQLite+AES)│ │
│ └──────────┘ └─────────────┘ └────┬─────┘ └──────▲───────┘ │
│ │ │ │
└──────────────────────────────────────┼────────────────┼──────────┘
│ │
Job dispatch Results callback
(HTTPS POST) (HTTPS POST + HMAC)
│ │
▼ │
┌──────────────────────────────────────────┐
│ RunPod Serverless │
│ │
│ ┌────────────────────────────────────┐ │
│ │ Prima Worker (L40S 48GB) │ │
│ │ │ │
│ │ 1. Download DICOM from signed URL │ │
│ │ 2. Load VQ-VAE tokenizer │ │
│ │ 3. Tokenize selected series │ │
│ │ 4. Free VQ-VAE from VRAM │ │
│ │ 5. Load Prima VLM │ │
│ │ 6. Run inference (fp16) │ │
│ │ 7. POST results back to inou │ │
│ │ 8. Purge all DICOM data │ │
│ │ │ │
│ └────────────────────────────────────┘ │
│ │
│ GPU: L40S 48GB · Scales 0→N · Pay/sec │
└──────────────────────────────────────────┘
```
### Component Responsibilities
| Component | Technology | Responsibility |
|-----------|-----------|----------------|
| **DICOM Ingest** | Go | Parse uploaded DICOM, extract metadata, store encrypted |
| **Series Selector** | Go + LLM (Claude Haiku) | Analyze DICOM metadata, select diagnostically relevant series |
| **Job Queue** | Go (in-process) | Manage analysis jobs, retry logic, status tracking |
| **RunPod Worker** | Python 3.11, PyTorch 2.6 | Run Prima inference on selected series |
| **Dossier Store** | SQLite + AES-256-GCM | Store results encrypted at rest |
| **AI Results Panel** | Frontend (existing viewer) | Display diagnosis, urgency, referral |
---
## 2. Intelligent Series Selection
### The Problem
A typical brain MRI study contains **8–15 series** with **100–500 slices each**, potentially 10,000+ total slices. Running every series through Prima wastes GPU time and money; most clinical questions need only 2–4 specific sequence types.
### Selection Architecture
```
DICOM Study Metadata
┌───────────────────────────────────────────┐
│ Series Selection Pipeline │
│ │
│ Step 1: Extract metadata per series │
│ • SeriesDescription (0008,103E) │
│ • SequenceName (0018,0024) │
│ • MRAcquisitionType (0018,0023) │
│ • ScanningSequence (0018,0020) │
│ • ContrastBolusAgent (0018,0010) │
│ • SliceThickness (0018,0050) │
│ • NumberOfSlices │
│ • ImageType (0008,0008) │
│ • Modality (0008,0060) │
│ │
│ Step 2: Rule-based classification │
│ Map each series → sequence type: │
│ T1, T1+C, T2, T2-FLAIR, DWI, ADC, │
│ SWI, MRA, SCOUT, LOCALIZER, DERIVED │
│ │
│ Step 3: Clinical protocol matching │
│ Apply selection rules (see table) │
│ │
│ Step 4: LLM fallback for ambiguous cases │
│ If rule-based classification fails, │
│ send metadata to Claude Haiku │
│ │
│ Output: ordered list of series to analyze │
└───────────────────────────────────────────┘
```
### Rule-Based Classification
Series descriptions are notoriously inconsistent across scanners (Siemens, GE, Philips all use different naming). The classifier uses a multi-signal approach:
```go
type SeriesClassification struct {
	SeriesUID    string
	SequenceType string  // T1, T2, FLAIR, T1C, DWI, ADC, SWI, MRA, OTHER
	HasContrast  bool
	IsLocalizer  bool
	IsDerived    bool
	SliceCount   int
	Confidence   float64 // 0.0-1.0, below 0.7 triggers LLM fallback
}
```
**Classification rules (ordered by priority):**
| Signal | Tag | Example Values | Maps To |
|--------|-----|---------------|---------|
| ImageType contains "LOCALIZER" | (0008,0008) | `ORIGINAL\PRIMARY\LOCALIZER` | SKIP |
| ImageType contains "DERIVED" | (0008,0008) | `DERIVED\SECONDARY` | SKIP (usually) |
| SliceCount < 10 | computed | | SKIP (scout/cal) |
| ContrastBolusAgent present | (0018,0010) | `Gadavist` | +contrast flag |
| Description contains "flair" | (0008,103E) | `AX T2 FLAIR`, `FLAIR_DARK-FLUID` | T2-FLAIR |
| Description contains "t2" (no flair) | (0008,103E) | `AX T2`, `T2_TSE_TRA` | T2 |
| Description contains "t1" | (0008,103E) | `SAG T1`, `T1_MPRAGE` | T1 or T1+C |
| Description contains "dwi"/"diffusion" | (0008,103E) | `DWI_b1000`, `EP2D_DIFF` | DWI |
| Description contains "adc" | (0008,103E) | `ADC_MAP` | ADC |
| Description contains "swi"/"suscept" | (0008,103E) | `SWI_mIP`, `SUSCEPTIBILITY` | SWI |
| ScanningSequence = "EP" + b-value tag | (0018,0020) | | DWI |
| No match, confidence < 0.7 | | | LLM fallback |
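The priority-ordered rules above can be sketched as a single classification function. This is a simplified sketch: DICOM tag extraction, the EP/b-value rule, and vendor-specific aliases are elided, and the struct from earlier is repeated for completeness.

```go
package main

import "strings"

// SeriesClassification repeats the struct defined earlier, for completeness.
type SeriesClassification struct {
	SeriesUID    string
	SequenceType string
	HasContrast  bool
	IsLocalizer  bool
	IsDerived    bool
	SliceCount   int
	Confidence   float64
}

// ClassifySeries applies the rule table, highest-priority rule first.
func ClassifySeries(desc, imageType, contrastAgent string, sliceCount int) SeriesClassification {
	d := strings.ToLower(desc)
	c := SeriesClassification{SliceCount: sliceCount, Confidence: 0.9}
	switch {
	case strings.Contains(imageType, "LOCALIZER"):
		c.IsLocalizer, c.SequenceType = true, "SCOUT"
	case strings.Contains(imageType, "DERIVED"):
		c.IsDerived, c.SequenceType = true, "OTHER"
	case sliceCount < 10:
		c.IsLocalizer, c.SequenceType = true, "SCOUT" // scout / calibration
	case strings.Contains(d, "flair"):
		c.SequenceType = "T2-FLAIR" // checked before plain "t2"
	case strings.Contains(d, "t2"):
		c.SequenceType = "T2"
	case strings.Contains(d, "t1"):
		c.SequenceType = "T1"
	case strings.Contains(d, "dwi"), strings.Contains(d, "diffusion"):
		c.SequenceType = "DWI"
	case strings.Contains(d, "adc"):
		c.SequenceType = "ADC"
	case strings.Contains(d, "swi"), strings.Contains(d, "suscept"):
		c.SequenceType = "SWI"
	default:
		c.SequenceType = "OTHER"
		c.Confidence = 0.3 // below 0.7 → LLM fallback
	}
	if contrastAgent != "" {
		c.HasContrast = true
		if c.SequenceType == "T1" {
			c.SequenceType = "T1+C"
		}
	}
	return c
}
```

Note that "flair" must be tested before "t2", since descriptions like `AX T2 FLAIR` contain both substrings.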
### Clinical Protocol Selection
Given the classified series, select based on the clinical question (or use a general protocol if none specified):
| Clinical Question | Required Series | Optional Series | Expected Count |
|-------------------|-----------------|-----------------|----------------|
| **General screening** | T1, T2, T2-FLAIR | DWI, T1+C | 3–5 |
| **Hydrocephalus** | T2-FLAIR, T2 | T1, DWI | 2–3 |
| **Tumor / mass** | T1+C, T2-FLAIR, T1 (pre-contrast) | DWI, SWI | 3–4 |
| **Stroke / acute** | DWI, ADC, T2-FLAIR | MRA, SWI | 3–4 |
| **MS / demyelination** | T2-FLAIR, T1+C, T2 | DWI | 3 |
| **Infection** | T1+C, DWI, T2-FLAIR | T2 | 3 |
**Default behavior (no clinical question):** Use the "General screening" protocol: select T1, T2, T2-FLAIR, and any contrast-enhanced series. Cap at 5 series maximum.
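The protocol table translates directly into data. A minimal sketch (the map keys and `Protocol` type are illustrative names, not a fixed API):

```go
package main

// Protocol mirrors one row of the clinical-protocol table.
type Protocol struct {
	Required  []string
	Optional  []string
	MaxSeries int
}

var protocols = map[string]Protocol{
	"general":       {[]string{"T1", "T2", "T2-FLAIR"}, []string{"DWI", "T1+C"}, 5},
	"hydrocephalus": {[]string{"T2-FLAIR", "T2"}, []string{"T1", "DWI"}, 3},
	"tumor":         {[]string{"T1+C", "T2-FLAIR", "T1"}, []string{"DWI", "SWI"}, 4},
	"stroke":        {[]string{"DWI", "ADC", "T2-FLAIR"}, []string{"MRA", "SWI"}, 4},
	"ms":            {[]string{"T2-FLAIR", "T1+C", "T2"}, []string{"DWI"}, 3},
	"infection":     {[]string{"T1+C", "DWI", "T2-FLAIR"}, []string{"T2"}, 3},
}

// SelectSeries returns series UIDs for a protocol: required sequence
// types first, then optional, capped at MaxSeries. byType maps a
// classified sequence type to the UIDs of series with that type.
func SelectSeries(p Protocol, byType map[string][]string) []string {
	var out []string
	for _, group := range [][]string{p.Required, p.Optional} {
		for _, t := range group {
			for _, uid := range byType[t] {
				if len(out) < p.MaxSeries {
					out = append(out, uid)
				}
			}
		}
	}
	return out
}
```

Required-before-optional ordering ensures the cap never crowds out a primary sequence.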
### LLM Fallback
When rule-based classification confidence is below 0.7 for any series:
```
Prompt to Claude Haiku:
You are a neuroradiology MRI series classifier. Given the following DICOM
metadata for a single MRI series, classify it.
SeriesDescription: {desc}
SequenceName: {seq_name}
ScanningSequence: {scan_seq}
MRAcquisitionType: {acq_type}
ImageType: {img_type}
ContrastBolusAgent: {contrast}
SliceCount: {count}
Manufacturer: {mfr}
ManufacturerModelName: {model}
Respond with JSON:
{
"sequence_type": "T1|T2|T2-FLAIR|T1+C|DWI|ADC|SWI|MRA|SCOUT|OTHER",
"has_contrast": true|false,
"is_diagnostically_relevant": true|false,
"confidence": 0.0-1.0,
"reasoning": "brief explanation"
}
```
**Cost of LLM fallback:** ~$0.001 per series (Haiku), triggered for perhaps 1–2 series per study. Negligible.
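On the backend side, the fallback reply maps onto a small struct. A sketch of the response handling only (the Claude API call itself is elided; treating a still-low LLM confidence as unclassified is an assumption of this sketch):

```go
package main

import "encoding/json"

// LLMClassification mirrors the JSON schema requested in the prompt.
type LLMClassification struct {
	SequenceType             string  `json:"sequence_type"`
	HasContrast              bool    `json:"has_contrast"`
	IsDiagnosticallyRelevant bool    `json:"is_diagnostically_relevant"`
	Confidence               float64 `json:"confidence"`
	Reasoning                string  `json:"reasoning"`
}

// parseLLMClassification decodes the model's JSON reply; the boolean is
// false on malformed JSON or when confidence stays below the 0.7 threshold.
func parseLLMClassification(body []byte) (LLMClassification, bool) {
	var c LLMClassification
	if err := json.Unmarshal(body, &c); err != nil {
		return c, false
	}
	return c, c.Confidence >= 0.7
}
```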
---
## 3. RunPod Serverless Worker
### Worker Architecture
The RunPod serverless worker wraps Prima's `pipeline.py` with an HTTP handler:
```python
# handler.py — RunPod serverless handler
import runpod
import torch
import tempfile
import requests
import shutil
import json
import time
import gc
from pathlib import Path
from pipeline import Pipeline # Prima's pipeline
def handler(job):
    """
    Input schema:
    {
      "input": {
        "dicom_url": "https://inou.../signed-download/...",
        "series_uids": ["1.2.840...", "1.2.840..."],
        "callback_url": "https://inou.../api/analysis/callback",
        "callback_hmac_key": "hex-encoded-key",
        "job_id": "uuid",
        "study_id": "uuid"
      }
    }
    """
    input_data = job["input"]
    work_dir = Path(tempfile.mkdtemp())
    try:
        # 1. Download DICOM (only selected series)
        t0 = time.time()
        dicom_path = download_dicom(input_data["dicom_url"], work_dir)

        # 2. Filter to selected series only
        filter_series(dicom_path, input_data["series_uids"])

        # 3. Run Prima pipeline
        config = build_config(dicom_path, work_dir / "output")
        pipeline = Pipeline(config)

        # Load study
        pipeline.load_mri_study()

        # Tokenize (VQ-VAE) — loads tokenizer, runs, then frees
        pipeline.load_tokenizer_model()
        tokens = pipeline.tokenize_series()
        pipeline.free_tokenizer()  # Free VRAM

        # Run Prima VLM
        pipeline.load_prima_model()
        results = pipeline.run_inference(tokens)
        pipeline.free_prima_model()

        elapsed = time.time() - t0

        # 4. Build response
        response = {
            "job_id": input_data["job_id"],
            "study_id": input_data["study_id"],
            "elapsed_seconds": elapsed,
            "series_analyzed": len(input_data["series_uids"]),
            "results": {
                "diagnoses": results["diagnoses"],        # list of {name, probability}
                "priority": results["priority"],          # STAT / URGENT / ROUTINE
                "referral": results["referral"],          # specialist recommendation
                "differential": results["differential"],  # ranked differential diagnosis
            },
        }

        # 5. POST results back to inou
        post_callback(input_data["callback_url"], response,
                      input_data["callback_hmac_key"])
        return response
    finally:
        # 6. ALWAYS purge DICOM data — no PHI left on worker
        shutil.rmtree(work_dir, ignore_errors=True)
        torch.cuda.empty_cache()
        gc.collect()


runpod.serverless.start({"handler": handler})
```
### Worker Lifecycle
```
RunPod Serverless
─────────────────
Idle (no GPU) ──────────────────────── $0.00/sec
Job arrives
Cold Start (~15-25s) ────────────────── $0.00073/sec (L40S)
• Container starts; billing begins
• Model weights loaded from
network volume (NFS, ~10s)
• GPU warm
Inference (~30-60s) ─────────────────── $0.00073/sec
• VQ-VAE tokenization (~10-20s)
• Free VQ-VAE
• Prima VLM inference (~20-40s)
• Results callback
Idle timeout (5s) ───────────────────── $0.00073/sec (configurable)
Scale to 0 ──────────────────────────── $0.00/sec
```
### RunPod Configuration
```json
{
"name": "inou-prima-worker",
"gpu": "NVIDIA L40S",
"gpuCount": 1,
"volumeId": "vol_prima_weights",
"volumeMountPath": "/models",
"dockerImage": "inou/prima-worker:latest",
"env": {
"MODEL_DIR": "/models",
"PRIMA_CKPT": "/models/primafullmodel107.pt",
"VQVAE_CKPT": "/models/vqvae_model_step16799.pth"
},
"scalerType": "QUEUE_DELAY",
"scalerValue": 4,
"workersMin": 0,
"workersMax": 3,
"idleTimeout": 5,
"executionTimeout": 300,
"flashboot": true
}
```
**Network Volume:** Model weights (~4GB total) are stored on a RunPod network volume, which persists across cold starts and mounts as NFS; no download delay after the first pull.
---
## 4. API Design
### inou Backend Endpoints
#### 4.1 Trigger Analysis
```
POST /api/studies/{studyID}/analyze
Authorization: Bearer <token>
Request Body (optional):
{
"clinical_question": "evaluate for hydrocephalus", // optional
"priority": "routine", // routine | urgent | stat
"force_series": ["1.2.840..."] // optional: override selection
}
Response 202:
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "queued",
"selected_series": [
{
"series_uid": "1.2.840.113619...",
"description": "AX T2 FLAIR",
"sequence_type": "T2-FLAIR",
"slice_count": 28,
"selection_reason": "Primary sequence for hydrocephalus evaluation"
},
{
"series_uid": "1.2.840.113619...",
"description": "AX T2",
"sequence_type": "T2",
"slice_count": 24,
"selection_reason": "Complementary T2-weighted for ventricular assessment"
}
],
"estimated_cost_usd": 0.066,
"estimated_duration_seconds": 90
}
```
#### 4.2 Analysis Status
```
GET /api/studies/{studyID}/analysis
Authorization: Bearer <token>
Response 200:
{
"job_id": "550e8400-...",
"status": "completed", // queued | processing | completed | failed
"created_at": "2026-02-14T20:15:00Z",
"completed_at": "2026-02-14T20:16:32Z",
"duration_seconds": 92,
"cost_usd": 0.067,
"series_analyzed": 2,
"results": { ... } // see 4.3
}
```
#### 4.3 Analysis Results
```
GET /api/studies/{studyID}/analysis/results
Authorization: Bearer <token>
Response 200:
{
"study_id": "...",
"model": "prima-v1.07",
"model_version": "primafullmodel107",
"analysis_timestamp": "2026-02-14T20:16:32Z",
"diagnoses": [
{
"condition": "Normal pressure hydrocephalus",
"icd10": "G91.2",
"probability": 0.87,
"category": "developmental"
},
{
"condition": "Cerebral atrophy",
"icd10": "G31.9",
"probability": 0.34,
"category": "degenerative"
},
{
"condition": "Periventricular white matter changes",
"icd10": "I67.3",
"probability": 0.28,
"category": "vascular"
}
],
"priority": {
"level": "URGENT",
"reasoning": "High probability hydrocephalus requiring neurosurgical evaluation"
},
"referral": {
"specialty": "Neurosurgery",
"urgency": "within_1_week",
"reasoning": "NPH with high probability — consider VP shunt evaluation"
},
"differential": [
"Normal pressure hydrocephalus",
"Communicating hydrocephalus",
"Cerebral atrophy (ex vacuo ventriculomegaly)"
],
"series_results": [
{
"series_uid": "1.2.840...",
"description": "AX T2 FLAIR",
"findings": "Disproportionate ventriculomegaly relative to sulcal enlargement. Periventricular signal abnormality consistent with transependymal CSF flow."
}
],
"disclaimer": "AI-generated analysis for clinical decision support only. Not a substitute for radiologist interpretation."
}
```
#### 4.4 Callback Endpoint (Worker → inou)
```
POST /api/analysis/callback
X-HMAC-Signature: sha256=<hex>
Content-Type: application/json
{
"job_id": "...",
"study_id": "...",
"status": "completed",
"elapsed_seconds": 47.3,
"series_analyzed": 2,
"results": { ... }
}
```
The HMAC signature is computed over the raw JSON body using a per-job secret, preventing spoofed callbacks.
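Verification on the inou side can be sketched as follows (function name is illustrative; `hmac.Equal` gives a constant-time comparison):

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// VerifyCallbackSignature checks the X-HMAC-Signature header
// ("sha256=<hex>") against HMAC-SHA256 of the raw request body
// using the per-job secret.
func VerifyCallbackSignature(body []byte, header string, key []byte) bool {
	mac := hmac.New(sha256.New, key)
	mac.Write(body)
	want := "sha256=" + hex.EncodeToString(mac.Sum(nil))
	return hmac.Equal([]byte(want), []byte(header))
}
```

The comparison runs over the signed header string rather than decoded bytes, so malformed hex simply fails to match.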
#### 4.5 DICOM Download (Worker → inou)
```
GET /api/internal/dicom/{studyID}/download?token={signed_token}&series={uid1,uid2}
→ 200 application/zip (streaming)
Token: time-limited (5 min), signed with HMAC, includes study ID and allowed series UIDs.
Only selected series are included in the download — minimizes data transfer.
```
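A sketch of the signed-token scheme described above. The `|`-separated payload encoding and function names are illustrative; the essentials are the HMAC over study ID, allowed series UIDs, and expiry:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// SignDownloadToken builds a time-limited token: base64(payload).hexsig,
// where payload = studyID|uid1,uid2|expiryUnix.
func SignDownloadToken(key []byte, studyID string, series []string, ttl time.Duration) string {
	payload := fmt.Sprintf("%s|%s|%d",
		studyID, strings.Join(series, ","), time.Now().Add(ttl).Unix())
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(payload))
	return base64.RawURLEncoding.EncodeToString([]byte(payload)) +
		"." + hex.EncodeToString(mac.Sum(nil))
}

// VerifyDownloadToken checks the signature and expiry, returning the
// study ID and allowed series UIDs on success.
func VerifyDownloadToken(key []byte, token string) (string, []string, bool) {
	parts := strings.SplitN(token, ".", 2)
	if len(parts) != 2 {
		return "", nil, false
	}
	raw, err := base64.RawURLEncoding.DecodeString(parts[0])
	if err != nil {
		return "", nil, false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(raw)
	if !hmac.Equal([]byte(hex.EncodeToString(mac.Sum(nil))), []byte(parts[1])) {
		return "", nil, false
	}
	fields := strings.SplitN(string(raw), "|", 3)
	if len(fields) != 3 {
		return "", nil, false
	}
	exp, err := strconv.ParseInt(fields[2], 10, 64)
	if err != nil || time.Now().Unix() > exp {
		return "", nil, false
	}
	return fields[0], strings.Split(fields[1], ","), true
}
```

Embedding the allowed series UIDs in the signed payload lets the download endpoint enforce "selected series only" without a database lookup.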
---
## 5. Docker Image
### Dockerfile
```dockerfile
FROM nvidia/cuda:12.4.0-devel-ubuntu22.04
# System deps
RUN apt-get update && apt-get install -y \
    python3.11 python3.11-dev python3-pip \
    libgl1-mesa-glx libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*
RUN ln -s /usr/bin/python3.11 /usr/bin/python
# Python deps (cached layer)
COPY requirements.txt /tmp/
RUN pip install --no-cache-dir -r /tmp/requirements.txt
# flash-attn requires Ampere+ (sm_80+), build for L40S (sm_89)
RUN pip install flash-attn==2.7.4.post1 --no-build-isolation
# RunPod SDK
RUN pip install runpod==1.7.0
# Prima source code
COPY Prima/ /app/Prima/
COPY handler.py /app/handler.py
WORKDIR /app
# Model weights are on network volume, not baked into image
# /models/primafullmodel107.pt (~3.5GB)
# /models/vqvae_model_step16799.pth (~500MB)
CMD ["python", "handler.py"]
```
**Image size:** ~12GB (CUDA base + PyTorch + flash-attn + Prima)
### What's in the image vs network volume:
| Component | Location | Size | Rationale |
|-----------|----------|------|-----------|
| CUDA 12.4 + Ubuntu | Docker image | ~4GB | Cached, rarely changes |
| PyTorch 2.6 + deps | Docker image | ~6GB | Cached, rarely changes |
| flash-attn | Docker image | ~500MB | Must match CUDA version |
| Prima source | Docker image | ~50MB | Changes with updates |
| RunPod handler | Docker image | ~5KB | Our wrapper code |
| `primafullmodel107.pt` | Network volume | ~3.5GB | Too large for image, shared |
| `vqvae_model_step16799.pth` | Network volume | ~500MB | Shared across workers |
### Build & Deploy
```bash
# Build
docker build -t inou/prima-worker:latest .
# Push to RunPod's registry (or Docker Hub)
docker push inou/prima-worker:latest
# Network volume setup (one-time)
runpod volume create --name prima-weights --size 10
# Upload model weights to volume via RunPod console or SSH
```
---
## 6. Data Flow & Security
### Data Flow Diagram
```
User uploads inou stores Worker downloads Worker processes
DICOM study encrypted selected series and returns results
at rest (signed URL)
│ │ │ │
▼ ▼ ▼ ▼
┌─────────┐ ┌───────────┐ ┌──────────┐ ┌──────────┐
│ Browser │────▶│ inou │─────────▶│ RunPod │────────▶│ inou │
│ Upload │ │ Backend │ signed │ Worker │ HMAC │ Backend │
│ (TLS) │ │ (AES-GCM) │ URL │ (tmpfs) │ callback│ (store) │
└─────────┘ └───────────┘ └──────────┘ └──────────┘
On completion:
shutil.rmtree()
No PHI retained
```
### Security Controls
| Concern | Control |
|---------|---------|
| **DICOM at rest (inou)** | AES-256-GCM encryption in SQLite (existing) |
| **DICOM in transit to worker** | TLS 1.3 + time-limited signed URL (5 min TTL) |
| **DICOM on worker** | Stored in tmpfs; `shutil.rmtree()` in `finally` block; container destroyed after job |
| **Results in transit** | TLS 1.3 + HMAC-SHA256 callback verification |
| **Results at rest** | Encrypted in dossier (existing AES-256-GCM) |
| **PHI in logs** | No DICOM pixel data or patient identifiers logged. Only series UIDs and job metadata |
| **RunPod access** | API key stored as inou backend env var, never exposed to frontend |
| **Worker isolation** | Each job runs in isolated container; no shared filesystem between jobs |
### HIPAA Considerations
| HIPAA Requirement | Implementation |
|-------------------|----------------|
| **Access controls** | inou auth (existing); RunPod API key; signed URLs |
| **Encryption** | AES-256-GCM at rest; TLS 1.3 in transit |
| **Audit trail** | All analysis requests logged with timestamp, user, study ID |
| **Minimum necessary** | Only selected series transmitted; no patient demographics sent to worker |
| **BAA** | RunPod offers a BAA for serverless; **must be executed before production** |
| **Data retention** | Zero retention on worker; configurable retention on inou |
| **De-identification** | DICOM sent to the worker is stripped of patient name/DOB (tags 0010,0010 / 0010,0030); the Series UID is sufficient for processing |
### PHI Minimization Pipeline
Before sending DICOM to RunPod, inou strips the following tags:
```go
var phiTagsToStrip = []dicom.Tag{
	dicom.PatientName,        // (0010,0010)
	dicom.PatientID,          // (0010,0020)
	dicom.PatientBirthDate,   // (0010,0030)
	dicom.PatientSex,         // (0010,0040) — keep if Prima needs it
	dicom.PatientAddress,     // (0010,1040)
	dicom.ReferringPhysician, // (0008,0090)
	dicom.InstitutionName,    // (0008,0080)
	dicom.InstitutionAddress, // (0008,0081)
	dicom.AccessionNumber,    // (0008,0050)
}
// Pixel data and series-level imaging tags are preserved — Prima needs those
```
---
## 7. Cost Analysis
### RunPod Pricing (L40S Serverless)
| Metric | Value |
|--------|-------|
| Per-second billing | $0.00073/sec |
| Per-minute | $0.0438/min |
| Per-hour | $2.628/hr |
| Minimum charge per job | None (per-second) |
| Network volume (10GB) | ~$1.00/month |
| Idle workers | $0.00 (scale to 0) |
### Per-Series Timing Breakdown
| Phase | Duration | Notes |
|-------|----------|-------|
| Cold start (first job) | 15–25s | Container + model load from volume |
| Warm start (subsequent) | 0–2s | Worker already running |
| DICOM download | 2–5s | Depends on series size (~50–200MB) |
| VQ-VAE tokenization | 10–20s | Per series, depends on slice count |
| VQ-VAE → Prima model swap | 2–3s | Free VQ-VAE, load Prima |
| Prima inference | 15–30s | Depends on token count |
| Results callback | <1s | Small JSON payload |
| **Total per series** | **30–60s** | **Warm: 30–40s typical** |
### Cost Per Study: With vs Without Intelligent Selection
#### Scenario A: Routine Brain MRI (12 series, ~3000 slices)
| Approach | Series Processed | Est. Duration | GPU Cost | LLM Selection Cost | Total |
|----------|-----------------|---------------|----------|-------------------|-------|
| **Naive (all series)** | 12 | 360–720s | $0.26–$0.53 | $0.00 | **$0.26–$0.53** |
| **Intelligent (selected)** | 3 | 90–180s | $0.066–$0.13 | ~$0.003 | **$0.07–$0.13** |
| **Savings** | | | | | **73–75%** |
#### Scenario B: Brain MRI with Contrast (15 series, ~5000 slices)
| Approach | Series Processed | Est. Duration | GPU Cost | LLM Selection Cost | Total |
|----------|-----------------|---------------|----------|-------------------|-------|
| **Naive** | 15 | 450–900s | $0.33–$0.66 | $0.00 | **$0.33–$0.66** |
| **Intelligent** | 4 | 120–240s | $0.088–$0.18 | ~$0.004 | **$0.09–$0.18** |
| **Savings** | | | | | **73%** |
#### Scenario C: Focused Study (6 series, ~1500 slices, e.g., stroke protocol)
| Approach | Series Processed | Est. Duration | GPU Cost | Total |
|----------|-----------------|---------------|----------|-------|
| **Naive** | 6 | 180–360s | $0.13–$0.26 | **$0.13–$0.26** |
| **Intelligent** | 3 | 90–180s | $0.066–$0.13 | **$0.07–$0.13** |
### Monthly Volume Projections
| Volume | Naive Cost | Intelligent Cost | Monthly Savings |
|--------|-----------|-----------------|-----------------|
| 10 studies/month | $3.30–$6.60 | $0.90–$1.80 | $2.40–$4.80 |
| 50 studies/month | $16.50–$33.00 | $4.50–$9.00 | $12.00–$24.00 |
| 200 studies/month | $66.00–$132.00 | $18.00–$36.00 | $48.00–$96.00 |
| 1000 studies/month | $330–$660 | $90–$180 | $240–$480 |
**Fixed costs:** RunPod network volume ~$1/month. No other infrastructure costs; workers scale to zero.
### Break-Even vs Always-On
An always-on L40S instance costs ~$0.69/hr = **$500/month**.
Break-even point: $500 ÷ $0.13/study = **~3,850 studies/month** before always-on becomes cheaper.
For inou's expected volumes (10–200 studies/month), **serverless is 50–500× cheaper**.
---
## 8. User Experience
### Upload → Results Flow
```
┌─────────────────────────────────────────────────────────────┐
│ 1. UPLOAD │
│ │
│ User uploads DICOM study (drag & drop or folder select) │
│ inou parses metadata, displays series list in viewer │
│ "AI Analysis Available" badge appears on brain MRI studies │
│ │
├─────────────────────────────────────────────────────────────┤
│ 2. ANALYSIS INITIATED │
│ │
│ [Analyze with AI] button in viewer toolbar │
│ Optional: clinical question text field │
│ "Analyzing 3 of 12 series... Est. ~90 seconds" │
│ Progress indicator with phases: │
│ ◉ Series selected (3 of 12) │
│ ◉ Uploading to analysis engine... │
│ ○ AI processing... │
│ ○ Results ready │
│ │
├─────────────────────────────────────────────────────────────┤
│ 3. RESULTS DISPLAYED │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ AI Analysis Results Prima v1.07 │ │
│ │ │ │
│ │ ⚠️ URGENT — Neurosurgery referral recommended │ │
│ │ │ │
│ │ Diagnoses: │ │
│ │ ████████████████████░░ 87% NPH │ │
│ │ ██████░░░░░░░░░░░░░░░ 34% Cerebral atrophy │ │
│ │ █████░░░░░░░░░░░░░░░░ 28% WM changes │ │
│ │ │ │
│ │ Referral: Neurosurgery (within 1 week) │ │
│ │ "NPH with high probability — VP shunt evaluation" │ │
│ │ │ │
│ │ Series analyzed: T2 FLAIR, T2 (2 of 12) │ │
│ │ Analysis time: 47s · Cost: $0.03 │ │
│ │ │ │
│ │ ⚕️ AI-assisted analysis — not a radiologist report │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ Results panel integrated into existing DICOM viewer │
│ Clicking a diagnosis highlights the relevant series │
│ Results persisted in patient dossier │
└─────────────────────────────────────────────────────────────┘
```
### Automatic vs Manual Analysis
**Option A (default): Manual trigger.** The user clicks the "Analyze with AI" button. Best for initial launch; gives the user control.
**Option B (future): Auto-trigger.** Analysis starts automatically on upload for brain MRI studies. Configurable in settings.
### Viewer Integration
The AI results panel is a new sidebar component in the existing DICOM viewer:
```
┌──────────────────────────────────────────────────────────────┐
│ inou Viewer [≡] │
├────────────────────────────────────┬─────────────────────────┤
│ │ 📋 Study Info │
│ │ Patient: [redacted] │
│ │ Date: 2026-02-14 │
│ DICOM Image Display │ Series: 12 │
│ │ │
│ (existing viewer canvas) ├─────────────────────────┤
│ │ 🤖 AI Analysis │
│ │ │
│ │ Status: ✅ Complete │
│ │ [Results panel above] │
│ │ │
│ │ [Re-analyze ▼] │
│ │ [Export Report] │
├────────────────────────────────────┴─────────────────────────┤
│ Series: AX T2 FLAIR | Slice 14/28 | W:1500 L:450 │
└──────────────────────────────────────────────────────────────┘
```
---
## 9. Error Handling
### Error Categories & Recovery
| Error | Detection | Recovery | User Impact |
|-------|-----------|----------|-------------|
| **RunPod cold start timeout** | No response in 60s | Retry once; if that fails, queue for retry in 5 min | "Analysis queued; GPU starting up" |
| **Worker execution timeout** | RunPod 300s timeout | Job marked failed; auto-retry with reduced series count | "Analysis timed out; retrying with fewer series" |
| **Model inference error** | Exception in handler | Return error in callback; log stack trace | "Analysis failed; try again or contact support" |
| **DICOM download failure** | HTTP error / timeout | Retry download 3× with exponential backoff | Transparent to user |
| **Callback delivery failure** | HTTP error | Worker retries 3×; inou polls RunPod status API as fallback | Transparent; results arrive via polling |
| **Invalid DICOM study** | Parse errors in ingest | Reject at upload time with specific error | "Unable to parse series; unsupported format" |
| **No relevant series found** | Series selector returns empty | Skip analysis; inform user | "No brain MRI sequences detected in this study" |
| **RunPod quota / billing** | 402/429 errors | Alert admin; queue jobs for later | "Analysis temporarily unavailable" |
| **Network volume unavailable** | Model load failure | Worker retries; if persistent, alert | "Analysis service under maintenance" |
### Retry Strategy
```go
type RetryPolicy struct {
	MaxAttempts     int           // 3
	InitialBackoff  time.Duration // 10s
	MaxBackoff      time.Duration // 5m
	BackoffFactor   float64       // 2.0
	RetryableErrors []string      // timeout, 5xx, connection_refused
}
```
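A minimal sketch of how the policy translates into a delay per attempt (method name is illustrative; the struct is repeated for completeness):

```go
package main

import "time"

// RetryPolicy repeats the struct above for completeness.
type RetryPolicy struct {
	MaxAttempts     int
	InitialBackoff  time.Duration
	MaxBackoff      time.Duration
	BackoffFactor   float64
	RetryableErrors []string
}

// NextBackoff returns the delay before retry attempt n (1-based):
// InitialBackoff multiplied by BackoffFactor per prior attempt,
// capped at MaxBackoff.
func (p RetryPolicy) NextBackoff(attempt int) time.Duration {
	d := p.InitialBackoff
	for i := 1; i < attempt; i++ {
		d = time.Duration(float64(d) * p.BackoffFactor)
		if d >= p.MaxBackoff {
			return p.MaxBackoff
		}
	}
	return d
}
```

With the defaults shown (10s initial, factor 2.0, 5m cap) the sequence is 10s, 20s, 40s, ..., clamped at 5 minutes.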
### Monitoring & Alerting
| Metric | Alert Threshold | Channel |
|--------|----------------|---------|
| Job failure rate | >10% in 1 hour | Slack / Signal |
| Median latency | >180s | Dashboard |
| RunPod spend | >$50/day | Email |
| Cold start frequency | >50% of jobs | Dashboard (indicates insufficient idle timeout) |
| Callback failures | Any | Immediate alert |
---
## 10. Implementation Plan
### Phase 1: Foundation (2 weeks)
- [ ] Fork Prima repo, create `inou-prima-worker` repo
- [ ] Write RunPod handler (`handler.py`)
- [ ] Build Docker image, test locally with `nvidia-docker`
- [ ] Set up RunPod account, network volume, deploy worker
- [ ] Test end-to-end with a sample DICOM study
- [ ] Verify model outputs match Prima's reference results
### Phase 2: Series Selection (1 week)
- [ ] Implement rule-based series classifier in Go
- [ ] Build clinical protocol selection logic
- [ ] Add LLM fallback (Claude Haiku integration)
- [ ] Test with diverse DICOM studies (Siemens, GE, Philips naming conventions)
- [ ] Validate selection accuracy: should match radiologist series choice >90%
### Phase 3: Backend Integration (2 weeks)
- [ ] Add analysis endpoints to inou backend
- [ ] Implement job queue with retry logic
- [ ] Build DICOM download endpoint with signed URLs
- [ ] Implement callback handler with HMAC verification
- [ ] Add PHI stripping to DICOM export pipeline
- [ ] Store results in dossier (encrypted)
- [ ] Write integration tests
### Phase 4: Frontend Integration (1 week)
- [ ] Add "Analyze with AI" button to viewer
- [ ] Build AI results panel component
- [ ] Implement progress indicator (WebSocket or polling)
- [ ] Add results to dossier view
- [ ] Export report functionality
### Phase 5: Hardening (1 week)
- [ ] Load testing (concurrent jobs, cold starts)
- [ ] Error handling for all failure modes
- [ ] Monitoring dashboard
- [ ] Cost tracking and alerting
- [ ] Security review (PHI flow, access controls)
- [ ] Documentation
**Total: ~7 weeks** to production-ready MVP.
---
## 11. Future Work
### Near-Term (3–6 months)
- **Auto-trigger analysis** on brain MRI upload (configurable)
- **Batch processing** — analyze multiple studies in parallel
- **Result caching** — skip re-analysis if study hasn't changed
- **Comparison mode** — compare current vs prior study results
### Medium-Term (6–12 months)
- **Fine-tuning on specific conditions** — Use inou's accumulated (de-identified) data to fine-tune Prima for conditions most relevant to inou's user base (e.g., pediatric hydrocephalus for Sophia's case)
- **Radiologist feedback loop** — Allow radiologists to confirm/reject AI findings, feeding back into training data
- **Multi-series reasoning** — Instead of per-series inference, develop cross-series reasoning (e.g., comparing pre- and post-contrast T1)
### Long-Term (12+ months)
- **Multi-organ expansion** — As VLMs for spine, cardiac, abdominal MRI become available, integrate them using the same serverless architecture
- **On-premise deployment** — For high-volume clients or regulatory requirements, offer Prima as an on-prem container (requires dedicated GPU)
- **Real-time inference** — As models get faster and GPUs cheaper, offer analysis during the MRI scan itself (PACS integration)
- **Clinical decision support** — Integrate Prima results with patient history, labs, and prior imaging for comprehensive clinical recommendations
- **FDA 510(k) pathway** — If inou pursues clinical deployment, Prima results would need FDA clearance as a Computer-Aided Detection (CADe) or Computer-Aided Diagnosis (CADx) device
### Architecture Extensibility
The serverless worker pattern is model-agnostic. To add a new model:
1. Build a new Docker image with the model
2. Deploy as a separate RunPod serverless endpoint
3. Add a new series selector profile
4. Route from inou backend based on study type / body part
```
inou Backend
┌──────────┼──────────┐
▼ ▼ ▼
Prima SpineVLM CardiacAI
(Brain) (Spine) (Heart)
L40S L40S A100
```
Same queue, same callback pattern, same dossier storage. Only the worker and selector change.
---
## Appendix A: Prima Model Details
| Property | Value |
|----------|-------|
| **Paper** | "Learning neuroimaging models from health system-scale data" (arXiv:2509.18638) |
| **Training data** | UM-220K: 220,000+ MRI studies, 5.6M 3D sequences, 362M 2D images |
| **Architecture** | Hierarchical VLM: VQ-VAE tokenizer → Perceiver → Transformer |
| **Diagnoses covered** | 52 radiologic diagnoses across neoplastic, inflammatory, infectious, developmental |
| **Mean AUC** | 90.1 ± 5.0% (prospective 30K study validation) |
| **License** | MIT |
| **GPU requirement** | Ampere+ (flash-attn), 48GB VRAM recommended (L40S, A100) |
| **Weights** | `primafullmodel107.pt` (~3.5GB) + `vqvae_model_step16799.pth` (~500MB) |
| **Dependencies** | PyTorch 2.6, flash-attn 2.7.4, MONAI 1.5.1, transformers 4.49 |
## Appendix B: DICOM Tags Reference
```
(0008,0008) ImageType — ORIGINAL\PRIMARY vs DERIVED\SECONDARY
(0008,0060) Modality — MR
(0008,103E) SeriesDescription — Free text, scanner-dependent
(0010,0010) PatientName — PHI — STRIP before sending to worker
(0010,0020) PatientID — PHI — STRIP
(0010,0030) PatientBirthDate — PHI — STRIP
(0018,0010) ContrastBolusAgent — Present = contrast-enhanced
(0018,0020) ScanningSequence — SE, GR, IR, EP (spin echo, gradient, inversion recovery, echo planar)
(0018,0023) MRAcquisitionType — 2D, 3D
(0018,0024) SequenceName — Scanner-specific pulse sequence name
(0018,0050) SliceThickness — mm
(0020,0011) SeriesNumber — Integer ordering
(0020,0013) InstanceNumber — Slice index within series
(0028,0010) Rows — Pixel rows
(0028,0011) Columns — Pixel columns
```
---
*This specification is a living document. Update as implementation progresses and requirements evolve.*