# Prima Integration — Technical Specification

**inou Health × Prima Brain MRI VLM**

**Version:** 1.0 · **Date:** 2026-02-14 · **Author:** James (AI Architect) · **Reviewer:** Johan Jongsma

---

## Executive Summary

This spec defines how inou integrates [Prima](https://github.com/MLNeurosurg/Prima), the University of Michigan's brain MRI Vision Language Model, as an on-demand diagnostic AI service. Prima achieves 90.1% mean AUC across 52 neurological diagnoses and also offers differential diagnosis, worklist prioritization, and specialist referral recommendations.

The design prioritizes **cost efficiency** (serverless GPU, intelligent series selection), **security** (zero PHI at rest on cloud GPU, AES-256-GCM encrypted dossier storage), and **seamless UX** (upload → automatic analysis → results in viewer).

**Key numbers:**
- Per-study cost with intelligent selection: **$0.04–$0.13** (vs $0.22–$0.66 naive)
- Cold start to results: **60–120 seconds**
- Zero always-on GPU cost

---

## Table of Contents

1. [System Architecture](#1-system-architecture)
2. [Intelligent Series Selection](#2-intelligent-series-selection)
3. [RunPod Serverless Worker](#3-runpod-serverless-worker)
4. [API Design](#4-api-design)
5. [Docker Image](#5-docker-image)
6. [Data Flow & Security](#6-data-flow--security)
7. [Cost Analysis](#7-cost-analysis)
8. [User Experience](#8-user-experience)
9. [Error Handling](#9-error-handling)
10. [Implementation Plan](#10-implementation-plan)
11. [Future Work](#11-future-work)

---

## 1. System Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                            inou Frontend                            │
│  ┌──────────┐  ┌──────────────┐  ┌───────────────────────────────┐  │
│  │  DICOM   │  │    DICOM     │  │  AI Results Panel             │  │
│  │  Upload  │──│    Viewer    │  │  • Diagnosis probabilities    │  │
│  │          │  │              │  │  • Urgency / priority         │  │
│  └────┬─────┘  └──────────────┘  │  • Specialist referral        │  │
│       │                          │  • Series-level annotations   │  │
│       │                          └──────────┬────────────────────┘  │
└───────┼─────────────────────────────────────┼───────────────────────┘
        │ POST /api/studies                   │ GET /api/studies/:id/analysis
        ▼                                     ▲
┌─────────────────────────────────────────────────────────────────┐
│                        inou Backend (Go)                        │
│                                                                 │
│  ┌──────────┐  ┌─────────────┐  ┌──────────┐  ┌──────────────┐  │
│  │  DICOM   │  │   Series    │  │   Job    │  │   Dossier    │  │
│  │  Ingest  │─▶│  Selector   │─▶│  Queue   │  │    Store     │  │
│  │ + Parse  │  │ (LLM/Rules) │  │          │  │ (SQLite+AES) │  │
│  └──────────┘  └─────────────┘  └────┬─────┘  └──────▲───────┘  │
│                                      │               │          │
└──────────────────────────────────────┼───────────────┼──────────┘
                                       │               │
                                 Job dispatch   Results callback
                                 (HTTPS POST)   (HTTPS POST + HMAC)
                                       │               │
                                       ▼               │
                    ┌──────────────────────────────────────────┐
                    │            RunPod Serverless             │
                    │                                          │
                    │  ┌────────────────────────────────────┐  │
                    │  │  Prima Worker (L40S 48GB)          │  │
                    │  │                                    │  │
                    │  │  1. Download DICOM from signed URL │  │
                    │  │  2. Load VQ-VAE tokenizer          │  │
                    │  │  3. Tokenize selected series       │  │
                    │  │  4. Free VQ-VAE from VRAM          │  │
                    │  │  5. Load Prima VLM                 │  │
                    │  │  6. Run inference (fp16)           │  │
                    │  │  7. POST results back to inou      │  │
                    │  │  8. Purge all DICOM data           │  │
                    │  │                                    │  │
                    │  └────────────────────────────────────┘  │
                    │                                          │
                    │  GPU: L40S 48GB · Scales 0→N · Pay/sec   │
                    └──────────────────────────────────────────┘
```

### Component Responsibilities

| Component | Technology | Responsibility |
|-----------|-----------|----------------|
| **DICOM Ingest** | Go | Parse uploaded DICOM, extract metadata, store encrypted |
| **Series Selector** | Go + LLM (Claude Haiku) | Analyze DICOM metadata, select diagnostically relevant series |
| **Job Queue** | Go (in-process) | Manage analysis jobs, retry logic, status tracking |
| **RunPod Worker** | Python 3.11, PyTorch 2.6 | Run Prima inference on selected series |
| **Dossier Store** | SQLite + AES-256-GCM | Store results encrypted at rest |
| **AI Results Panel** | Frontend (existing viewer) | Display diagnosis, urgency, referral |

---

## 2. Intelligent Series Selection

### The Problem

A typical brain MRI study contains **8–15 series** with **100–500 slices each**, so a single study can run to several thousand slices (up to roughly 7,500). Running every series through Prima wastes GPU time and money, because most clinical questions need only 2–4 specific sequence types.

### Selection Architecture

```
DICOM Study Metadata
        │
        ▼
┌───────────────────────────────────────────┐
│  Series Selection Pipeline                │
│                                           │
│  Step 1: Extract metadata per series      │
│    • SeriesDescription (0008,103E)        │
│    • SequenceName (0018,0024)             │
│    • MRAcquisitionType (0018,0023)        │
│    • ScanningSequence (0018,0020)         │
│    • ContrastBolusAgent (0018,0010)       │
│    • SliceThickness (0018,0050)           │
│    • NumberOfSlices                       │
│    • ImageType (0008,0008)                │
│    • Modality (0008,0060)                 │
│                                           │
│  Step 2: Rule-based classification        │
│    Map each series → sequence type:       │
│    T1, T1+C, T2, T2-FLAIR, DWI, ADC,      │
│    SWI, MRA, SCOUT, LOCALIZER, DERIVED    │
│                                           │
│  Step 3: Clinical protocol matching       │
│    Apply selection rules (see table)      │
│                                           │
│  Step 4: LLM fallback for ambiguous cases │
│    If rule-based classification fails,    │
│    send metadata to Claude Haiku          │
│                                           │
│  Output: ordered list of series to analyze│
└───────────────────────────────────────────┘
```

### Rule-Based Classification

Series descriptions are notoriously inconsistent across scanners (Siemens, GE, and Philips all use different naming conventions), so the classifier combines several metadata signals rather than relying on the description alone:

```go
type SeriesClassification struct {
	SeriesUID    string
	SequenceType string  // T1, T1+C, T2, T2-FLAIR, DWI, ADC, SWI, MRA, OTHER
	HasContrast  bool
	IsLocalizer  bool
	IsDerived    bool
	SliceCount   int
	Confidence   float64 // 0.0-1.0, below 0.7 triggers LLM fallback
}
```

**Classification rules (ordered by priority):**

| Signal | Tag | Example Values | Maps To |
|--------|-----|---------------|---------|
| ImageType contains "LOCALIZER" | (0008,0008) | `ORIGINAL\PRIMARY\LOCALIZER` | SKIP |
| ImageType contains "DERIVED" | (0008,0008) | `DERIVED\SECONDARY` | SKIP (usually) |
| SliceCount < 10 | computed | — | SKIP (scout/cal) |
| ContrastBolusAgent present | (0018,0010) | `Gadavist` | +contrast flag |
| Description contains "flair" | (0008,103E) | `AX T2 FLAIR`, `FLAIR_DARK-FLUID` | T2-FLAIR |
| Description contains "t2" (no flair) | (0008,103E) | `AX T2`, `T2_TSE_TRA` | T2 |
| Description contains "t1" | (0008,103E) | `SAG T1`, `T1_MPRAGE` | T1 or T1+C |
| Description contains "dwi"/"diffusion" | (0008,103E) | `DWI_b1000`, `EP2D_DIFF` | DWI |
| Description contains "adc" | (0008,103E) | `ADC_MAP` | ADC |
| Description contains "swi"/"suscept" | (0008,103E) | `SWI_mIP`, `SUSCEPTIBILITY` | SWI |
| ScanningSequence = "EP" + b-value tag | (0018,0020) | — | DWI |
| No match, confidence < 0.7 | — | — | → LLM fallback |

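The priority order in the table can be sketched as a small function. This is a minimal Python illustration rather than the Go implementation the spec calls for. The `SeriesMeta` type, the confidence values (0.95, 0.9, 0.0), the `"diff"` keyword (chosen so that `EP2D_DIFF` matches), and the ADC exception to the DERIVED rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SeriesMeta:
    """Hypothetical container for the per-series DICOM fields used below."""
    description: str
    image_type: str = ""
    contrast_agent: str = ""
    slice_count: int = 0


def classify_series(meta: SeriesMeta) -> tuple[str, float]:
    """Return (sequence_type, confidence), checking rules in table order."""
    desc = meta.description.lower()
    image_type = meta.image_type.upper()

    # Skip rules are evaluated before any description matching.
    if "LOCALIZER" in image_type or meta.slice_count < 10:
        return "SKIP", 0.95
    # "SKIP (usually)": ADC maps are derived images, so they are exempted here.
    if "DERIVED" in image_type and "adc" not in desc:
        return "SKIP", 0.95

    has_contrast = bool(meta.contrast_agent.strip())
    if "flair" in desc:
        return "T2-FLAIR", 0.9
    if "t2" in desc:
        return "T2", 0.9
    if "t1" in desc:
        return ("T1+C" if has_contrast else "T1"), 0.9
    if "dwi" in desc or "diff" in desc:
        return "DWI", 0.9
    if "adc" in desc:
        return "ADC", 0.9
    if "swi" in desc or "suscept" in desc:
        return "SWI", 0.9
    # No rule matched: confidence below 0.7 routes the series to the LLM fallback.
    return "OTHER", 0.0
```

Checking "flair" before "t2" matters because descriptions such as `AX T2 FLAIR` contain both substrings.
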
### Clinical Protocol Selection

Given the classified series, select based on the clinical question (or use a general protocol if none specified):

| Clinical Question | Required Series | Optional Series | Expected Count |
|-------------------|-----------------|-----------------|----------------|
| **General screening** | T1, T2, T2-FLAIR | DWI, T1+C | 3–5 |
| **Hydrocephalus** | T2-FLAIR, T2 | T1, DWI | 2–3 |
| **Tumor / mass** | T1+C, T2-FLAIR, T1 (pre-contrast) | DWI, SWI | 3–4 |
| **Stroke / acute** | DWI, ADC, T2-FLAIR | MRA, SWI | 3–4 |
| **MS / demyelination** | T2-FLAIR, T1+C, T2 | DWI | 3 |
| **Infection** | T1+C, DWI, T2-FLAIR | T2 | 3 |

**Default behavior (no clinical question):** Use "General screening" — select T1, T2, T2-FLAIR, and any contrast-enhanced series. Cap at 5 series maximum.

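One way to turn the protocol table into code is a lookup of required and optional sequence types, walked in priority order. The sketch below is in Python for brevity and rests on several assumptions not stated in the spec: the protocol keys, taking only the first series of each sequence type, including optional series whenever they are present, and applying the 5-series cap to every protocol.

```python
# (required, optional) sequence types per protocol, copied from the table above.
PROTOCOLS = {
    "general": (["T1", "T2", "T2-FLAIR"], ["DWI", "T1+C"]),
    "hydrocephalus": (["T2-FLAIR", "T2"], ["T1", "DWI"]),
    "tumor": (["T1+C", "T2-FLAIR", "T1"], ["DWI", "SWI"]),
    "stroke": (["DWI", "ADC", "T2-FLAIR"], ["MRA", "SWI"]),
    "ms": (["T2-FLAIR", "T1+C", "T2"], ["DWI"]),
    "infection": (["T1+C", "DWI", "T2-FLAIR"], ["T2"]),
}
MAX_SERIES = 5  # cap from the default-behavior rule


def select_series(classified: dict[str, str], protocol: str = "general") -> list[str]:
    """classified maps series UID -> sequence type; returns UIDs in priority order."""
    required, optional = PROTOCOLS.get(protocol, PROTOCOLS["general"])
    selected: list[str] = []
    for wanted in required + optional:
        for uid, seq_type in classified.items():
            if seq_type == wanted:
                selected.append(uid)
                break  # assumption: one series per sequence type is enough
    return selected[:MAX_SERIES]


study = {"1.1": "T1", "1.2": "T2", "1.3": "T2-FLAIR", "1.4": "DWI", "1.5": "SCOUT"}
print(select_series(study, "hydrocephalus"))
```

An unknown protocol name falls back to general screening, which mirrors the default behavior described above.
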
### LLM Fallback

When rule-based classification confidence is below 0.7 for any series:

```
Prompt to Claude Haiku:

You are a neuroradiology MRI series classifier. Given the following DICOM
metadata for a single MRI series, classify it.

SeriesDescription: {desc}
SequenceName: {seq_name}
ScanningSequence: {scan_seq}
MRAcquisitionType: {acq_type}
ImageType: {img_type}
ContrastBolusAgent: {contrast}
SliceCount: {count}
Manufacturer: {mfr}
ManufacturerModelName: {model}

Respond with JSON:
{
  "sequence_type": "T1|T2|T2-FLAIR|T1+C|DWI|ADC|SWI|MRA|SCOUT|OTHER",
  "has_contrast": true|false,
  "is_diagnostically_relevant": true|false,
  "confidence": 0.0-1.0,
  "reasoning": "brief explanation"
}
```

**Cost of LLM fallback:** ~$0.001 per series (Haiku). The fallback is expected to trigger for roughly 1–2 series per study, so the added cost is negligible.

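Because the model's reply is free text that is only expected to be JSON, the backend should validate it before trusting it. The following Python sketch shows one possible validation step; the function name `parse_llm_classification` and the choice to degrade to `OTHER` with zero confidence on any malformed reply are assumptions, not part of the spec.

```python
import json

ALLOWED_TYPES = {"T1", "T2", "T2-FLAIR", "T1+C", "DWI", "ADC", "SWI", "MRA", "SCOUT", "OTHER"}


def parse_llm_classification(raw: str) -> dict:
    """Validate the JSON reply described in the prompt; fall back to OTHER on any problem."""
    fallback = {
        "sequence_type": "OTHER",
        "has_contrast": False,
        "is_diagnostically_relevant": False,
        "confidence": 0.0,
        "reasoning": "unparseable model response",
    }
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback

    seq_type = data.get("sequence_type")
    if not isinstance(seq_type, str) or seq_type not in ALLOWED_TYPES:
        return fallback

    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return fallback
    if not 0.0 <= confidence <= 1.0:
        return fallback

    return {
        "sequence_type": seq_type,
        "has_contrast": bool(data.get("has_contrast", False)),
        "is_diagnostically_relevant": bool(data.get("is_diagnostically_relevant", False)),
        "confidence": float(confidence),
        "reasoning": str(data.get("reasoning", "")),
    }
```

A series that comes back as the fallback value is simply left out of selection, which is the conservative outcome.
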
---

## 3. RunPod Serverless Worker

### Worker Architecture

The RunPod serverless worker wraps Prima's `pipeline.py` with an HTTP handler:

```python
# handler.py: RunPod serverless handler
import gc
import json
import shutil
import tempfile
import time
from pathlib import Path

import requests
import runpod
import torch

from pipeline import Pipeline  # Prima's pipeline

# download_dicom, filter_series, build_config and post_callback are helper
# functions assumed to be defined in this module (omitted here); json and
# requests are imported for their use.


def handler(job):
    """
    Input schema:
    {
        "input": {
            "dicom_url": "https://inou.../signed-download/...",
            "series_uids": ["1.2.840...", "1.2.840..."],
            "callback_url": "https://inou.../api/analysis/callback",
            "callback_hmac_key": "hex-encoded-key",
            "job_id": "uuid",
            "study_id": "uuid"
        }
    }
    """
    input_data = job["input"]
    work_dir = Path(tempfile.mkdtemp())

    try:
        # 1. Download DICOM (only selected series)
        t0 = time.time()
        dicom_path = download_dicom(input_data["dicom_url"], work_dir)

        # 2. Filter to selected series only
        filter_series(dicom_path, input_data["series_uids"])

        # 3. Run Prima pipeline
        config = build_config(dicom_path, work_dir / "output")
        pipeline = Pipeline(config)

        # Load study
        pipeline.load_mri_study()

        # Tokenize (VQ-VAE): load the tokenizer, run it, then free it
        pipeline.load_tokenizer_model()
        tokens = pipeline.tokenize_series()
        pipeline.free_tokenizer()  # Free VRAM before loading the VLM

        # Run Prima VLM
        pipeline.load_prima_model()
        results = pipeline.run_inference(tokens)
        pipeline.free_prima_model()

        elapsed = time.time() - t0

        # 4. Build response (fields match the callback payload in section 4.4)
        response = {
            "job_id": input_data["job_id"],
            "study_id": input_data["study_id"],
            "status": "completed",
            "elapsed_seconds": elapsed,
            "series_analyzed": len(input_data["series_uids"]),
            "results": {
                "diagnoses": results["diagnoses"],        # list of {name, probability}
                "priority": results["priority"],          # STAT / URGENT / ROUTINE
                "referral": results["referral"],          # specialist recommendation
                "differential": results["differential"],  # ranked differential diagnosis
            },
        }

        # 5. POST results back to inou
        post_callback(input_data["callback_url"], response,
                      input_data["callback_hmac_key"])

        return response

    finally:
        # 6. ALWAYS purge DICOM data so no PHI is left on the worker
        shutil.rmtree(work_dir, ignore_errors=True)
        gc.collect()              # drop unreachable tensors first...
        torch.cuda.empty_cache()  # ...then release the cached VRAM blocks


runpod.serverless.start({"handler": handler})
```

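The handler relies on a `post_callback` helper that the spec does not show. A minimal sketch of what it could look like follows, assuming the `sha256=<hex>` signature header from section 4.4 and the three delivery attempts listed under Error Handling. The `send` parameter, the compact JSON serialization, and the backoff values are illustrative assumptions.

```python
import hashlib
import hmac
import json
import time


def sign_body(body: bytes, hex_key: str) -> str:
    """HMAC-SHA256 over the exact bytes that will be sent on the wire."""
    digest = hmac.new(bytes.fromhex(hex_key), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def post_callback(url, payload, hex_key, send=None, attempts=3, backoff_seconds=1.0):
    """Serialize once, sign those bytes, and retry delivery a few times.

    `send(url, body, headers)` must return an HTTP status code. It is
    injectable so the retry logic can be exercised without a network.
    """
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-HMAC-Signature": sign_body(body, hex_key),
    }
    if send is None:
        import requests  # imported lazily; only needed for real delivery

        def send(target, data, hdrs):
            return requests.post(target, data=data, headers=hdrs, timeout=10).status_code

    for attempt in range(1, attempts + 1):
        try:
            if 200 <= send(url, body, headers) < 300:
                return True
        except Exception:
            pass  # treat transport errors like a failed attempt
        if attempt < attempts:
            time.sleep(backoff_seconds * 2 ** (attempt - 1))
    return False
```

Signing the serialized bytes, rather than re-serializing on the receiving side, avoids mismatches caused by key order or whitespace differences.
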
### Worker Lifecycle

```
RunPod Serverless
─────────────────
Idle (no GPU) ───────────────────────── $0.00/sec
        │
   Job arrives
        │
        ▼
Cold Start (~15-25s) ────────────────── $0.00073/sec (L40S)
  • Container starts                    starts billing
  • Model weights loaded from
    network volume (NFS, ~10s)
  • GPU warm
        │
        ▼
Inference (~30-60s) ─────────────────── $0.00073/sec
  • VQ-VAE tokenization (~10-20s)
  • Free VQ-VAE
  • Prima VLM inference (~20-40s)
  • Results callback
        │
        ▼
Idle timeout (5s) ───────────────────── $0.00073/sec (configurable)
        │
        ▼
Scale to 0 ──────────────────────────── $0.00/sec
```

### RunPod Configuration

```json
{
  "name": "inou-prima-worker",
  "gpu": "NVIDIA L40S",
  "gpuCount": 1,
  "volumeId": "vol_prima_weights",
  "volumeMountPath": "/models",
  "dockerImage": "inou/prima-worker:latest",
  "env": {
    "MODEL_DIR": "/models",
    "PRIMA_CKPT": "/models/primafullmodel107.pt",
    "VQVAE_CKPT": "/models/vqvae_model_step16799.pth"
  },
  "scalerType": "QUEUE_DELAY",
  "scalerValue": 4,
  "workersMin": 0,
  "workersMax": 3,
  "idleTimeout": 5,
  "executionTimeout": 300,
  "flashboot": true
}
```

**Network Volume:** Model weights (~4GB total) stored on a RunPod network volume. Persists across cold starts. Mounts as NFS — no download delay after first pull.

---

## 4. API Design

### inou Backend Endpoints

#### 4.1 Trigger Analysis

```
POST /api/studies/{studyID}/analyze
Authorization: Bearer <token>

Request Body (optional):
{
  "clinical_question": "evaluate for hydrocephalus",  // optional
  "priority": "routine",                              // routine | urgent | stat
  "force_series": ["1.2.840..."]                      // optional: override selection
}

Response 202:
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "selected_series": [
    {
      "series_uid": "1.2.840.113619...",
      "description": "AX T2 FLAIR",
      "sequence_type": "T2-FLAIR",
      "slice_count": 28,
      "selection_reason": "Primary sequence for hydrocephalus evaluation"
    },
    {
      "series_uid": "1.2.840.113619...",
      "description": "AX T2",
      "sequence_type": "T2",
      "slice_count": 24,
      "selection_reason": "Complementary T2-weighted for ventricular assessment"
    }
  ],
  "estimated_cost_usd": 0.066,
  "estimated_duration_seconds": 90
}
```

#### 4.2 Analysis Status

```
GET /api/studies/{studyID}/analysis
Authorization: Bearer <token>

Response 200:
{
  "job_id": "550e8400-...",
  "status": "completed",   // queued | processing | completed | failed
  "created_at": "2026-02-14T20:15:00Z",
  "completed_at": "2026-02-14T20:16:32Z",
  "duration_seconds": 92,
  "cost_usd": 0.067,
  "series_analyzed": 2,
  "results": { ... }       // see 4.3
}
```

#### 4.3 Analysis Results

```
GET /api/studies/{studyID}/analysis/results
Authorization: Bearer <token>

Response 200:
{
  "study_id": "...",
  "model": "prima-v1.07",
  "model_version": "primafullmodel107",
  "analysis_timestamp": "2026-02-14T20:16:32Z",

  "diagnoses": [
    {
      "condition": "Normal pressure hydrocephalus",
      "icd10": "G91.2",
      "probability": 0.87,
      "category": "developmental"
    },
    {
      "condition": "Cerebral atrophy",
      "icd10": "G31.9",
      "probability": 0.34,
      "category": "degenerative"
    },
    {
      "condition": "Periventricular white matter changes",
      "icd10": "I67.3",
      "probability": 0.28,
      "category": "vascular"
    }
  ],

  "priority": {
    "level": "URGENT",
    "reasoning": "High probability hydrocephalus requiring neurosurgical evaluation"
  },

  "referral": {
    "specialty": "Neurosurgery",
    "urgency": "within_1_week",
    "reasoning": "NPH with high probability — consider VP shunt evaluation"
  },

  "differential": [
    "Normal pressure hydrocephalus",
    "Communicating hydrocephalus",
    "Cerebral atrophy (ex vacuo ventriculomegaly)"
  ],

  "series_results": [
    {
      "series_uid": "1.2.840...",
      "description": "AX T2 FLAIR",
      "findings": "Disproportionate ventriculomegaly relative to sulcal enlargement. Periventricular signal abnormality consistent with transependymal CSF flow."
    }
  ],

  "disclaimer": "AI-generated analysis for clinical decision support only. Not a substitute for radiologist interpretation."
}
```

#### 4.4 Callback Endpoint (Worker → inou)

```
POST /api/analysis/callback
X-HMAC-Signature: sha256=<hex>
Content-Type: application/json

{
  "job_id": "...",
  "study_id": "...",
  "status": "completed",
  "elapsed_seconds": 47.3,
  "series_analyzed": 2,
  "results": { ... }
}
```

The HMAC-SHA256 signature is computed over the raw JSON body using a per-job secret, which prevents spoofed callbacks.

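On the receiving side, verification should run against the raw request bytes before any JSON parsing, and the comparison should be constant-time. The sketch below illustrates this in Python (the production backend is Go); the function name `verify_callback` and the idea of looking up `job_secret` by job are assumptions.

```python
import hashlib
import hmac


def verify_callback(raw_body: bytes, signature_header: str, job_secret: bytes) -> bool:
    """Check an 'X-HMAC-Signature: sha256=<hex>' header against the raw body."""
    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    provided = signature_header[len(prefix):]
    expected = hmac.new(job_secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("ascii"), provided.encode("utf-8"))


secret = b"per-job-secret"
body = b'{"job_id":"j1","status":"completed"}'
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_callback(body, header, secret))
```

Any change to the body, even a trailing space, invalidates the signature, which is why the handler must not re-serialize the payload before checking it.
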
#### 4.5 DICOM Download (Worker → inou)

```
GET /api/internal/dicom/{studyID}/download?token={signed_token}&series={uid1,uid2}
→ 200 application/zip (streaming)

Token: time-limited (5 min), signed with HMAC, includes study ID and allowed series UIDs.
Only selected series are included in the download — minimizes data transfer.
```

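A token with the properties listed above (5-minute lifetime, HMAC-signed, bound to a study ID and a set of series UIDs) could be built as follows. This is a Python sketch under assumptions: the `payload.signature` layout, the claim names, and the helper names are illustrative and not defined by the spec.

```python
import base64
import hashlib
import hmac
import json
import time


def make_download_token(study_id, series_uids, key: bytes, ttl_seconds=300, now=None) -> str:
    """Encode the claims as URL-safe base64 and append an HMAC-SHA256 signature."""
    issued = int(now if now is not None else time.time())
    claims = {"study": study_id, "series": sorted(series_uids), "exp": issued + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode()).decode()
    signature = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def check_download_token(token, study_id, requested_series, key: bytes, now=None) -> bool:
    """Verify the signature first, then expiry, study binding, and series scope."""
    payload, _, signature = token.partition(".")
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload.encode()))
    current = now if now is not None else time.time()
    return (
        claims["exp"] > current
        and claims["study"] == study_id
        and set(requested_series) <= set(claims["series"])
    )
```

The claims are only decoded after the signature check passes, so untrusted input never reaches the JSON parser.
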
---

## 5. Docker Image

### Dockerfile

```dockerfile
FROM nvidia/cuda:12.4.0-devel-ubuntu22.04

# System deps
RUN apt-get update && apt-get install -y \
    python3.11 python3.11-dev python3-pip \
    libgl1-mesa-glx libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

RUN ln -s /usr/bin/python3.11 /usr/bin/python

# Python deps (cached layer).
# Invoke pip through the 3.11 interpreter: Ubuntu 22.04's bare `pip` targets
# the default Python 3.10, which is not the interpreter that runs handler.py.
COPY requirements.txt /tmp/
RUN python -m pip install --no-cache-dir -r /tmp/requirements.txt

# flash-attn requires Ampere+ (sm_80+), build for L40S (sm_89)
RUN python -m pip install flash-attn==2.7.4.post1 --no-build-isolation

# RunPod SDK
RUN python -m pip install runpod==1.7.0

# Prima source code
COPY Prima/ /app/Prima/
COPY handler.py /app/handler.py

WORKDIR /app

# Model weights are on network volume, not baked into image
# /models/primafullmodel107.pt (~3.5GB)
# /models/vqvae_model_step16799.pth (~500MB)

CMD ["python", "handler.py"]
```

**Image size:** ~12GB (CUDA base + PyTorch + flash-attn + Prima)

### What's in the image vs network volume:

| Component | Location | Size | Rationale |
|-----------|----------|------|-----------|
| CUDA 12.4 + Ubuntu | Docker image | ~4GB | Cached, rarely changes |
| PyTorch 2.6 + deps | Docker image | ~6GB | Cached, rarely changes |
| flash-attn | Docker image | ~500MB | Must match CUDA version |
| Prima source | Docker image | ~50MB | Changes with updates |
| RunPod handler | Docker image | ~5KB | Our wrapper code |
| `primafullmodel107.pt` | Network volume | ~3.5GB | Too large for image, shared |
| `vqvae_model_step16799.pth` | Network volume | ~500MB | Shared across workers |

### Build & Deploy

```bash
# Build
docker build -t inou/prima-worker:latest .

# Push to RunPod's registry (or Docker Hub)
docker push inou/prima-worker:latest

# Network volume setup (one-time)
runpod volume create --name prima-weights --size 10
# Upload model weights to volume via RunPod console or SSH
```

---

## 6. Data Flow & Security

### Data Flow Diagram

```
User uploads    inou stores          Worker downloads     Worker processes
DICOM study     encrypted            selected series      and returns results
                at rest              (signed URL)
     │                │                     │                    │
     ▼                ▼                     ▼                    ▼
┌─────────┐     ┌───────────┐          ┌──────────┐         ┌──────────┐
│ Browser │────▶│ inou      │─────────▶│ RunPod   │────────▶│ inou     │
│ Upload  │     │ Backend   │  signed  │ Worker   │  HMAC   │ Backend  │
│ (TLS)   │     │ (AES-GCM) │  URL     │ (tmpfs)  │ callback│ (store)  │
└─────────┘     └───────────┘          └──────────┘         └──────────┘
                                            │
                                     On completion:
                                     shutil.rmtree()
                                     No PHI retained
```

### Security Controls

| Concern | Control |
|---------|---------|
| **DICOM at rest (inou)** | AES-256-GCM encryption in SQLite (existing) |
| **DICOM in transit to worker** | TLS 1.3 + time-limited signed URL (5 min TTL) |
| **DICOM on worker** | Stored in tmpfs; `shutil.rmtree()` in `finally` block; container destroyed after job |
| **Results in transit** | TLS 1.3 + HMAC-SHA256 callback verification |
| **Results at rest** | Encrypted in dossier (existing AES-256-GCM) |
| **PHI in logs** | No DICOM pixel data or patient identifiers logged. Only series UIDs and job metadata |
| **RunPod access** | API key stored as inou backend env var, never exposed to frontend |
| **Worker isolation** | Each job runs in isolated container; no shared filesystem between jobs |

### HIPAA Considerations

| HIPAA Requirement | Implementation |
|-------------------|----------------|
| **Access controls** | inou auth (existing); RunPod API key; signed URLs |
| **Encryption** | AES-256-GCM at rest; TLS 1.3 in transit |
| **Audit trail** | All analysis requests logged with timestamp, user, study ID |
| **Minimum necessary** | Only selected series transmitted; no patient demographics sent to worker |
| **BAA** | RunPod offers BAA for serverless — **must execute before production** |
| **Data retention** | Zero retention on worker; configurable retention on inou |
| **De-identification** | DICOM sent to worker can be stripped of patient name/DOB (tags 0010,0010 / 0010,0030) — Series UID sufficient for processing |

### PHI Minimization Pipeline

Before sending DICOM to RunPod, inou strips the following tags:

```go
var phiTagsToStrip = []dicom.Tag{
	dicom.PatientName,        // (0010,0010)
	dicom.PatientID,          // (0010,0020)
	dicom.PatientBirthDate,   // (0010,0030)
	dicom.PatientSex,         // (0010,0040) — keep if Prima needs it
	dicom.PatientAddress,     // (0010,1040)
	dicom.ReferringPhysician, // (0008,0090)
	dicom.InstitutionName,    // (0008,0080)
	dicom.InstitutionAddress, // (0008,0081)
	dicom.AccessionNumber,    // (0008,0050)
}
// Pixel data and series-level imaging tags are preserved — Prima needs those
```

---

## 7. Cost Analysis

### RunPod Pricing (L40S Serverless)

| Metric | Value |
|--------|-------|
| Per-second billing | $0.00073/sec |
| Per-minute | $0.0438/min |
| Per-hour | $2.628/hr |
| Minimum charge per job | None (per-second) |
| Network volume (10GB) | ~$1.00/month |
| Idle workers | $0.00 (scale to 0) |

### Per-Series Timing Breakdown

| Phase | Duration | Notes |
|-------|----------|-------|
| Cold start (first job) | 15–25s | Container + model load from volume |
| Warm start (subsequent) | 0–2s | Worker already running |
| DICOM download | 2–5s | Depends on series size (~50-200MB) |
| VQ-VAE tokenization | 10–20s | Per series, depends on slice count |
| VQ-VAE → Prima model swap | 2–3s | Free VQ-VAE, load Prima |
| Prima inference | 15–30s | Depends on token count |
| Results callback | <1s | Small JSON payload |
| **Total per series** | **30–60s** | **Warm: 30–40s typical** |

### Cost Per Study: With vs Without Intelligent Selection

#### Scenario A: Routine Brain MRI (12 series, ~3000 slices)

| Approach | Series Processed | Est. Duration | GPU Cost | LLM Selection Cost | Total |
|----------|-----------------|---------------|----------|-------------------|-------|
| **Naive (all series)** | 12 | 360–720s | $0.26–$0.53 | $0.00 | **$0.26–$0.53** |
| **Intelligent (selected)** | 3 | 90–180s | $0.066–$0.13 | ~$0.003 | **$0.07–$0.13** |
| **Savings** | — | — | — | — | **73–75%** |

#### Scenario B: Brain MRI with Contrast (15 series, ~5000 slices)

| Approach | Series Processed | Est. Duration | GPU Cost | LLM Selection Cost | Total |
|----------|-----------------|---------------|----------|-------------------|-------|
| **Naive** | 15 | 450–900s | $0.33–$0.66 | $0.00 | **$0.33–$0.66** |
| **Intelligent** | 4 | 120–240s | $0.088–$0.18 | ~$0.004 | **$0.09–$0.18** |
| **Savings** | — | — | — | — | **73%** |

#### Scenario C: Focused Study (6 series, ~1500 slices, e.g., stroke protocol)

| Approach | Series Processed | Est. Duration | GPU Cost | Total |
|----------|-----------------|---------------|----------|-------|
| **Naive** | 6 | 180–360s | $0.13–$0.26 | **$0.13–$0.26** |
| **Intelligent** | 3 | 90–180s | $0.066–$0.13 | **$0.07–$0.13** |

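The GPU cost columns in the scenario tables follow from two inputs: the per-second rate and the 30–60 s per-series range. A short Python sketch of that arithmetic is shown below; the function name `estimate_gpu_cost` is illustrative, and the model ignores cold-start time and LLM selection cost, as the tables do.

```python
L40S_RATE_PER_SECOND = 0.00073  # USD, from the RunPod pricing table
SECONDS_PER_SERIES = (30, 60)   # low and high bounds from the timing table


def estimate_gpu_cost(series_count: int) -> tuple[float, float]:
    """Return the (low, high) GPU cost in USD for a study with `series_count` series."""
    low, high = (series_count * seconds * L40S_RATE_PER_SECOND for seconds in SECONDS_PER_SERIES)
    return round(low, 4), round(high, 4)


for label, count in [("naive, 12 series", 12), ("selected, 3 series", 3)]:
    low, high = estimate_gpu_cost(count)
    print(f"{label}: ${low:.3f} to ${high:.3f}")
```

Because cost scales linearly with series count, the savings from selecting 3 of 12 series is 75% before the small LLM selection cost is added.
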
### Monthly Volume Projections

| Volume | Naive Cost | Intelligent Cost | Monthly Savings |
|--------|-----------|-----------------|-----------------|
| 10 studies/month | $3.30–$6.60 | $0.90–$1.80 | $2.40–$4.80 |
| 50 studies/month | $16.50–$33.00 | $4.50–$9.00 | $12.00–$24.00 |
| 200 studies/month | $66.00–$132.00 | $18.00–$36.00 | $48.00–$96.00 |
| 1000 studies/month | $330–$660 | $90–$180 | $240–$480 |

**Fixed costs:** RunPod network volume ~$1/month. No other infrastructure costs — workers scale to zero.

### Break-Even vs Always-On

An always-on L40S instance costs ~$0.69/hr = **~$500/month**.

Break-even point: $500 ÷ $0.13/study = **~3,850 studies/month** before always-on becomes cheaper.

For inou's expected volumes (10–200/month), the projected serverless spend is $0.90–$36/month, which makes **serverless roughly 14–550× cheaper**.

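The break-even figure and the cost ratio at a given volume can be reproduced with a few lines of Python. This is a sketch using the figures quoted in this section; the function names are illustrative, and the $1/month network volume is left out for simplicity.

```python
import math

ALWAYS_ON_MONTHLY_USD = 500.0  # ~$0.69/hr always-on L40S
PER_STUDY_HIGH_USD = 0.13      # upper per-study estimate used for break-even


def break_even_studies(monthly_fixed=ALWAYS_ON_MONTHLY_USD, per_study=PER_STUDY_HIGH_USD) -> int:
    """Smallest monthly volume at which serverless spend reaches the always-on cost."""
    return math.ceil(monthly_fixed / per_study)


def cost_ratio(studies_per_month: int, per_study: float, monthly_fixed=ALWAYS_ON_MONTHLY_USD) -> float:
    """How many times cheaper serverless is than always-on at a given volume."""
    return monthly_fixed / (studies_per_month * per_study)


print(break_even_studies())  # 3847, in line with the ~3,850 quoted above
```

The ratio falls quickly as volume grows, so it is worth recomputing with actual usage once the service has been live for a few months.
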
---

## 8. User Experience

### Upload → Results Flow

```
┌─────────────────────────────────────────────────────────────┐
│  1. UPLOAD                                                  │
│                                                             │
│  User uploads DICOM study (drag & drop or folder select)   │
│  inou parses metadata, displays series list in viewer       │
│  "AI Analysis Available" badge appears on brain MRI studies │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│  2. ANALYSIS INITIATED                                      │
│                                                             │
│  [Analyze with AI] button in viewer toolbar                 │
│  Optional: clinical question text field                     │
│  "Analyzing 3 of 12 series... Est. ~90 seconds"             │
│  Progress indicator with phases:                            │
│    ◉ Series selected (3 of 12)                              │
│    ◉ Uploading to analysis engine...                        │
│    ○ AI processing...                                       │
│    ○ Results ready                                          │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│  3. RESULTS DISPLAYED                                       │
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  AI Analysis Results                   Prima v1.07  │   │
│   │                                                     │   │
│   │  ⚠️ URGENT — Neurosurgery referral recommended      │   │
│   │                                                     │   │
│   │  Diagnoses:                                         │   │
│   │  ████████████████████░░ 87% NPH                     │   │
│   │  ██████░░░░░░░░░░░░░░░░ 34% Cerebral atrophy        │   │
│   │  █████░░░░░░░░░░░░░░░░░ 28% WM changes              │   │
│   │                                                     │   │
│   │  Referral: Neurosurgery (within 1 week)             │   │
│   │  "NPH with high probability — VP shunt evaluation"  │   │
│   │                                                     │   │
│   │  Series analyzed: T2 FLAIR, T2 (2 of 12)            │   │
│   │  Analysis time: 47s · Cost: $0.03                   │   │
│   │                                                     │   │
│   │  ⚕️ AI-assisted analysis — not a radiologist report │   │
│   └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Results panel integrated into existing DICOM viewer        │
│  Clicking a diagnosis highlights the relevant series        │
│  Results persisted in patient dossier                       │
└─────────────────────────────────────────────────────────────┘
```

### Automatic vs Manual Analysis

**Option A (default): Manual trigger** — User clicks "Analyze with AI" button. Best for initial launch, gives user control.

**Option B (future): Auto-trigger** — Analysis starts automatically on upload for brain MRI studies. Configurable in settings.

### Viewer Integration

The AI results panel is a new sidebar component in the existing DICOM viewer:

```
┌──────────────────────────────────────────────────────────────┐
│ inou Viewer                                              [≡] │
├────────────────────────────────────┬─────────────────────────┤
│                                    │ 📋 Study Info           │
│                                    │ Patient: [redacted]     │
│                                    │ Date: 2026-02-14        │
│  DICOM Image Display               │ Series: 12              │
│                                    │                         │
│  (existing viewer canvas)          ├─────────────────────────┤
│                                    │ 🤖 AI Analysis          │
│                                    │                         │
│                                    │ Status: ✅ Complete     │
│                                    │ [Results panel above]   │
│                                    │                         │
│                                    │ [Re-analyze ▼]          │
│                                    │ [Export Report]         │
├────────────────────────────────────┴─────────────────────────┤
│ Series: AX T2 FLAIR | Slice 14/28 | W:1500 L:450             │
└──────────────────────────────────────────────────────────────┘
```

---

## 9. Error Handling

### Error Categories & Recovery

| Error | Detection | Recovery | User Impact |
|-------|-----------|----------|-------------|
| **RunPod cold start timeout** | No response in 60s | Retry once; if fails, queue for retry in 5 min | "Analysis queued — GPU starting up" |
| **Worker execution timeout** | RunPod 300s timeout | Job marked failed; auto-retry with reduced series count | "Analysis timed out — retrying with fewer series" |
| **Model inference error** | Exception in handler | Return error in callback; log stack trace | "Analysis failed — try again or contact support" |
| **DICOM download failure** | HTTP error / timeout | Retry download 3× with exponential backoff | Transparent to user |
| **Callback delivery failure** | HTTP error | Worker retries 3×; inou polls RunPod status API as fallback | Transparent — results arrive via polling |
| **Invalid DICOM study** | Parse errors in ingest | Reject at upload time with specific error | "Unable to parse series — unsupported format" |
| **No relevant series found** | Series selector returns empty | Skip analysis; inform user | "No brain MRI sequences detected in this study" |
| **RunPod quota / billing** | 402/429 errors | Alert admin; queue jobs for later | "Analysis temporarily unavailable" |
| **Network volume unavailable** | Model load failure | Worker retries; if persistent, alert | "Analysis service maintenance" |

### Retry Strategy

```go
type RetryPolicy struct {
	MaxAttempts     int           // 3
	InitialBackoff  time.Duration // 10s
	MaxBackoff      time.Duration // 5m
	BackoffFactor   float64       // 2.0
	RetryableErrors []string      // timeout, 5xx, connection_refused
}
```

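To make the policy concrete, the wait before each retry can be computed from the four numeric fields. The Python sketch below mirrors the Go struct's defaults; it assumes that `MaxAttempts` counts the initial attempt (so three attempts mean two waits) and that no jitter is applied, neither of which the spec states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = 3
    initial_backoff_s: float = 10.0
    max_backoff_s: float = 300.0  # 5 minutes
    backoff_factor: float = 2.0

    def delays(self) -> list[float]:
        """Seconds to wait before attempts 2, 3, ..., capped at max_backoff_s."""
        return [
            min(self.initial_backoff_s * self.backoff_factor ** i, self.max_backoff_s)
            for i in range(self.max_attempts - 1)
        ]


print(RetryPolicy().delays())                # [10.0, 20.0]
print(RetryPolicy(max_attempts=8).delays())  # later waits are capped at 300.0
```

With the default three attempts, a job that keeps failing is abandoned about 30 seconds after the first failure, well inside the 5-minute cap.
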
### Monitoring & Alerting

| Metric | Alert Threshold | Channel |
|--------|----------------|---------|
| Job failure rate | >10% in 1 hour | Slack / Signal |
| Median latency | >180s | Dashboard |
| RunPod spend | >$50/day | Email |
| Cold start frequency | >50% of jobs | Dashboard (indicates insufficient idle timeout) |
| Callback failures | Any | Immediate alert |

---

## 10. Implementation Plan
|
||
|
||
### Phase 1: Foundation (2 weeks)
|
||
|
||
- [ ] Fork Prima repo, create `inou-prima-worker` repo
|
||
- [ ] Write RunPod handler (`handler.py`)
|
||
- [ ] Build Docker image, test locally with `nvidia-docker`
|
||
- [ ] Set up RunPod account, network volume, deploy worker
|
||
- [ ] Test end-to-end with a sample DICOM study
|
||
- [ ] Verify model outputs match Prima's reference results
|
||
|
||
### Phase 2: Series Selection (1 week)

- [ ] Implement rule-based series classifier in Go
- [ ] Build clinical protocol selection logic
- [ ] Add LLM fallback (Claude Haiku integration)
- [ ] Test with diverse DICOM studies (Siemens, GE, Philips naming conventions)
- [ ] Validate selection accuracy: selected series should match the radiologist's series choice in >90% of cases

### Phase 3: Backend Integration (2 weeks)

- [ ] Add analysis endpoints to inou backend
- [ ] Implement job queue with retry logic
- [ ] Build DICOM download endpoint with signed URLs
- [ ] Implement callback handler with HMAC verification
- [ ] Add PHI stripping to DICOM export pipeline
- [ ] Store results in dossier (encrypted)
- [ ] Write integration tests

### Phase 4: Frontend Integration (1 week)

- [ ] Add "Analyze with AI" button to viewer
- [ ] Build AI results panel component
- [ ] Implement progress indicator (WebSocket or polling)
- [ ] Add results to dossier view
- [ ] Add report export functionality

### Phase 5: Hardening (1 week)

- [ ] Load testing (concurrent jobs, cold starts)
- [ ] Error handling for all failure modes
- [ ] Monitoring dashboard
- [ ] Cost tracking and alerting
- [ ] Security review (PHI flow, access controls)
- [ ] Documentation

**Total: ~7 weeks** to production-ready MVP.

---

## 11. Future Work

### Near-Term (3–6 months)

- **Auto-trigger analysis** on brain MRI upload (configurable)
- **Batch processing** — analyze multiple studies in parallel
- **Result caching** — skip re-analysis if study hasn't changed
- **Comparison mode** — compare current vs prior study results

### Medium-Term (6–12 months)

- **Fine-tuning on specific conditions** — Use inou's accumulated (de-identified) data to fine-tune Prima for conditions most relevant to inou's user base (e.g., pediatric hydrocephalus for Sophia's case)
- **Radiologist feedback loop** — Allow radiologists to confirm/reject AI findings, feeding back into training data
- **Multi-series reasoning** — Instead of per-series inference, develop cross-series reasoning (e.g., comparing pre- and post-contrast T1)

### Long-Term (12+ months)

- **Multi-organ expansion** — As VLMs for spine, cardiac, abdominal MRI become available, integrate them using the same serverless architecture
- **On-premise deployment** — For high-volume clients or regulatory requirements, offer Prima as an on-prem container (requires dedicated GPU)
- **Real-time inference** — As models get faster and GPUs cheaper, offer analysis during the MRI scan itself (PACS integration)
- **Clinical decision support** — Integrate Prima results with patient history, labs, and prior imaging for comprehensive clinical recommendations
- **FDA 510(k) pathway** — If inou pursues clinical deployment, Prima results would need FDA clearance as a Computer-Aided Detection (CADe) or Computer-Aided Diagnosis (CADx) device

### Architecture Extensibility

The serverless worker pattern is model-agnostic. To add a new model:

1. Build a new Docker image with the model
2. Deploy as a separate RunPod serverless endpoint
3. Add a new series selector profile
4. Route from inou backend based on study type / body part

```
             inou Backend
                   │
        ┌──────────┼──────────┐
        ▼          ▼          ▼
      Prima    SpineVLM   CardiacAI
     (Brain)    (Spine)    (Heart)
      L40S       L40S       A100
```

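
Step 4 can be sketched as a lookup table in the backend. The Go example below is only an illustration under stated assumptions: `WorkerRoute`, `RouteFor`, the endpoint names, and the selector profile names are all hypothetical, and the body part is assumed to come from the DICOM `BodyPartExamined` attribute (0018,0015), whose values vary by scanner and site. SpineVLM and CardiacAI are the placeholder models from the diagram, not existing integrations.

```go
package main

import (
	"fmt"
	"strings"
)

// WorkerRoute describes one serverless worker and its selector profile.
// All field values below are placeholders, not real endpoint identifiers.
type WorkerRoute struct {
	Model    string
	Endpoint string // hypothetical RunPod endpoint name
	Selector string // hypothetical series selector profile
	GPU      string
}

// routes maps a normalized body part to the worker that handles it.
var routes = map[string]WorkerRoute{
	"BRAIN": {Model: "Prima", Endpoint: "prima-brain", Selector: "brain-mri", GPU: "L40S"},
	"SPINE": {Model: "SpineVLM", Endpoint: "spine-vlm", Selector: "spine-mri", GPU: "L40S"},
	"HEART": {Model: "CardiacAI", Endpoint: "cardiac-ai", Selector: "cardiac-mri", GPU: "A100"},
}

// RouteFor picks a worker for an MR study. The second return value is false
// when the modality is not MR or no model is registered for the body part.
func RouteFor(modality, bodyPart string) (WorkerRoute, bool) {
	if strings.ToUpper(strings.TrimSpace(modality)) != "MR" {
		return WorkerRoute{}, false
	}
	route, ok := routes[strings.ToUpper(strings.TrimSpace(bodyPart))]
	return route, ok
}

func main() {
	for _, part := range []string{"brain", "SPINE", "KNEE"} {
		if route, ok := RouteFor("MR", part); ok {
			fmt.Printf("%s -> %s on %s\n", part, route.Model, route.GPU)
		} else {
			fmt.Printf("%s -> no model registered, skip analysis\n", part)
		}
	}
}
```
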
Same queue, same callback pattern, same dossier storage. Only the worker and selector change.

---

## Appendix A: Prima Model Details

| Property | Value |
|----------|-------|
| **Paper** | "Learning neuroimaging models from health system-scale data" (arXiv:2509.18638) |
| **Training data** | UM-220K: 220,000+ MRI studies, 5.6M 3D sequences, 362M 2D images |
| **Architecture** | Hierarchical VLM: VQ-VAE tokenizer → Perceiver → Transformer |
| **Diagnoses covered** | 52 radiologic diagnoses across neoplastic, inflammatory, infectious, developmental |
| **Mean AUC** | 90.1 ± 5.0% (prospective 30K study validation) |
| **License** | MIT |
| **GPU requirement** | Ampere+ (flash-attn), 48GB VRAM recommended (L40S, A100) |
| **Weights** | `primafullmodel107.pt` (~3.5GB) + `vqvae_model_step16799.pth` (~500MB) |
| **Dependencies** | PyTorch 2.6, flash-attn 2.7.4, MONAI 1.5.1, transformers 4.49 |

## Appendix B: DICOM Tags Reference

```
(0008,0008) ImageType           — ORIGINAL\PRIMARY vs DERIVED\SECONDARY
(0008,0060) Modality            — MR
(0008,103E) SeriesDescription   — Free text, scanner-dependent
(0010,0010) PatientName         — PHI — STRIP before sending to worker
(0010,0020) PatientID           — PHI — STRIP
(0010,0030) PatientBirthDate    — PHI — STRIP
(0018,0010) ContrastBolusAgent  — Present = contrast-enhanced
(0018,0020) ScanningSequence    — SE, GR, IR, EP (spin echo, gradient, inversion recovery, echo planar)
(0018,0023) MRAcquisitionType   — 2D, 3D
(0018,0024) SequenceName        — Scanner-specific pulse sequence name
(0018,0050) SliceThickness      — mm
(0020,0011) SeriesNumber        — Integer ordering
(0020,0013) InstanceNumber      — Slice index within series
(0028,0010) Rows                — Pixel rows
(0028,0011) Columns             — Pixel columns
```

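
The three tags marked STRIP can be expressed as a deny-list. The Go sketch below stands in for the real export pipeline: it models a DICOM header as a plain map instead of using a DICOM library, and `Tag`, `phiTags`, and `StripPHI` are hypothetical names. It covers only the tags listed above. A production de-identification step would need a much broader attribute list, such as the confidentiality profiles in DICOM PS3.15 Annex E.

```go
package main

import "fmt"

// Tag is a DICOM (group,element) pair.
type Tag struct{ Group, Element uint16 }

// phiTags lists the attributes flagged "PHI — STRIP" in the reference above.
var phiTags = map[Tag]string{
	{0x0010, 0x0010}: "PatientName",
	{0x0010, 0x0020}: "PatientID",
	{0x0010, 0x0030}: "PatientBirthDate",
}

// StripPHI returns a copy of attrs without the flagged tags.
// The input map is left unmodified.
func StripPHI(attrs map[Tag]string) map[Tag]string {
	out := make(map[Tag]string, len(attrs))
	for tag, value := range attrs {
		if _, isPHI := phiTags[tag]; isPHI {
			continue
		}
		out[tag] = value
	}
	return out
}

func main() {
	// Made-up header values for illustration only.
	attrs := map[Tag]string{
		{0x0008, 0x0060}: "MR",
		{0x0008, 0x103E}: "AX T2 FLAIR",
		{0x0010, 0x0010}: "DOE^JANE",
		{0x0010, 0x0020}: "12345",
	}
	clean := StripPHI(attrs)
	fmt.Printf("%d attributes in, %d attributes out\n", len(attrs), len(clean))
}
```
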
---

*This specification is a living document. Update as implementation progresses and requirements evolve.*