
Prima Integration — Technical Specification

inou Health × Prima Brain MRI VLM Version: 1.0 · Date: 2026-02-14 · Author: James (AI Architect) · Reviewer: Johan Jongsma


Executive Summary

This spec defines how inou integrates Prima, the University of Michigan's brain MRI vision-language model (VLM), as an on-demand diagnostic AI service. Prima achieves 90.1% mean AUC across 52 neurological diagnoses and provides differential diagnosis, worklist prioritization, and specialist referral recommendations.

The design prioritizes cost efficiency (serverless GPU, intelligent series selection), security (zero PHI at rest on cloud GPU, AES-256-GCM encrypted dossier storage), and seamless UX (upload → automatic analysis → results in viewer).

Key numbers:

  • Per-study cost with intelligent selection: $0.04-$0.13 (vs $0.22-$0.66 naive)
  • Cold start to results: 60-120 seconds
  • Zero always-on GPU cost

Table of Contents

  1. System Architecture
  2. Intelligent Series Selection
  3. RunPod Serverless Worker
  4. API Design
  5. Docker Image
  6. Data Flow & Security
  7. Cost Analysis
  8. User Experience
  9. Error Handling
  10. Implementation Plan
  11. Future Work

1. System Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         inou Frontend                               │
│  ┌──────────┐  ┌──────────────┐  ┌───────────────────────────────┐ │
│  │  DICOM   │  │   DICOM      │  │   AI Results Panel            │ │
│  │  Upload  │──│   Viewer     │  │  • Diagnosis probabilities    │ │
│  │          │  │              │  │  • Urgency / priority         │ │
│  └────┬─────┘  └──────────────┘  │  • Specialist referral        │ │
│       │                          │  • Series-level annotations   │ │
│       │                          └──────────┬────────────────────┘ │
└───────┼─────────────────────────────────────┼─────────────────────┘
        │ POST /api/studies                   │ GET /api/studies/:id/analysis
        ▼                                     ▲
┌───────────────────────────────────────────────────────────────────┐
│                      inou Backend (Go)                            │
│                                                                   │
│  ┌──────────┐  ┌─────────────┐  ┌──────────┐  ┌──────────────┐  │
│  │  DICOM   │  │   Series    │  │  Job     │  │   Dossier    │  │
│  │  Ingest  │─▶│  Selector   │─▶│  Queue   │  │   Store      │  │
│  │  + Parse │  │  (LLM/Rules)│  │          │  │  (SQLite+AES)│  │
│  └──────────┘  └─────────────┘  └────┬─────┘  └──────▲───────┘  │
│                                      │                │          │
└──────────────────────────────────────┼────────────────┼──────────┘
                                       │                │
                              Job dispatch        Results callback
                              (HTTPS POST)        (HTTPS POST + HMAC)
                                       │                │
                                       ▼                │
                    ┌──────────────────────────────────────────┐
                    │          RunPod Serverless                │
                    │                                          │
                    │  ┌────────────────────────────────────┐  │
                    │  │       Prima Worker (L40S 48GB)     │  │
                    │  │                                    │  │
                    │  │  1. Download DICOM from signed URL │  │
                    │  │  2. Load VQ-VAE tokenizer          │  │
                    │  │  3. Tokenize selected series       │  │
                    │  │  4. Free VQ-VAE from VRAM          │  │
                    │  │  5. Load Prima VLM                 │  │
                    │  │  6. Run inference (fp16)           │  │
                    │  │  7. POST results back to inou      │  │
                    │  │  8. Purge all DICOM data           │  │
                    │  │                                    │  │
                    │  └────────────────────────────────────┘  │
                    │                                          │
                    │  GPU: L40S 48GB · Scales 0→N · Pay/sec  │
                    └──────────────────────────────────────────┘

Component Responsibilities

| Component | Technology | Responsibility |
|---|---|---|
| DICOM Ingest | Go | Parse uploaded DICOM, extract metadata, store encrypted |
| Series Selector | Go + LLM (Claude Haiku) | Analyze DICOM metadata, select diagnostically relevant series |
| Job Queue | Go (in-process) | Manage analysis jobs, retry logic, status tracking |
| RunPod Worker | Python 3.11, PyTorch 2.6 | Run Prima inference on selected series |
| Dossier Store | SQLite + AES-256-GCM | Store results encrypted at rest |
| AI Results Panel | Frontend (existing viewer) | Display diagnosis, urgency, referral |

2. Intelligent Series Selection

The Problem

A typical brain MRI study contains 8-15 series with 100-500 slices each, potentially 10,000+ total slices. Running every series through Prima wastes GPU time and money; most clinical questions need only 2-4 specific sequence types.

Selection Architecture

DICOM Study Metadata
        │
        ▼
┌───────────────────────────────────────────┐
│         Series Selection Pipeline          │
│                                            │
│  Step 1: Extract metadata per series       │
│    • SeriesDescription (0008,103E)         │
│    • SequenceName (0018,0024)              │
│    • MRAcquisitionType (0018,0023)         │
│    • ScanningSequence (0018,0020)          │
│    • ContrastBolusAgent (0018,0010)        │
│    • SliceThickness (0018,0050)            │
│    • NumberOfSlices                         │
│    • ImageType (0008,0008)                 │
│    • Modality (0008,0060)                  │
│                                            │
│  Step 2: Rule-based classification         │
│    Map each series → sequence type:        │
│    T1, T1+C, T2, T2-FLAIR, DWI, ADC,     │
│    SWI, MRA, SCOUT, LOCALIZER, DERIVED    │
│                                            │
│  Step 3: Clinical protocol matching        │
│    Apply selection rules (see table)       │
│                                            │
│  Step 4: LLM fallback for ambiguous cases  │
│    If rule-based classification fails,     │
│    send metadata to Claude Haiku           │
│                                            │
│  Output: ordered list of series to analyze │
└───────────────────────────────────────────┘

Rule-Based Classification

Series descriptions are notoriously inconsistent across scanners (Siemens, GE, Philips all use different naming). The classifier uses a multi-signal approach:

type SeriesClassification struct {
    SeriesUID     string
    SequenceType  string   // T1, T2, FLAIR, T1C, DWI, ADC, SWI, MRA, OTHER
    HasContrast   bool
    IsLocalizer   bool
    IsDerived     bool
    SliceCount    int
    Confidence    float64  // 0.0-1.0, below 0.7 triggers LLM fallback
}

Classification rules (ordered by priority):

| Signal | Tag | Example Values | Maps To |
|---|---|---|---|
| ImageType contains "LOCALIZER" | (0008,0008) | ORIGINAL\PRIMARY\LOCALIZER | SKIP |
| ImageType contains "DERIVED" | (0008,0008) | DERIVED\SECONDARY | SKIP (usually) |
| SliceCount < 10 | computed | | SKIP (scout/cal) |
| ContrastBolusAgent present | (0018,0010) | Gadavist | +contrast flag |
| Description contains "flair" | (0008,103E) | AX T2 FLAIR, FLAIR_DARK-FLUID | T2-FLAIR |
| Description contains "t2" (no flair) | (0008,103E) | AX T2, T2_TSE_TRA | T2 |
| Description contains "t1" | (0008,103E) | SAG T1, T1_MPRAGE | T1 or T1+C |
| Description contains "dwi"/"diffusion" | (0008,103E) | DWI_b1000, EP2D_DIFF | DWI |
| Description contains "adc" | (0008,103E) | ADC_MAP | ADC |
| Description contains "swi"/"suscept" | (0008,103E) | SWI_mIP, SUSCEPTIBILITY | SWI |
| ScanningSequence = "EP" + b-value tag | (0018,0020) | | DWI |
| No match | | | confidence < 0.7 → LLM fallback |
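
The rules above can be sketched as a small Go function. This is a minimal illustration: the `SeriesMeta` and `Classify` names are hypothetical, and the contrast handling and confidence scoring are simplified relative to the full classifier.

```go
package main

import (
	"fmt"
	"strings"
)

// SeriesMeta holds the DICOM metadata signals used for classification.
type SeriesMeta struct {
	Description string // (0008,103E)
	ImageType   string // (0008,0008), backslash-separated
	Contrast    string // (0018,0010), empty if absent
	SliceCount  int
}

// Classify maps series metadata to a sequence type using the priority-ordered
// rules in the table above. Returns "SKIP" for localizers, derived images,
// and scout/calibration series; "OTHER" falls through to the LLM fallback.
func Classify(m SeriesMeta) string {
	desc := strings.ToLower(m.Description)
	imgType := strings.ToUpper(m.ImageType)
	switch {
	case strings.Contains(imgType, "LOCALIZER"):
		return "SKIP"
	case strings.Contains(imgType, "DERIVED"):
		return "SKIP"
	case m.SliceCount < 10:
		return "SKIP" // scout / calibration
	case strings.Contains(desc, "flair"):
		return "T2-FLAIR" // checked before "t2": "AX T2 FLAIR" must not match T2
	case strings.Contains(desc, "t2"):
		return "T2"
	case strings.Contains(desc, "t1"):
		if m.Contrast != "" {
			return "T1+C"
		}
		return "T1"
	case strings.Contains(desc, "dwi") || strings.Contains(desc, "diff"):
		return "DWI" // matches DWI_b1000 and EP2D_DIFF
	case strings.Contains(desc, "adc"):
		return "ADC"
	case strings.Contains(desc, "swi") || strings.Contains(desc, "suscept"):
		return "SWI"
	}
	return "OTHER"
}

func main() {
	fmt.Println(Classify(SeriesMeta{Description: "AX T2 FLAIR", SliceCount: 28}))
}
```

Because the switch is evaluated in table order, a description like "AX T2 FLAIR" correctly maps to T2-FLAIR rather than T2.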

Clinical Protocol Selection

Given the classified series, select based on the clinical question (or use a general protocol if none specified):

| Clinical Question | Required Series | Optional Series | Expected Count |
|---|---|---|---|
| General screening | T1, T2, T2-FLAIR | DWI, T1+C | 3-5 |
| Hydrocephalus | T2-FLAIR, T2 | T1, DWI | 2-3 |
| Tumor / mass | T1+C, T2-FLAIR, T1 (pre-contrast) | DWI, SWI | 3-4 |
| Stroke / acute | DWI, ADC, T2-FLAIR | MRA, SWI | 3-4 |
| MS / demyelination | T2-FLAIR, T1+C, T2 | DWI | 3 |
| Infection | T1+C, DWI, T2-FLAIR | T2 | 3 |

Default behavior (no clinical question): Use "General screening" — select T1, T2, T2-FLAIR, and any contrast-enhanced series. Cap at 5 series maximum.
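
A compact Go sketch of the protocol-matching step. The `protocols` map encodes the required-series column of the table above; the function and key names are illustrative, not part of the existing codebase.

```go
package main

import "fmt"

// protocols maps a clinical question to required sequence types, per the
// clinical protocol table (optional series omitted in this sketch).
var protocols = map[string][]string{
	"general":       {"T1", "T2", "T2-FLAIR"},
	"hydrocephalus": {"T2-FLAIR", "T2"},
	"tumor":         {"T1+C", "T2-FLAIR", "T1"},
	"stroke":        {"DWI", "ADC", "T2-FLAIR"},
	"ms":            {"T2-FLAIR", "T1+C", "T2"},
	"infection":     {"T1+C", "DWI", "T2-FLAIR"},
}

// SelectSeries picks, for each required sequence type, the first classified
// series of that type. Unknown questions fall back to general screening, and
// the result is capped at maxSeries (5, per the default behavior above).
func SelectSeries(question string, classified map[string][]string, maxSeries int) []string {
	required, ok := protocols[question]
	if !ok {
		required = protocols["general"]
	}
	var out []string
	for _, seq := range required {
		if uids := classified[seq]; len(uids) > 0 && len(out) < maxSeries {
			out = append(out, uids[0])
		}
	}
	return out
}

func main() {
	classified := map[string][]string{"T2-FLAIR": {"1.2.840.1"}, "T2": {"1.2.840.2"}}
	fmt.Println(SelectSeries("hydrocephalus", classified, 5))
}
```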

LLM Fallback

When rule-based classification confidence is below 0.7 for any series:

Prompt to Claude Haiku:

You are a neuroradiology MRI series classifier. Given the following DICOM 
metadata for a single MRI series, classify it.

SeriesDescription: {desc}
SequenceName: {seq_name}
ScanningSequence: {scan_seq}
MRAcquisitionType: {acq_type}
ImageType: {img_type}
ContrastBolusAgent: {contrast}
SliceCount: {count}
Manufacturer: {mfr}
ManufacturerModelName: {model}

Respond with JSON:
{
  "sequence_type": "T1|T2|T2-FLAIR|T1+C|DWI|ADC|SWI|MRA|SCOUT|OTHER",
  "has_contrast": true|false,
  "is_diagnostically_relevant": true|false,
  "confidence": 0.0-1.0,
  "reasoning": "brief explanation"
}

Cost of LLM fallback: ~$0.001 per series (Haiku). Triggered for maybe 1-2 series per study. Negligible.


3. RunPod Serverless Worker

Worker Architecture

The RunPod serverless worker wraps Prima's pipeline.py with an HTTP handler:

# handler.py — RunPod serverless handler
import runpod
import torch
import tempfile
import requests
import shutil
import json
import time
import gc
from pathlib import Path

from pipeline import Pipeline  # Prima's pipeline

# Helpers download_dicom, filter_series, build_config, and post_callback
# are defined elsewhere in this file (omitted here for brevity).

def handler(job):
    """
    Input schema:
    {
        "input": {
            "dicom_url": "https://inou.../signed-download/...",
            "series_uids": ["1.2.840...", "1.2.840..."],
            "callback_url": "https://inou.../api/analysis/callback",
            "callback_hmac_key": "hex-encoded-key",
            "job_id": "uuid",
            "study_id": "uuid"
        }
    }
    """
    input_data = job["input"]
    work_dir = Path(tempfile.mkdtemp())
    
    try:
        # 1. Download DICOM (only selected series)
        t0 = time.time()
        dicom_path = download_dicom(input_data["dicom_url"], work_dir)
        
        # 2. Filter to selected series only
        filter_series(dicom_path, input_data["series_uids"])
        
        # 3. Run Prima pipeline
        config = build_config(dicom_path, work_dir / "output")
        pipeline = Pipeline(config)
        
        # Load study
        pipeline.load_mri_study()
        
        # Tokenize (VQ-VAE) — loads tokenizer, runs, then frees
        pipeline.load_tokenizer_model()
        tokens = pipeline.tokenize_series()
        pipeline.free_tokenizer()  # Free VRAM
        
        # Run Prima VLM
        pipeline.load_prima_model()
        results = pipeline.run_inference(tokens)
        pipeline.free_prima_model()
        
        elapsed = time.time() - t0
        
        # 4. Build response
        response = {
            "job_id": input_data["job_id"],
            "study_id": input_data["study_id"],
            "elapsed_seconds": elapsed,
            "series_analyzed": len(input_data["series_uids"]),
            "results": {
                "diagnoses": results["diagnoses"],       # list of {name, probability}
                "priority": results["priority"],          # STAT / URGENT / ROUTINE
                "referral": results["referral"],           # specialist recommendation
                "differential": results["differential"],   # ranked differential diagnosis
            }
        }
        
        # 5. POST results back to inou
        post_callback(input_data["callback_url"], response, 
                      input_data["callback_hmac_key"])
        
        return response
        
    finally:
        # 6. ALWAYS purge DICOM data — no PHI left on worker
        shutil.rmtree(work_dir, ignore_errors=True)
        torch.cuda.empty_cache()
        gc.collect()

runpod.serverless.start({"handler": handler})

Worker Lifecycle

                    RunPod Serverless
                    ─────────────────
Idle (no GPU)  ──────────────────────── $0.00/sec
                         │
                    Job arrives
                         │
                         ▼
Cold Start (~15-25s) ────────────────── $0.00073/sec (L40S)
  • Container starts                     starts billing
  • Model weights loaded from
    network volume (NFS, ~10s)
  • GPU warm
                         │
                         ▼
Inference (~30-60s) ─────────────────── $0.00073/sec
  • VQ-VAE tokenization (~10-20s)
  • Free VQ-VAE
  • Prima VLM inference (~20-40s)
  • Results callback
                         │
                         ▼
Idle timeout (5s) ───────────────────── $0.00073/sec (configurable)
                         │
                         ▼
Scale to 0 ──────────────────────────── $0.00/sec

RunPod Configuration

{
    "name": "inou-prima-worker",
    "gpu": "NVIDIA L40S",
    "gpuCount": 1,
    "volumeId": "vol_prima_weights",
    "volumeMountPath": "/models",
    "dockerImage": "inou/prima-worker:latest",
    "env": {
        "MODEL_DIR": "/models",
        "PRIMA_CKPT": "/models/primafullmodel107.pt",
        "VQVAE_CKPT": "/models/vqvae_model_step16799.pth"
    },
    "scalerType": "QUEUE_DELAY",
    "scalerValue": 4,
    "workersMin": 0,
    "workersMax": 3,
    "idleTimeout": 5,
    "executionTimeout": 300,
    "flashboot": true
}

Network Volume: Model weights (~4GB total) stored on a RunPod network volume. Persists across cold starts. Mounts as NFS — no download delay after first pull.


4. API Design

inou Backend Endpoints

4.1 Trigger Analysis

POST /api/studies/{studyID}/analyze
Authorization: Bearer <token>

Request Body (optional):
{
    "clinical_question": "evaluate for hydrocephalus",  // optional
    "priority": "routine",                               // routine | urgent | stat
    "force_series": ["1.2.840..."]                       // optional: override selection
}

Response 202:
{
    "job_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "queued",
    "selected_series": [
        {
            "series_uid": "1.2.840.113619...",
            "description": "AX T2 FLAIR",
            "sequence_type": "T2-FLAIR",
            "slice_count": 28,
            "selection_reason": "Primary sequence for hydrocephalus evaluation"
        },
        {
            "series_uid": "1.2.840.113619...",
            "description": "AX T2",
            "sequence_type": "T2",
            "slice_count": 24,
            "selection_reason": "Complementary T2-weighted for ventricular assessment"
        }
    ],
    "estimated_cost_usd": 0.066,
    "estimated_duration_seconds": 90
}

4.2 Analysis Status

GET /api/studies/{studyID}/analysis
Authorization: Bearer <token>

Response 200:
{
    "job_id": "550e8400-...",
    "status": "completed",         // queued | processing | completed | failed
    "created_at": "2026-02-14T20:15:00Z",
    "completed_at": "2026-02-14T20:16:32Z",
    "duration_seconds": 92,
    "cost_usd": 0.067,
    "series_analyzed": 2,
    "results": { ... }             // see 4.3
}

4.3 Analysis Results

GET /api/studies/{studyID}/analysis/results
Authorization: Bearer <token>

Response 200:
{
    "study_id": "...",
    "model": "prima-v1.07",
    "model_version": "primafullmodel107",
    "analysis_timestamp": "2026-02-14T20:16:32Z",
    
    "diagnoses": [
        {
            "condition": "Normal pressure hydrocephalus",
            "icd10": "G91.2",
            "probability": 0.87,
            "category": "developmental"
        },
        {
            "condition": "Cerebral atrophy",
            "icd10": "G31.9",
            "probability": 0.34,
            "category": "degenerative"
        },
        {
            "condition": "Periventricular white matter changes",
            "icd10": "I67.3",
            "probability": 0.28,
            "category": "vascular"
        }
    ],
    
    "priority": {
        "level": "URGENT",
        "reasoning": "High probability hydrocephalus requiring neurosurgical evaluation"
    },
    
    "referral": {
        "specialty": "Neurosurgery",
        "urgency": "within_1_week",
        "reasoning": "NPH with high probability — consider VP shunt evaluation"
    },
    
    "differential": [
        "Normal pressure hydrocephalus",
        "Communicating hydrocephalus",
        "Cerebral atrophy (ex vacuo ventriculomegaly)"
    ],
    
    "series_results": [
        {
            "series_uid": "1.2.840...",
            "description": "AX T2 FLAIR",
            "findings": "Disproportionate ventriculomegaly relative to sulcal enlargement. Periventricular signal abnormality consistent with transependymal CSF flow."
        }
    ],
    
    "disclaimer": "AI-generated analysis for clinical decision support only. Not a substitute for radiologist interpretation."
}

4.4 Callback Endpoint (Worker → inou)

POST /api/analysis/callback
X-HMAC-Signature: sha256=<hex>
Content-Type: application/json

{
    "job_id": "...",
    "study_id": "...",
    "status": "completed",
    "elapsed_seconds": 47.3,
    "series_analyzed": 2,
    "results": { ... }
}

The HMAC signature is computed over the raw JSON body using a per-job secret, which prevents spoofed callbacks.

4.5 DICOM Download (Worker → inou)

GET /api/internal/dicom/{studyID}/download?token={signed_token}&series={uid1,uid2}
→ 200 application/zip (streaming)

Token: time-limited (5 min), signed with HMAC, includes study ID and allowed series UIDs.
Only selected series are included in the download — minimizes data transfer.

5. Docker Image

Dockerfile

FROM nvidia/cuda:12.4.0-devel-ubuntu22.04

# System deps
RUN apt-get update && apt-get install -y \
    python3.11 python3.11-dev python3-pip \
    libgl1-mesa-glx libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

RUN ln -s /usr/bin/python3.11 /usr/bin/python

# Python deps (cached layer)
COPY requirements.txt /tmp/
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# flash-attn requires Ampere+ (sm_80+), build for L40S (sm_89)
RUN pip install flash-attn==2.7.4.post1 --no-build-isolation

# RunPod SDK
RUN pip install runpod==1.7.0

# Prima source code
COPY Prima/ /app/Prima/
COPY handler.py /app/handler.py

WORKDIR /app

# Model weights are on network volume, not baked into image
# /models/primafullmodel107.pt (~3.5GB)
# /models/vqvae_model_step16799.pth (~500MB)

CMD ["python", "handler.py"]

Image size: ~12GB (CUDA base + PyTorch + flash-attn + Prima)

What's in the image vs network volume:

| Component | Location | Size | Rationale |
|---|---|---|---|
| CUDA 12.4 + Ubuntu | Docker image | ~4GB | Cached, rarely changes |
| PyTorch 2.6 + deps | Docker image | ~6GB | Cached, rarely changes |
| flash-attn | Docker image | ~500MB | Must match CUDA version |
| Prima source | Docker image | ~50MB | Changes with updates |
| RunPod handler | Docker image | ~5KB | Our wrapper code |
| primafullmodel107.pt | Network volume | ~3.5GB | Too large for image, shared |
| vqvae_model_step16799.pth | Network volume | ~500MB | Shared across workers |

Build & Deploy

# Build
docker build -t inou/prima-worker:latest .

# Push to RunPod's registry (or Docker Hub)
docker push inou/prima-worker:latest

# Network volume setup (one-time)
runpod volume create --name prima-weights --size 10
# Upload model weights to volume via RunPod console or SSH

6. Data Flow & Security

Data Flow Diagram

   User uploads       inou stores          Worker downloads     Worker processes
   DICOM study        encrypted            selected series      and returns results
                      at rest              (signed URL)
        │                 │                      │                     │
        ▼                 ▼                      ▼                     ▼
   ┌─────────┐     ┌───────────┐          ┌──────────┐         ┌──────────┐
   │ Browser │────▶│ inou      │─────────▶│ RunPod   │────────▶│ inou     │
   │ Upload  │     │ Backend   │  signed  │ Worker   │ HMAC    │ Backend  │
   │ (TLS)   │     │ (AES-GCM) │  URL     │ (tmpfs)  │ callback│ (store)  │
   └─────────┘     └───────────┘          └──────────┘         └──────────┘
                                                │
                                          On completion:
                                          shutil.rmtree()
                                          No PHI retained

Security Controls

| Concern | Control |
|---|---|
| DICOM at rest (inou) | AES-256-GCM encryption in SQLite (existing) |
| DICOM in transit to worker | TLS 1.3 + time-limited signed URL (5 min TTL) |
| DICOM on worker | Stored in tmpfs; shutil.rmtree() in finally block; container destroyed after job |
| Results in transit | TLS 1.3 + HMAC-SHA256 callback verification |
| Results at rest | Encrypted in dossier (existing AES-256-GCM) |
| PHI in logs | No DICOM pixel data or patient identifiers logged; only series UIDs and job metadata |
| RunPod access | API key stored as inou backend env var, never exposed to frontend |
| Worker isolation | Each job runs in an isolated container; no shared filesystem between jobs |

HIPAA Considerations

| HIPAA Requirement | Implementation |
|---|---|
| Access controls | inou auth (existing); RunPod API key; signed URLs |
| Encryption | AES-256-GCM at rest; TLS 1.3 in transit |
| Audit trail | All analysis requests logged with timestamp, user, study ID |
| Minimum necessary | Only selected series transmitted; no patient demographics sent to worker |
| BAA | RunPod offers a BAA for serverless; must execute before production |
| Data retention | Zero retention on worker; configurable retention on inou |
| De-identification | DICOM sent to worker can be stripped of patient name/DOB (tags 0010,0010 / 0010,0030); Series UID is sufficient for processing |

PHI Minimization Pipeline

Before sending DICOM to RunPod, inou strips the following tags:

var phiTagsToStrip = []dicom.Tag{
    dicom.PatientName,          // (0010,0010)
    dicom.PatientID,            // (0010,0020)  
    dicom.PatientBirthDate,     // (0010,0030)
    dicom.PatientSex,           // (0010,0040) — keep if Prima needs it
    dicom.PatientAddress,       // (0010,1040)
    dicom.ReferringPhysician,   // (0008,0090)
    dicom.InstitutionName,      // (0008,0080)
    dicom.InstitutionAddress,   // (0008,0081)
    dicom.AccessionNumber,      // (0008,0050)
}
// Pixel data and series-level imaging tags are preserved — Prima needs those

7. Cost Analysis

RunPod Pricing (L40S Serverless)

| Metric | Value |
|---|---|
| Per-second billing | $0.00073/sec |
| Per-minute | $0.0438/min |
| Per-hour | $2.628/hr |
| Minimum charge per job | None (per-second) |
| Network volume (10GB) | ~$1.00/month |
| Idle workers | $0.00 (scale to 0) |

Per-Series Timing Breakdown

| Phase | Duration | Notes |
|---|---|---|
| Cold start (first job) | 15-25s | Container + model load from volume |
| Warm start (subsequent) | 0-2s | Worker already running |
| DICOM download | 2-5s | Depends on series size (~50-200MB) |
| VQ-VAE tokenization | 10-20s | Per series, depends on slice count |
| VQ-VAE → Prima model swap | 2-3s | Free VQ-VAE, load Prima |
| Prima inference | 15-30s | Depends on token count |
| Results callback | <1s | Small JSON payload |
| Total per series | 30-60s | Warm: 30-40s typical |
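
The `estimated_cost_usd` field in section 4.1 can be derived from these figures. A rough Go sketch using the per-second rate and the midpoint of the warm per-series range (the helper name and the 45s/20s constants are illustrative, not fixed by this spec):

```go
package main

import "fmt"

const l40sPerSecond = 0.00073 // USD, RunPod L40S serverless rate

// EstimateCost approximates GPU duration and cost for a job: ~45s per series
// (midpoint of the 30-60s range above), plus ~20s if a cold start is expected.
func EstimateCost(seriesCount int, coldStart bool) (seconds int, usd float64) {
	seconds = 45 * seriesCount
	if coldStart {
		seconds += 20
	}
	return seconds, float64(seconds) * l40sPerSecond
}

func main() {
	s, c := EstimateCost(2, false)
	fmt.Printf("%ds $%.3f\n", s, c)
}
```

For the two-series example in section 4.1, this yields 90s and about $0.066, matching the response shown there.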

Cost Per Study: With vs Without Intelligent Selection

Scenario A: Routine Brain MRI (12 series, ~3000 slices)

| Approach | Series Processed | Est. Duration | GPU Cost | LLM Selection Cost | Total |
|---|---|---|---|---|---|
| Naive (all series) | 12 | 360-720s | $0.26-$0.53 | $0.00 | $0.26-$0.53 |
| Intelligent (selected) | 3 | 90-180s | $0.066-$0.13 | ~$0.003 | $0.07-$0.13 |
| Savings | | | | | 73-75% |

Scenario B: Brain MRI with Contrast (15 series, ~5000 slices)

| Approach | Series Processed | Est. Duration | GPU Cost | LLM Selection Cost | Total |
|---|---|---|---|---|---|
| Naive | 15 | 450-900s | $0.33-$0.66 | $0.00 | $0.33-$0.66 |
| Intelligent | 4 | 120-240s | $0.088-$0.18 | ~$0.004 | $0.09-$0.18 |
| Savings | | | | | 73% |

Scenario C: Focused Study (6 series, ~1500 slices, e.g., stroke protocol)

| Approach | Series Processed | Est. Duration | GPU Cost | Total |
|---|---|---|---|---|
| Naive | 6 | 180-360s | $0.13-$0.26 | $0.13-$0.26 |
| Intelligent | 3 | 90-180s | $0.066-$0.13 | $0.07-$0.13 |

Monthly Volume Projections

| Volume | Naive Cost | Intelligent Cost | Monthly Savings |
|---|---|---|---|
| 10 studies/month | $3.30-$6.60 | $0.90-$1.80 | $2.40-$4.80 |
| 50 studies/month | $16.50-$33.00 | $4.50-$9.00 | $12.00-$24.00 |
| 200 studies/month | $66.00-$132.00 | $18.00-$36.00 | $48.00-$96.00 |
| 1000 studies/month | $330-$660 | $90-$180 | $240-$480 |

Fixed costs: RunPod network volume ~$1/month. No other infrastructure costs — workers scale to zero.

Break-Even vs Always-On

An always-on L40S instance costs ~$0.69/hr = $500/month.

Break-even point: $500 ÷ $0.13/study = ~3,850 studies/month before always-on becomes cheaper.

For inou's expected volumes (10-200 studies/month), serverless is 50-500× cheaper.


8. User Experience

Upload → Results Flow

┌─────────────────────────────────────────────────────────────┐
│ 1. UPLOAD                                                    │
│                                                              │
│  User uploads DICOM study (drag & drop or folder select)    │
│  inou parses metadata, displays series list in viewer       │
│  "AI Analysis Available" badge appears on brain MRI studies │
│                                                              │
├─────────────────────────────────────────────────────────────┤
│ 2. ANALYSIS INITIATED                                        │
│                                                              │
│  [Analyze with AI] button in viewer toolbar                 │
│  Optional: clinical question text field                     │
│  "Analyzing 3 of 12 series... Est. ~90 seconds"            │
│  Progress indicator with phases:                            │
│    ◉ Series selected (3 of 12)                              │
│    ◉ Uploading to analysis engine...                        │
│    ○ AI processing...                                        │
│    ○ Results ready                                           │
│                                                              │
├─────────────────────────────────────────────────────────────┤
│ 3. RESULTS DISPLAYED                                         │
│                                                              │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ AI Analysis Results              Prima v1.07        │    │
│  │                                                     │    │
│  │ ⚠️  URGENT — Neurosurgery referral recommended      │    │
│  │                                                     │    │
│  │ Diagnoses:                                          │    │
│  │   ████████████████████░░  87%  NPH                  │    │
│  │   ██████░░░░░░░░░░░░░░░  34%  Cerebral atrophy     │    │
│  │   █████░░░░░░░░░░░░░░░░  28%  WM changes           │    │
│  │                                                     │    │
│  │ Referral: Neurosurgery (within 1 week)              │    │
│  │ "NPH with high probability — VP shunt evaluation"  │    │
│  │                                                     │    │
│  │ Series analyzed: T2 FLAIR, T2 (2 of 12)            │    │
│  │ Analysis time: 47s · Cost: $0.03                    │    │
│  │                                                     │    │
│  │ ⚕️ AI-assisted analysis — not a radiologist report  │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                              │
│  Results panel integrated into existing DICOM viewer        │
│  Clicking a diagnosis highlights the relevant series        │
│  Results persisted in patient dossier                        │
└─────────────────────────────────────────────────────────────┘

Automatic vs Manual Analysis

Option A (default): Manual trigger — User clicks "Analyze with AI" button. Best for initial launch, gives user control.

Option B (future): Auto-trigger — Analysis starts automatically on upload for brain MRI studies. Configurable in settings.

Viewer Integration

The AI results panel is a new sidebar component in the existing DICOM viewer:

┌──────────────────────────────────────────────────────────────┐
│  inou Viewer                                          [≡]    │
├────────────────────────────────────┬─────────────────────────┤
│                                    │  📋 Study Info          │
│                                    │  Patient: [redacted]    │
│                                    │  Date: 2026-02-14       │
│        DICOM Image Display         │  Series: 12             │
│                                    │                         │
│      (existing viewer canvas)      ├─────────────────────────┤
│                                    │  🤖 AI Analysis         │
│                                    │                         │
│                                    │  Status: ✅ Complete     │
│                                    │  [Results panel above]  │
│                                    │                         │
│                                    │  [Re-analyze ▼]         │
│                                    │  [Export Report]        │
├────────────────────────────────────┴─────────────────────────┤
│  Series: AX T2 FLAIR | Slice 14/28 | W:1500 L:450           │
└──────────────────────────────────────────────────────────────┘

9. Error Handling

Error Categories & Recovery

| Error | Detection | Recovery | User Impact |
|---|---|---|---|
| RunPod cold start timeout | No response in 60s | Retry once; if fails, queue for retry in 5 min | "Analysis queued — GPU starting up" |
| Worker execution timeout | RunPod 300s timeout | Job marked failed; auto-retry with reduced series count | "Analysis timed out — retrying with fewer series" |
| Model inference error | Exception in handler | Return error in callback; log stack trace | "Analysis failed — try again or contact support" |
| DICOM download failure | HTTP error / timeout | Retry download 3× with exponential backoff | Transparent to user |
| Callback delivery failure | HTTP error | Worker retries 3×; inou polls RunPod status API as fallback | Transparent — results arrive via polling |
| Invalid DICOM study | Parse errors in ingest | Reject at upload time with specific error | "Unable to parse series — unsupported format" |
| No relevant series found | Series selector returns empty | Skip analysis; inform user | "No brain MRI sequences detected in this study" |
| RunPod quota / billing | 402/429 errors | Alert admin; queue jobs for later | "Analysis temporarily unavailable" |
| Network volume unavailable | Model load failure | Worker retries; if persistent, alert | "Analysis service maintenance" |

Retry Strategy

type RetryPolicy struct {
    MaxAttempts     int           // 3
    InitialBackoff  time.Duration // 10s
    MaxBackoff      time.Duration // 5m
    BackoffFactor   float64       // 2.0
    RetryableErrors []string      // timeout, 5xx, connection_refused
}

Monitoring & Alerting

| Metric | Alert Threshold | Channel |
|---|---|---|
| Job failure rate | >10% in 1 hour | Slack / Signal |
| Median latency | >180s | Dashboard |
| RunPod spend | >$50/day | Email |
| Cold start frequency | >50% of jobs | Dashboard (indicates insufficient idle timeout) |
| Callback failures | Any | Immediate alert |

10. Implementation Plan

Phase 1: Foundation (2 weeks)

  • Fork Prima repo, create inou-prima-worker repo
  • Write RunPod handler (handler.py)
  • Build Docker image, test locally with nvidia-docker
  • Set up RunPod account, network volume, deploy worker
  • Test end-to-end with a sample DICOM study
  • Verify model outputs match Prima's reference results

Phase 2: Series Selection (1 week)

  • Implement rule-based series classifier in Go
  • Build clinical protocol selection logic
  • Add LLM fallback (Claude Haiku integration)
  • Test with diverse DICOM studies (Siemens, GE, Philips naming conventions)
  • Validate selection accuracy: should match radiologist series choice >90%

Phase 3: Backend Integration (2 weeks)

  • Add analysis endpoints to inou backend
  • Implement job queue with retry logic
  • Build DICOM download endpoint with signed URLs
  • Implement callback handler with HMAC verification
  • Add PHI stripping to DICOM export pipeline
  • Store results in dossier (encrypted)
  • Write integration tests

Phase 4: Frontend Integration (1 week)

  • Add "Analyze with AI" button to viewer
  • Build AI results panel component
  • Implement progress indicator (WebSocket or polling)
  • Add results to dossier view
  • Export report functionality

Phase 5: Hardening (1 week)

  • Load testing (concurrent jobs, cold starts)
  • Error handling for all failure modes
  • Monitoring dashboard
  • Cost tracking and alerting
  • Security review (PHI flow, access controls)
  • Documentation

Total: ~7 weeks to production-ready MVP.


11. Future Work

Near-Term (3–6 months)

  • Auto-trigger analysis on brain MRI upload (configurable)
  • Batch processing — analyze multiple studies in parallel
  • Result caching — skip re-analysis if study hasn't changed
  • Comparison mode — compare current vs prior study results

Medium-Term (6–12 months)

  • Fine-tuning on specific conditions — Use inou's accumulated (de-identified) data to fine-tune Prima for conditions most relevant to inou's user base (e.g., pediatric hydrocephalus for Sophia's case)
  • Radiologist feedback loop — Allow radiologists to confirm/reject AI findings, feeding back into training data
  • Multi-series reasoning — Instead of per-series inference, develop cross-series reasoning (e.g., comparing pre- and post-contrast T1)

Long-Term (12+ months)

  • Multi-organ expansion — As VLMs for spine, cardiac, abdominal MRI become available, integrate them using the same serverless architecture
  • On-premise deployment — For high-volume clients or regulatory requirements, offer Prima as an on-prem container (requires dedicated GPU)
  • Real-time inference — As models get faster and GPUs cheaper, offer analysis during the MRI scan itself (PACS integration)
  • Clinical decision support — Integrate Prima results with patient history, labs, and prior imaging for comprehensive clinical recommendations
  • FDA 510(k) pathway — If inou pursues clinical deployment, Prima results would need FDA clearance as a Computer-Aided Detection (CADe) or Computer-Aided Diagnosis (CADx) device

Architecture Extensibility

The serverless worker pattern is model-agnostic. To add a new model:

  1. Build a new Docker image with the model
  2. Deploy as a separate RunPod serverless endpoint
  3. Add a new series selector profile
  4. Route from inou backend based on study type / body part
```
                    inou Backend
                         │
              ┌──────────┼──────────┐
              ▼          ▼          ▼
         Prima       SpineVLM    CardiacAI
        (Brain)      (Spine)     (Heart)
         L40S         L40S        A100
```

Same queue, same callback pattern, same dossier storage. Only the worker and selector change.
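Step 4 (routing by study type) could be a lookup keyed on the study's body part. The endpoint names below are placeholders taken from the diagram, not deployed endpoint IDs:

```go
package main

import (
	"fmt"
	"strings"
)

// routeEndpoint picks a serverless endpoint for a study by body part
// (e.g. DICOM BodyPartExamined). Unknown body parts return ok=false,
// meaning no model is available and the study is not analyzed.
func routeEndpoint(bodyPart string) (string, bool) {
	endpoints := map[string]string{
		"BRAIN": "prima",     // placeholder endpoint IDs
		"SPINE": "spinevlm",
		"HEART": "cardiacai",
	}
	ep, ok := endpoints[strings.ToUpper(bodyPart)]
	return ep, ok
}

func main() {
	for _, bp := range []string{"BRAIN", "spine", "ABDOMEN"} {
		if ep, ok := routeEndpoint(bp); ok {
			fmt.Printf("%s → %s\n", bp, ep)
		} else {
			fmt.Printf("%s → no model available\n", bp)
		}
	}
}
```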


Appendix A: Prima Model Details

| Property | Value |
|---|---|
| Paper | "Learning neuroimaging models from health system-scale data" (arXiv:2509.18638) |
| Training data | UM-220K: 220,000+ MRI studies, 5.6M 3D sequences, 362M 2D images |
| Architecture | Hierarchical VLM: VQ-VAE tokenizer → Perceiver → Transformer |
| Diagnoses covered | 52 radiologic diagnoses across neoplastic, inflammatory, infectious, and developmental categories |
| Mean AUC | 90.1 ± 5.0% (prospective 30K-study validation) |
| License | MIT |
| GPU requirement | Ampere+ (flash-attn), 48 GB VRAM recommended (L40S, A100) |
| Weights | primafullmodel107.pt (~3.5 GB) + vqvae_model_step16799.pth (~500 MB) |
| Dependencies | PyTorch 2.6, flash-attn 2.7.4, MONAI 1.5.1, transformers 4.49 |

Appendix B: DICOM Tags Reference

```
(0008,0008) ImageType          — ORIGINAL\PRIMARY vs DERIVED\SECONDARY
(0008,0060) Modality           — MR
(0008,103E) SeriesDescription  — Free text, scanner-dependent
(0010,0010) PatientName        — PHI — STRIP before sending to worker
(0010,0020) PatientID          — PHI — STRIP
(0010,0030) PatientBirthDate   — PHI — STRIP
(0018,0010) ContrastBolusAgent — Present = contrast-enhanced
(0018,0020) ScanningSequence   — SE, GR, IR, EP (spin echo, gradient, inversion recovery, echo planar)
(0018,0023) MRAcquisitionType  — 2D, 3D
(0018,0024) SequenceName       — Scanner-specific pulse sequence name
(0018,0050) SliceThickness     — mm
(0020,0011) SeriesNumber       — Integer ordering
(0020,0013) InstanceNumber     — Slice index within series
(0028,0010) Rows               — Pixel rows
(0028,0011) Columns            — Pixel columns
```
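The PHI-stripping step from Phase 3 can be sketched against these tags. This operates on an already-parsed tag→value map for illustration (it is not a DICOM parser), and the tag list is only the subset marked STRIP above; a production pipeline should follow the DICOM PS3.15 Annex E de-identification profiles:

```go
package main

import "fmt"

// Tag identifies a DICOM attribute by (group, element).
type Tag struct{ Group, Element uint16 }

// phiTags lists the Appendix B tags that must never leave inou
// (illustrative subset — a real pipeline covers the full PS3.15 profile).
var phiTags = map[Tag]bool{
	{0x0010, 0x0010}: true, // PatientName
	{0x0010, 0x0020}: true, // PatientID
	{0x0010, 0x0030}: true, // PatientBirthDate
}

// stripPHI returns a copy of a parsed tag→value map with PHI removed,
// leaving technical tags (Modality, sequence parameters, etc.) intact.
func stripPHI(ds map[Tag]string) map[Tag]string {
	out := make(map[Tag]string, len(ds))
	for t, v := range ds {
		if !phiTags[t] {
			out[t] = v
		}
	}
	return out
}

func main() {
	ds := map[Tag]string{
		{0x0010, 0x0010}: "DOE^JANE", // PHI — must be dropped
		{0x0008, 0x0060}: "MR",       // Modality — must be kept
	}
	fmt.Println(len(stripPHI(ds))) // 1
}
```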

This specification is a living document. Update as implementation progresses and requirements evolve.