
Prima Integration — Technical Specification

inou Health × Prima Brain MRI VLM Version: 1.0 · Date: 2026-02-14 · Author: James (AI Architect) · Reviewer: Johan Jongsma


Executive Summary

This spec defines how inou integrates Prima, the University of Michigan's brain MRI vision-language model (VLM), as an on-demand diagnostic AI service. Prima achieves 90.1% mean AUC across 52 neurological diagnoses and provides differential diagnosis, worklist prioritization, and specialist referral recommendations.

The design prioritizes cost efficiency (serverless GPU, intelligent series selection), security (zero PHI at rest on cloud GPU, AES-256-GCM encrypted dossier storage), and seamless UX (upload → automatic analysis → results in viewer).

Key numbers:

  • Per-study cost with intelligent selection: $0.04-$0.13 (vs $0.22-$0.66 naive)
  • Cold start to results: 60-120 seconds
  • Zero always-on GPU cost

Table of Contents

  1. System Architecture
  2. Intelligent Series Selection
  3. RunPod Serverless Worker
  4. API Design
  5. Docker Image
  6. Data Flow & Security
  7. Cost Analysis
  8. User Experience
  9. Error Handling
  10. Implementation Plan
  11. Future Work

1. System Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         inou Frontend                               │
│  ┌──────────┐  ┌──────────────┐  ┌───────────────────────────────┐ │
│  │  DICOM   │  │   DICOM      │  │   AI Results Panel            │ │
│  │  Upload  │──│   Viewer     │  │  • Diagnosis probabilities    │ │
│  │          │  │              │  │  • Urgency / priority         │ │
│  └────┬─────┘  └──────────────┘  │  • Specialist referral        │ │
│       │                          │  • Series-level annotations   │ │
│       │                          └──────────┬────────────────────┘ │
└───────┼─────────────────────────────────────┼─────────────────────┘
        │ POST /api/studies                   │ GET /api/studies/:id/analysis
        ▼                                     ▲
┌───────────────────────────────────────────────────────────────────┐
│                      inou Backend (Go)                            │
│                                                                   │
│  ┌──────────┐  ┌─────────────┐  ┌──────────┐  ┌──────────────┐  │
│  │  DICOM   │  │   Series    │  │  Job     │  │   Dossier    │  │
│  │  Ingest  │─▶│  Selector   │─▶│  Queue   │  │   Store      │  │
│  │  + Parse │  │  (LLM/Rules)│  │          │  │  (SQLite+AES)│  │
│  └──────────┘  └─────────────┘  └────┬─────┘  └──────▲───────┘  │
│                                      │                │          │
└──────────────────────────────────────┼────────────────┼──────────┘
                                       │                │
                              Job dispatch        Results callback
                              (HTTPS POST)        (HTTPS POST + HMAC)
                                       │                │
                                       ▼                │
                    ┌──────────────────────────────────────────┐
                    │          RunPod Serverless                │
                    │                                          │
                    │  ┌────────────────────────────────────┐  │
                    │  │       Prima Worker (L40S 48GB)     │  │
                    │  │                                    │  │
                    │  │  1. Download DICOM from signed URL │  │
                    │  │  2. Load VQ-VAE tokenizer          │  │
                    │  │  3. Tokenize selected series       │  │
                    │  │  4. Free VQ-VAE from VRAM          │  │
                    │  │  5. Load Prima VLM                 │  │
                    │  │  6. Run inference (fp16)           │  │
                    │  │  7. POST results back to inou      │  │
                    │  │  8. Purge all DICOM data           │  │
                    │  │                                    │  │
                    │  └────────────────────────────────────┘  │
                    │                                          │
                    │  GPU: L40S 48GB · Scales 0→N · Pay/sec  │
                    └──────────────────────────────────────────┘

Component Responsibilities

| Component | Technology | Responsibility |
|---|---|---|
| DICOM Ingest | Go | Parse uploaded DICOM, extract metadata, store encrypted |
| Series Selector | Go + LLM (Claude Haiku) | Analyze DICOM metadata, select diagnostically relevant series |
| Job Queue | Go (in-process) | Manage analysis jobs, retry logic, status tracking |
| RunPod Worker | Python 3.11, PyTorch 2.6 | Run Prima inference on selected series |
| Dossier Store | SQLite + AES-256-GCM | Store results encrypted at rest |
| AI Results Panel | Frontend (existing viewer) | Display diagnosis, urgency, referral |

2. Intelligent Series Selection

The Problem

A typical brain MRI study contains 8-15 series with 100-500 slices each, potentially 10,000+ total slices. Running every series through Prima wastes GPU time and money; most clinical questions need only 2-4 specific sequence types.

Selection Architecture

DICOM Study Metadata
        │
        ▼
┌───────────────────────────────────────────┐
│         Series Selection Pipeline          │
│                                            │
│  Step 1: Extract metadata per series       │
│    • SeriesDescription (0008,103E)         │
│    • SequenceName (0018,0024)              │
│    • MRAcquisitionType (0018,0023)         │
│    • ScanningSequence (0018,0020)          │
│    • ContrastBolusAgent (0018,0010)        │
│    • SliceThickness (0018,0050)            │
│    • NumberOfSlices                         │
│    • ImageType (0008,0008)                 │
│    • Modality (0008,0060)                  │
│                                            │
│  Step 2: Rule-based classification         │
│    Map each series → sequence type:        │
│    T1, T1+C, T2, T2-FLAIR, DWI, ADC,     │
│    SWI, MRA, SCOUT, LOCALIZER, DERIVED    │
│                                            │
│  Step 3: Clinical protocol matching        │
│    Apply selection rules (see table)       │
│                                            │
│  Step 4: LLM fallback for ambiguous cases  │
│    If rule-based classification fails,     │
│    send metadata to Claude Haiku           │
│                                            │
│  Output: ordered list of series to analyze │
└───────────────────────────────────────────┘

Rule-Based Classification

Series descriptions are notoriously inconsistent across scanners (Siemens, GE, Philips all use different naming). The classifier uses a multi-signal approach:

type SeriesClassification struct {
    SeriesUID     string
    SequenceType  string   // T1, T2, FLAIR, T1C, DWI, ADC, SWI, MRA, OTHER
    HasContrast   bool
    IsLocalizer   bool
    IsDerived     bool
    SliceCount    int
    Confidence    float64  // 0.0-1.0, below 0.7 triggers LLM fallback
}

Classification rules (ordered by priority):

| Signal | Tag | Example Values | Maps To |
|---|---|---|---|
| ImageType contains "LOCALIZER" | (0008,0008) | ORIGINAL\PRIMARY\LOCALIZER | SKIP |
| ImageType contains "DERIVED" | (0008,0008) | DERIVED\SECONDARY | SKIP (usually) |
| SliceCount < 10 | computed | | SKIP (scout/cal) |
| ContrastBolusAgent present | (0018,0010) | Gadavist | +contrast flag |
| Description contains "flair" | (0008,103E) | AX T2 FLAIR, FLAIR_DARK-FLUID | T2-FLAIR |
| Description contains "t2" (no flair) | (0008,103E) | AX T2, T2_TSE_TRA | T2 |
| Description contains "t1" | (0008,103E) | SAG T1, T1_MPRAGE | T1 or T1+C |
| Description contains "dwi"/"diffusion" | (0008,103E) | DWI_b1000, EP2D_DIFF | DWI |
| Description contains "adc" | (0008,103E) | ADC_MAP | ADC |
| Description contains "swi"/"suscept" | (0008,103E) | SWI_mIP, SUSCEPTIBILITY | SWI |
| ScanningSequence = "EP" + b-value tag | (0018,0020) | | DWI |
| No match | | | confidence < 0.7 → LLM fallback |
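
The rules above can be sketched as a small Go function. This is a minimal illustration: the `SeriesMeta` and `Classify` names are hypothetical, and the contrast handling and confidence scoring are simplified relative to the full classifier.

```go
package main

import (
	"fmt"
	"strings"
)

// SeriesMeta holds the DICOM metadata signals used for classification.
type SeriesMeta struct {
	Description string // (0008,103E)
	ImageType   string // (0008,0008), backslash-separated
	Contrast    string // (0018,0010), empty if absent
	SliceCount  int
}

// Classify maps series metadata to a sequence type using the priority-ordered
// rules in the table above. Returns "SKIP" for localizers, derived images,
// and scout/calibration series; "OTHER" falls through to the LLM fallback.
func Classify(m SeriesMeta) string {
	desc := strings.ToLower(m.Description)
	imgType := strings.ToUpper(m.ImageType)
	switch {
	case strings.Contains(imgType, "LOCALIZER"):
		return "SKIP"
	case strings.Contains(imgType, "DERIVED"):
		return "SKIP"
	case m.SliceCount < 10:
		return "SKIP" // scout / calibration
	case strings.Contains(desc, "flair"):
		return "T2-FLAIR" // checked before "t2": "AX T2 FLAIR" must not match T2
	case strings.Contains(desc, "t2"):
		return "T2"
	case strings.Contains(desc, "t1"):
		if m.Contrast != "" {
			return "T1+C"
		}
		return "T1"
	case strings.Contains(desc, "dwi") || strings.Contains(desc, "diff"):
		return "DWI" // matches DWI_b1000 and EP2D_DIFF
	case strings.Contains(desc, "adc"):
		return "ADC"
	case strings.Contains(desc, "swi") || strings.Contains(desc, "suscept"):
		return "SWI"
	}
	return "OTHER"
}

func main() {
	fmt.Println(Classify(SeriesMeta{Description: "AX T2 FLAIR", SliceCount: 28}))
}
```

Because the switch is evaluated in table order, a description like "AX T2 FLAIR" correctly maps to T2-FLAIR rather than T2.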

Clinical Protocol Selection

Given the classified series, select based on the clinical question (or use a general protocol if none specified):

| Clinical Question | Required Series | Optional Series | Expected Count |
|---|---|---|---|
| General screening | T1, T2, T2-FLAIR | DWI, T1+C | 3-5 |
| Hydrocephalus | T2-FLAIR, T2 | T1, DWI | 2-3 |
| Tumor / mass | T1+C, T2-FLAIR, T1 (pre-contrast) | DWI, SWI | 3-4 |
| Stroke / acute | DWI, ADC, T2-FLAIR | MRA, SWI | 3-4 |
| MS / demyelination | T2-FLAIR, T1+C, T2 | DWI | 3 |
| Infection | T1+C, DWI, T2-FLAIR | T2 | 3 |

Default behavior (no clinical question): Use "General screening" — select T1, T2, T2-FLAIR, and any contrast-enhanced series. Cap at 5 series maximum.
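
A compact Go sketch of the protocol-matching step. The `protocols` map encodes the required-series column of the table above; the function and key names are illustrative, not part of the existing codebase.

```go
package main

import "fmt"

// protocols maps a clinical question to required sequence types, per the
// clinical protocol table (optional series omitted in this sketch).
var protocols = map[string][]string{
	"general":       {"T1", "T2", "T2-FLAIR"},
	"hydrocephalus": {"T2-FLAIR", "T2"},
	"tumor":         {"T1+C", "T2-FLAIR", "T1"},
	"stroke":        {"DWI", "ADC", "T2-FLAIR"},
	"ms":            {"T2-FLAIR", "T1+C", "T2"},
	"infection":     {"T1+C", "DWI", "T2-FLAIR"},
}

// SelectSeries picks, for each required sequence type, the first classified
// series of that type. Unknown questions fall back to general screening, and
// the result is capped at maxSeries (5, per the default behavior above).
func SelectSeries(question string, classified map[string][]string, maxSeries int) []string {
	required, ok := protocols[question]
	if !ok {
		required = protocols["general"]
	}
	var out []string
	for _, seq := range required {
		if uids := classified[seq]; len(uids) > 0 && len(out) < maxSeries {
			out = append(out, uids[0])
		}
	}
	return out
}

func main() {
	classified := map[string][]string{"T2-FLAIR": {"1.2.840.1"}, "T2": {"1.2.840.2"}}
	fmt.Println(SelectSeries("hydrocephalus", classified, 5))
}
```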

LLM Fallback

When rule-based classification confidence is below 0.7 for any series:

Prompt to Claude Haiku:

You are a neuroradiology MRI series classifier. Given the following DICOM 
metadata for a single MRI series, classify it.

SeriesDescription: {desc}
SequenceName: {seq_name}
ScanningSequence: {scan_seq}
MRAcquisitionType: {acq_type}
ImageType: {img_type}
ContrastBolusAgent: {contrast}
SliceCount: {count}
Manufacturer: {mfr}
ManufacturerModelName: {model}

Respond with JSON:
{
  "sequence_type": "T1|T2|T2-FLAIR|T1+C|DWI|ADC|SWI|MRA|SCOUT|OTHER",
  "has_contrast": true|false,
  "is_diagnostically_relevant": true|false,
  "confidence": 0.0-1.0,
  "reasoning": "brief explanation"
}

Cost of LLM fallback: ~$0.001 per series (Haiku). Triggered for maybe 1-2 series per study. Negligible.


3. RunPod Serverless Worker

Worker Architecture

The RunPod serverless worker wraps Prima's pipeline.py with an HTTP handler:

# handler.py — RunPod serverless handler
import runpod
import torch
import tempfile
import requests
import shutil
import json
import time
import gc
from pathlib import Path

from pipeline import Pipeline  # Prima's pipeline

# Helpers download_dicom, filter_series, build_config, and post_callback
# are defined elsewhere in this file (omitted here for brevity).

def handler(job):
    """
    Input schema:
    {
        "input": {
            "dicom_url": "https://inou.../signed-download/...",
            "series_uids": ["1.2.840...", "1.2.840..."],
            "callback_url": "https://inou.../api/analysis/callback",
            "callback_hmac_key": "hex-encoded-key",
            "job_id": "uuid",
            "study_id": "uuid"
        }
    }
    """
    input_data = job["input"]
    work_dir = Path(tempfile.mkdtemp())
    
    try:
        # 1. Download DICOM (only selected series)
        t0 = time.time()
        dicom_path = download_dicom(input_data["dicom_url"], work_dir)
        
        # 2. Filter to selected series only
        filter_series(dicom_path, input_data["series_uids"])
        
        # 3. Run Prima pipeline
        config = build_config(dicom_path, work_dir / "output")
        pipeline = Pipeline(config)
        
        # Load study
        pipeline.load_mri_study()
        
        # Tokenize (VQ-VAE) — loads tokenizer, runs, then frees
        pipeline.load_tokenizer_model()
        tokens = pipeline.tokenize_series()
        pipeline.free_tokenizer()  # Free VRAM
        
        # Run Prima VLM
        pipeline.load_prima_model()
        results = pipeline.run_inference(tokens)
        pipeline.free_prima_model()
        
        elapsed = time.time() - t0
        
        # 4. Build response
        response = {
            "job_id": input_data["job_id"],
            "study_id": input_data["study_id"],
            "elapsed_seconds": elapsed,
            "series_analyzed": len(input_data["series_uids"]),
            "results": {
                "diagnoses": results["diagnoses"],       # list of {name, probability}
                "priority": results["priority"],          # STAT / URGENT / ROUTINE
                "referral": results["referral"],           # specialist recommendation
                "differential": results["differential"],   # ranked differential diagnosis
            }
        }
        
        # 5. POST results back to inou
        post_callback(input_data["callback_url"], response, 
                      input_data["callback_hmac_key"])
        
        return response
        
    finally:
        # 6. ALWAYS purge DICOM data — no PHI left on worker
        shutil.rmtree(work_dir, ignore_errors=True)
        torch.cuda.empty_cache()
        gc.collect()

runpod.serverless.start({"handler": handler})

Worker Lifecycle

                    RunPod Serverless
                    ─────────────────
Idle (no GPU)  ──────────────────────── $0.00/sec
                         │
                    Job arrives
                         │
                         ▼
Cold Start (~15-25s) ────────────────── $0.00073/sec (L40S)
  • Container starts                     starts billing
  • Model weights loaded from
    network volume (NFS, ~10s)
  • GPU warm
                         │
                         ▼
Inference (~30-60s) ─────────────────── $0.00073/sec
  • VQ-VAE tokenization (~10-20s)
  • Free VQ-VAE
  • Prima VLM inference (~20-40s)
  • Results callback
                         │
                         ▼
Idle timeout (5s) ───────────────────── $0.00073/sec (configurable)
                         │
                         ▼
Scale to 0 ──────────────────────────── $0.00/sec

RunPod Configuration

{
    "name": "inou-prima-worker",
    "gpu": "NVIDIA L40S",
    "gpuCount": 1,
    "volumeId": "vol_prima_weights",
    "volumeMountPath": "/models",
    "dockerImage": "inou/prima-worker:latest",
    "env": {
        "MODEL_DIR": "/models",
        "PRIMA_CKPT": "/models/primafullmodel107.pt",
        "VQVAE_CKPT": "/models/vqvae_model_step16799.pth"
    },
    "scalerType": "QUEUE_DELAY",
    "scalerValue": 4,
    "workersMin": 0,
    "workersMax": 3,
    "idleTimeout": 5,
    "executionTimeout": 300,
    "flashboot": true
}

Network Volume: Model weights (~4GB total) stored on a RunPod network volume. Persists across cold starts. Mounts as NFS — no download delay after first pull.


4. API Design

inou Backend Endpoints

4.1 Trigger Analysis

POST /api/studies/{studyID}/analyze
Authorization: Bearer <token>

Request Body (optional):
{
    "clinical_question": "evaluate for hydrocephalus",  // optional
    "priority": "routine",                               // routine | urgent | stat
    "force_series": ["1.2.840..."]                       // optional: override selection
}

Response 202:
{
    "job_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "queued",
    "selected_series": [
        {
            "series_uid": "1.2.840.113619...",
            "description": "AX T2 FLAIR",
            "sequence_type": "T2-FLAIR",
            "slice_count": 28,
            "selection_reason": "Primary sequence for hydrocephalus evaluation"
        },
        {
            "series_uid": "1.2.840.113619...",
            "description": "AX T2",
            "sequence_type": "T2",
            "slice_count": 24,
            "selection_reason": "Complementary T2-weighted for ventricular assessment"
        }
    ],
    "estimated_cost_usd": 0.066,
    "estimated_duration_seconds": 90
}

4.2 Analysis Status

GET /api/studies/{studyID}/analysis
Authorization: Bearer <token>

Response 200:
{
    "job_id": "550e8400-...",
    "status": "completed",         // queued | processing | completed | failed
    "created_at": "2026-02-14T20:15:00Z",
    "completed_at": "2026-02-14T20:16:32Z",
    "duration_seconds": 92,
    "cost_usd": 0.067,
    "series_analyzed": 2,
    "results": { ... }             // see 4.3
}

4.3 Analysis Results

GET /api/studies/{studyID}/analysis/results
Authorization: Bearer <token>

Response 200:
{
    "study_id": "...",
    "model": "prima-v1.07",
    "model_version": "primafullmodel107",
    "analysis_timestamp": "2026-02-14T20:16:32Z",
    
    "diagnoses": [
        {
            "condition": "Normal pressure hydrocephalus",
            "icd10": "G91.2",
            "probability": 0.87,
            "category": "developmental"
        },
        {
            "condition": "Cerebral atrophy",
            "icd10": "G31.9",
            "probability": 0.34,
            "category": "degenerative"
        },
        {
            "condition": "Periventricular white matter changes",
            "icd10": "I67.3",
            "probability": 0.28,
            "category": "vascular"
        }
    ],
    
    "priority": {
        "level": "URGENT",
        "reasoning": "High probability hydrocephalus requiring neurosurgical evaluation"
    },
    
    "referral": {
        "specialty": "Neurosurgery",
        "urgency": "within_1_week",
        "reasoning": "NPH with high probability — consider VP shunt evaluation"
    },
    
    "differential": [
        "Normal pressure hydrocephalus",
        "Communicating hydrocephalus",
        "Cerebral atrophy (ex vacuo ventriculomegaly)"
    ],
    
    "series_results": [
        {
            "series_uid": "1.2.840...",
            "description": "AX T2 FLAIR",
            "findings": "Disproportionate ventriculomegaly relative to sulcal enlargement. Periventricular signal abnormality consistent with transependymal CSF flow."
        }
    ],
    
    "disclaimer": "AI-generated analysis for clinical decision support only. Not a substitute for radiologist interpretation."
}

4.4 Callback Endpoint (Worker → inou)

POST /api/analysis/callback
X-HMAC-Signature: sha256=<hex>
Content-Type: application/json

{
    "job_id": "...",
    "study_id": "...",
    "status": "completed",
    "elapsed_seconds": 47.3,
    "series_analyzed": 2,
    "results": { ... }
}

The HMAC signature is computed over the raw JSON body using a per-job secret, which prevents spoofed callbacks.

4.5 DICOM Download (Worker → inou)

GET /api/internal/dicom/{studyID}/download?token={signed_token}&series={uid1,uid2}
→ 200 application/zip (streaming)

Token: time-limited (5 min), signed with HMAC, includes study ID and allowed series UIDs.
Only selected series are included in the download — minimizes data transfer.

5. Docker Image

Dockerfile

FROM nvidia/cuda:12.4.0-devel-ubuntu22.04

# System deps
RUN apt-get update && apt-get install -y \
    python3.11 python3.11-dev python3-pip \
    libgl1-mesa-glx libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

RUN ln -s /usr/bin/python3.11 /usr/bin/python

# Python deps (cached layer)
COPY requirements.txt /tmp/
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# flash-attn requires Ampere+ (sm_80+), build for L40S (sm_89)
RUN pip install flash-attn==2.7.4.post1 --no-build-isolation

# RunPod SDK
RUN pip install runpod==1.7.0

# Prima source code
COPY Prima/ /app/Prima/
COPY handler.py /app/handler.py

WORKDIR /app

# Model weights are on network volume, not baked into image
# /models/primafullmodel107.pt (~3.5GB)
# /models/vqvae_model_step16799.pth (~500MB)

CMD ["python", "handler.py"]

Image size: ~12GB (CUDA base + PyTorch + flash-attn + Prima)

What's in the image vs network volume:

| Component | Location | Size | Rationale |
|---|---|---|---|
| CUDA 12.4 + Ubuntu | Docker image | ~4GB | Cached, rarely changes |
| PyTorch 2.6 + deps | Docker image | ~6GB | Cached, rarely changes |
| flash-attn | Docker image | ~500MB | Must match CUDA version |
| Prima source | Docker image | ~50MB | Changes with updates |
| RunPod handler | Docker image | ~5KB | Our wrapper code |
| primafullmodel107.pt | Network volume | ~3.5GB | Too large for image, shared |
| vqvae_model_step16799.pth | Network volume | ~500MB | Shared across workers |

Build & Deploy

# Build
docker build -t inou/prima-worker:latest .

# Push to RunPod's registry (or Docker Hub)
docker push inou/prima-worker:latest

# Network volume setup (one-time)
runpod volume create --name prima-weights --size 10
# Upload model weights to volume via RunPod console or SSH

6. Data Flow & Security

Data Flow Diagram

   User uploads       inou stores          Worker downloads     Worker processes
   DICOM study        encrypted            selected series      and returns results
                      at rest              (signed URL)
        │                 │                      │                     │
        ▼                 ▼                      ▼                     ▼
   ┌─────────┐     ┌───────────┐          ┌──────────┐         ┌──────────┐
   │ Browser │────▶│ inou      │─────────▶│ RunPod   │────────▶│ inou     │
   │ Upload  │     │ Backend   │  signed  │ Worker   │ HMAC    │ Backend  │
   │ (TLS)   │     │ (AES-GCM) │  URL     │ (tmpfs)  │ callback│ (store)  │
   └─────────┘     └───────────┘          └──────────┘         └──────────┘
                                                │
                                          On completion:
                                          shutil.rmtree()
                                          No PHI retained

Security Controls

| Concern | Control |
|---|---|
| DICOM at rest (inou) | AES-256-GCM encryption in SQLite (existing) |
| DICOM in transit to worker | TLS 1.3 + time-limited signed URL (5 min TTL) |
| DICOM on worker | Stored in tmpfs; shutil.rmtree() in finally block; container destroyed after job |
| Results in transit | TLS 1.3 + HMAC-SHA256 callback verification |
| Results at rest | Encrypted in dossier (existing AES-256-GCM) |
| PHI in logs | No DICOM pixel data or patient identifiers logged; only series UIDs and job metadata |
| RunPod access | API key stored as inou backend env var, never exposed to frontend |
| Worker isolation | Each job runs in an isolated container; no shared filesystem between jobs |

HIPAA Considerations

| HIPAA Requirement | Implementation |
|---|---|
| Access controls | inou auth (existing); RunPod API key; signed URLs |
| Encryption | AES-256-GCM at rest; TLS 1.3 in transit |
| Audit trail | All analysis requests logged with timestamp, user, study ID |
| Minimum necessary | Only selected series transmitted; no patient demographics sent to worker |
| BAA | RunPod offers a BAA for serverless; must execute before production |
| Data retention | Zero retention on worker; configurable retention on inou |
| De-identification | DICOM sent to worker can be stripped of patient name/DOB (tags 0010,0010 / 0010,0030); Series UID is sufficient for processing |

PHI Minimization Pipeline

Before sending DICOM to RunPod, inou strips the following tags:

var phiTagsToStrip = []dicom.Tag{
    dicom.PatientName,          // (0010,0010)
    dicom.PatientID,            // (0010,0020)  
    dicom.PatientBirthDate,     // (0010,0030)
    dicom.PatientSex,           // (0010,0040) — keep if Prima needs it
    dicom.PatientAddress,       // (0010,1040)
    dicom.ReferringPhysician,   // (0008,0090)
    dicom.InstitutionName,      // (0008,0080)
    dicom.InstitutionAddress,   // (0008,0081)
    dicom.AccessionNumber,      // (0008,0050)
}
// Pixel data and series-level imaging tags are preserved — Prima needs those

7. Cost Analysis

RunPod Pricing (L40S Serverless)

| Metric | Value |
|---|---|
| Per-second billing | $0.00073/sec |
| Per-minute | $0.0438/min |
| Per-hour | $2.628/hr |
| Minimum charge per job | None (per-second) |
| Network volume (10GB) | ~$1.00/month |
| Idle workers | $0.00 (scale to 0) |

Per-Series Timing Breakdown

| Phase | Duration | Notes |
|---|---|---|
| Cold start (first job) | 15-25s | Container + model load from volume |
| Warm start (subsequent) | 0-2s | Worker already running |
| DICOM download | 2-5s | Depends on series size (~50-200MB) |
| VQ-VAE tokenization | 10-20s | Per series, depends on slice count |
| VQ-VAE → Prima model swap | 2-3s | Free VQ-VAE, load Prima |
| Prima inference | 15-30s | Depends on token count |
| Results callback | <1s | Small JSON payload |
| Total per series | 30-60s | Warm: 30-40s typical |
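
The `estimated_cost_usd` field in section 4.1 can be derived from these figures. A rough Go sketch using the per-second rate and the midpoint of the warm per-series range (the helper name and the 45s/20s constants are illustrative, not fixed by this spec):

```go
package main

import "fmt"

const l40sPerSecond = 0.00073 // USD, RunPod L40S serverless rate

// EstimateCost approximates GPU duration and cost for a job: ~45s per series
// (midpoint of the 30-60s range above), plus ~20s if a cold start is expected.
func EstimateCost(seriesCount int, coldStart bool) (seconds int, usd float64) {
	seconds = 45 * seriesCount
	if coldStart {
		seconds += 20
	}
	return seconds, float64(seconds) * l40sPerSecond
}

func main() {
	s, c := EstimateCost(2, false)
	fmt.Printf("%ds $%.3f\n", s, c)
}
```

For the two-series example in section 4.1, this yields 90s and about $0.066, matching the response shown there.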

Cost Per Study: With vs Without Intelligent Selection

Scenario A: Routine Brain MRI (12 series, ~3000 slices)

| Approach | Series Processed | Est. Duration | GPU Cost | LLM Selection Cost | Total |
|---|---|---|---|---|---|
| Naive (all series) | 12 | 360-720s | $0.26-$0.53 | $0.00 | $0.26-$0.53 |
| Intelligent (selected) | 3 | 90-180s | $0.066-$0.13 | ~$0.003 | $0.07-$0.13 |
| Savings | | | | | 73-75% |

Scenario B: Brain MRI with Contrast (15 series, ~5000 slices)

| Approach | Series Processed | Est. Duration | GPU Cost | LLM Selection Cost | Total |
|---|---|---|---|---|---|
| Naive | 15 | 450-900s | $0.33-$0.66 | $0.00 | $0.33-$0.66 |
| Intelligent | 4 | 120-240s | $0.088-$0.18 | ~$0.004 | $0.09-$0.18 |
| Savings | | | | | 73% |

Scenario C: Focused Study (6 series, ~1500 slices, e.g., stroke protocol)

| Approach | Series Processed | Est. Duration | GPU Cost | Total |
|---|---|---|---|---|
| Naive | 6 | 180-360s | $0.13-$0.26 | $0.13-$0.26 |
| Intelligent | 3 | 90-180s | $0.066-$0.13 | $0.07-$0.13 |

Monthly Volume Projections

| Volume | Naive Cost | Intelligent Cost | Monthly Savings |
|---|---|---|---|
| 10 studies/month | $3.30-$6.60 | $0.90-$1.80 | $2.40-$4.80 |
| 50 studies/month | $16.50-$33.00 | $4.50-$9.00 | $12.00-$24.00 |
| 200 studies/month | $66.00-$132.00 | $18.00-$36.00 | $48.00-$96.00 |
| 1000 studies/month | $330-$660 | $90-$180 | $240-$480 |

Fixed costs: RunPod network volume ~$1/month. No other infrastructure costs — workers scale to zero.

Break-Even vs Always-On

An always-on L40S instance costs ~$0.69/hr = $500/month.

Break-even point: $500 ÷ $0.13/study = ~3,850 studies/month before always-on becomes cheaper.

For inou's expected volumes (10-200 studies/month), serverless is 50-500× cheaper.


8. User Experience

Upload → Results Flow

┌─────────────────────────────────────────────────────────────┐
│ 1. UPLOAD                                                    │
│                                                              │
│  User uploads DICOM study (drag & drop or folder select)    │
│  inou parses metadata, displays series list in viewer       │
│  "AI Analysis Available" badge appears on brain MRI studies │
│                                                              │
├─────────────────────────────────────────────────────────────┤
│ 2. ANALYSIS INITIATED                                        │
│                                                              │
│  [Analyze with AI] button in viewer toolbar                 │
│  Optional: clinical question text field                     │
│  "Analyzing 3 of 12 series... Est. ~90 seconds"            │
│  Progress indicator with phases:                            │
│    ◉ Series selected (3 of 12)                              │
│    ◉ Uploading to analysis engine...                        │
│    ○ AI processing...                                        │
│    ○ Results ready                                           │
│                                                              │
├─────────────────────────────────────────────────────────────┤
│ 3. RESULTS DISPLAYED                                         │
│                                                              │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ AI Analysis Results              Prima v1.07        │    │
│  │                                                     │    │
│  │ ⚠️  URGENT — Neurosurgery referral recommended      │    │
│  │                                                     │    │
│  │ Diagnoses:                                          │    │
│  │   ████████████████████░░  87%  NPH                  │    │
│  │   ██████░░░░░░░░░░░░░░░  34%  Cerebral atrophy     │    │
│  │   █████░░░░░░░░░░░░░░░░  28%  WM changes           │    │
│  │                                                     │    │
│  │ Referral: Neurosurgery (within 1 week)              │    │
│  │ "NPH with high probability — VP shunt evaluation"  │    │
│  │                                                     │    │
│  │ Series analyzed: T2 FLAIR, T2 (2 of 12)            │    │
│  │ Analysis time: 47s · Cost: $0.03                    │    │
│  │                                                     │    │
│  │ ⚕️ AI-assisted analysis — not a radiologist report  │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                              │
│  Results panel integrated into existing DICOM viewer        │
│  Clicking a diagnosis highlights the relevant series        │
│  Results persisted in patient dossier                        │
└─────────────────────────────────────────────────────────────┘

Automatic vs Manual Analysis

Option A (default): Manual trigger — User clicks "Analyze with AI" button. Best for initial launch, gives user control.

Option B (future): Auto-trigger — Analysis starts automatically on upload for brain MRI studies. Configurable in settings.

Viewer Integration

The AI results panel is a new sidebar component in the existing DICOM viewer:

┌──────────────────────────────────────────────────────────────┐
│  inou Viewer                                          [≡]    │
├────────────────────────────────────┬─────────────────────────┤
│                                    │  📋 Study Info          │
│                                    │  Patient: [redacted]    │
│                                    │  Date: 2026-02-14       │
│        DICOM Image Display         │  Series: 12             │
│                                    │                         │
│      (existing viewer canvas)      ├─────────────────────────┤
│                                    │  🤖 AI Analysis         │
│                                    │                         │
│                                    │  Status: ✅ Complete     │
│                                    │  [Results panel above]  │
│                                    │                         │
│                                    │  [Re-analyze ▼]         │
│                                    │  [Export Report]        │
├────────────────────────────────────┴─────────────────────────┤
│  Series: AX T2 FLAIR | Slice 14/28 | W:1500 L:450           │
└──────────────────────────────────────────────────────────────┘

9. Error Handling

Error Categories & Recovery

| Error | Detection | Recovery | User Impact |
|---|---|---|---|
| RunPod cold start timeout | No response in 60s | Retry once; if fails, queue for retry in 5 min | "Analysis queued — GPU starting up" |
| Worker execution timeout | RunPod 300s timeout | Job marked failed; auto-retry with reduced series count | "Analysis timed out — retrying with fewer series" |
| Model inference error | Exception in handler | Return error in callback; log stack trace | "Analysis failed — try again or contact support" |
| DICOM download failure | HTTP error / timeout | Retry download 3× with exponential backoff | Transparent to user |
| Callback delivery failure | HTTP error | Worker retries 3×; inou polls RunPod status API as fallback | Transparent — results arrive via polling |
| Invalid DICOM study | Parse errors in ingest | Reject at upload time with specific error | "Unable to parse series — unsupported format" |
| No relevant series found | Series selector returns empty | Skip analysis; inform user | "No brain MRI sequences detected in this study" |
| RunPod quota / billing | 402/429 errors | Alert admin; queue jobs for later | "Analysis temporarily unavailable" |
| Network volume unavailable | Model load failure | Worker retries; if persistent, alert | "Analysis service maintenance" |

Retry Strategy

type RetryPolicy struct {
    MaxAttempts     int           // 3
    InitialBackoff  time.Duration // 10s
    MaxBackoff      time.Duration // 5m
    BackoffFactor   float64       // 2.0
    RetryableErrors []string      // timeout, 5xx, connection_refused
}

Monitoring & Alerting

| Metric | Alert Threshold | Channel |
|---|---|---|
| Job failure rate | >10% in 1 hour | Slack / Signal |
| Median latency | >180s | Dashboard |
| RunPod spend | >$50/day | Email |
| Cold start frequency | >50% of jobs | Dashboard (indicates insufficient idle timeout) |
| Callback failures | Any | Immediate alert |

10. Implementation Plan

Phase 1: Foundation (2 weeks)

  • Fork Prima repo, create inou-prima-worker repo
  • Write RunPod handler (handler.py)
  • Build Docker image, test locally with nvidia-docker
  • Set up RunPod account, network volume, deploy worker
  • Test end-to-end with a sample DICOM study
  • Verify model outputs match Prima's reference results

Phase 2: Series Selection (1 week)

  • Implement rule-based series classifier in Go
  • Build clinical protocol selection logic
  • Add LLM fallback (Claude Haiku integration)
  • Test with diverse DICOM studies (Siemens, GE, Philips naming conventions)
  • Validate selection accuracy: should match radiologist series choice >90%

Phase 3: Backend Integration (2 weeks)

  • Add analysis endpoints to inou backend
  • Implement job queue with retry logic
  • Build DICOM download endpoint with signed URLs
  • Implement callback handler with HMAC verification
  • Add PHI stripping to DICOM export pipeline
  • Store results in dossier (encrypted)
  • Write integration tests

Phase 4: Frontend Integration (1 week)

  • Add "Analyze with AI" button to viewer
  • Build AI results panel component
  • Implement progress indicator (WebSocket or polling)
  • Add results to dossier view
  • Export report functionality

Phase 5: Hardening (1 week)

  • Load testing (concurrent jobs, cold starts)
  • Error handling for all failure modes
  • Monitoring dashboard
  • Cost tracking and alerting
  • Security review (PHI flow, access controls)
  • Documentation

Total: ~7 weeks to production-ready MVP.


11. Future Work

Near-Term (3–6 months)

  • Auto-trigger analysis on brain MRI upload (configurable)
  • Batch processing — analyze multiple studies in parallel
  • Result caching — skip re-analysis if study hasn't changed
  • Comparison mode — compare current vs prior study results

Medium-Term (6–12 months)

  • Fine-tuning on specific conditions — Use inou's accumulated (de-identified) data to fine-tune Prima for conditions most relevant to inou's user base (e.g., pediatric hydrocephalus for Sophia's case)
  • Radiologist feedback loop — Allow radiologists to confirm/reject AI findings, feeding back into training data
  • Multi-series reasoning — Instead of per-series inference, develop cross-series reasoning (e.g., comparing pre- and post-contrast T1)

Long-Term (12+ months)

  • Multi-organ expansion — As VLMs for spine, cardiac, abdominal MRI become available, integrate them using the same serverless architecture
  • On-premise deployment — For high-volume clients or regulatory requirements, offer Prima as an on-prem container (requires dedicated GPU)
  • Real-time inference — As models get faster and GPUs cheaper, offer analysis during the MRI scan itself (PACS integration)
  • Clinical decision support — Integrate Prima results with patient history, labs, and prior imaging for comprehensive clinical recommendations
  • FDA 510(k) pathway — If inou pursues clinical deployment, Prima results would need FDA clearance as a Computer-Aided Detection (CADe) or Computer-Aided Diagnosis (CADx) device

Architecture Extensibility

The serverless worker pattern is model-agnostic. To add a new model:

  1. Build a new Docker image with the model
  2. Deploy as a separate RunPod serverless endpoint
  3. Add a new series selector profile
  4. Route from inou backend based on study type / body part
```
                    inou Backend
                         │
              ┌──────────┼──────────┐
              ▼          ▼          ▼
         Prima       SpineVLM    CardiacAI
        (Brain)      (Spine)     (Heart)
         L40S         L40S        A100
```

Same queue, same callback pattern, same dossier storage. Only the worker and selector change.
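Step 4 (routing by study type) could be a lookup keyed on the study's body part. The endpoint names below are placeholders taken from the diagram, not deployed endpoint IDs:

```go
package main

import (
	"fmt"
	"strings"
)

// routeEndpoint picks a serverless endpoint for a study by body part
// (e.g. DICOM BodyPartExamined). Unknown body parts return ok=false,
// meaning no model is available and the study is not analyzed.
func routeEndpoint(bodyPart string) (string, bool) {
	endpoints := map[string]string{
		"BRAIN": "prima",     // placeholder endpoint IDs
		"SPINE": "spinevlm",
		"HEART": "cardiacai",
	}
	ep, ok := endpoints[strings.ToUpper(bodyPart)]
	return ep, ok
}

func main() {
	for _, bp := range []string{"BRAIN", "spine", "ABDOMEN"} {
		if ep, ok := routeEndpoint(bp); ok {
			fmt.Printf("%s → %s\n", bp, ep)
		} else {
			fmt.Printf("%s → no model available\n", bp)
		}
	}
}
```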


Appendix A: Prima Model Details

| Property | Value |
|---|---|
| Paper | "Learning neuroimaging models from health system-scale data" (arXiv:2509.18638) |
| Training data | UM-220K: 220,000+ MRI studies, 5.6M 3D sequences, 362M 2D images |
| Architecture | Hierarchical VLM: VQ-VAE tokenizer → Perceiver → Transformer |
| Diagnoses covered | 52 radiologic diagnoses across neoplastic, inflammatory, infectious, and developmental categories |
| Mean AUC | 90.1 ± 5.0% (prospective 30K-study validation) |
| License | MIT |
| GPU requirement | Ampere+ (flash-attn), 48 GB VRAM recommended (L40S, A100) |
| Weights | primafullmodel107.pt (~3.5 GB) + vqvae_model_step16799.pth (~500 MB) |
| Dependencies | PyTorch 2.6, flash-attn 2.7.4, MONAI 1.5.1, transformers 4.49 |

Appendix B: DICOM Tags Reference

```
(0008,0008) ImageType          — ORIGINAL\PRIMARY vs DERIVED\SECONDARY
(0008,0060) Modality           — MR
(0008,103E) SeriesDescription  — Free text, scanner-dependent
(0010,0010) PatientName        — PHI — STRIP before sending to worker
(0010,0020) PatientID          — PHI — STRIP
(0010,0030) PatientBirthDate   — PHI — STRIP
(0018,0010) ContrastBolusAgent — Present = contrast-enhanced
(0018,0020) ScanningSequence   — SE, GR, IR, EP (spin echo, gradient, inversion recovery, echo planar)
(0018,0023) MRAcquisitionType  — 2D, 3D
(0018,0024) SequenceName       — Scanner-specific pulse sequence name
(0018,0050) SliceThickness     — mm
(0020,0011) SeriesNumber       — Integer ordering
(0020,0013) InstanceNumber     — Slice index within series
(0028,0010) Rows               — Pixel rows
(0028,0011) Columns            — Pixel columns
```
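The PHI-stripping step from Phase 3 can be sketched against these tags. This operates on an already-parsed tag→value map for illustration (it is not a DICOM parser), and the tag list is only the subset marked STRIP above; a production pipeline should follow the DICOM PS3.15 Annex E de-identification profiles:

```go
package main

import "fmt"

// Tag identifies a DICOM attribute by (group, element).
type Tag struct{ Group, Element uint16 }

// phiTags lists the Appendix B tags that must never leave inou
// (illustrative subset — a real pipeline covers the full PS3.15 profile).
var phiTags = map[Tag]bool{
	{0x0010, 0x0010}: true, // PatientName
	{0x0010, 0x0020}: true, // PatientID
	{0x0010, 0x0030}: true, // PatientBirthDate
}

// stripPHI returns a copy of a parsed tag→value map with PHI removed,
// leaving technical tags (Modality, sequence parameters, etc.) intact.
func stripPHI(ds map[Tag]string) map[Tag]string {
	out := make(map[Tag]string, len(ds))
	for t, v := range ds {
		if !phiTags[t] {
			out[t] = v
		}
	}
	return out
}

func main() {
	ds := map[Tag]string{
		{0x0010, 0x0010}: "DOE^JANE", // PHI — must be dropped
		{0x0008, 0x0060}: "MR",       // Modality — must be kept
	}
	fmt.Println(len(stripPHI(ds))) // 1
}
```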

This specification is a living document. Update as implementation progresses and requirements evolve.