# Prima Integration — Technical Specification

**inou Health × Prima Brain MRI VLM**

**Version:** 1.0 · **Date:** 2026-02-14 · **Author:** James (AI Architect) · **Reviewer:** Johan Jongsma

---

## Executive Summary

This spec defines how inou integrates [Prima](https://github.com/MLNeurosurg/Prima), the University of Michigan's brain MRI Vision Language Model, as an on-demand diagnostic AI service. Prima achieves 90.1% mean AUC across 52 neurological diagnoses and provides differential diagnosis, worklist prioritization, and specialist referral recommendations. The design prioritizes **cost efficiency** (serverless GPU, intelligent series selection), **security** (zero PHI at rest on cloud GPU, AES-256-GCM encrypted dossier storage), and **seamless UX** (upload → automatic analysis → results in viewer).

**Key numbers:**

- Per-study cost with intelligent selection: **$0.07–$0.18** (vs $0.26–$0.66 naive; see §7)
- Cold start to results: **60–120 seconds**
- Zero always-on GPU cost

---

## Table of Contents

1. [System Architecture](#1-system-architecture)
2. [Intelligent Series Selection](#2-intelligent-series-selection)
3. [RunPod Serverless Worker](#3-runpod-serverless-worker)
4. [API Design](#4-api-design)
5. [Docker Image](#5-docker-image)
6. [Data Flow & Security](#6-data-flow--security)
7. [Cost Analysis](#7-cost-analysis)
8. [User Experience](#8-user-experience)
9. [Error Handling](#9-error-handling)
10. [Implementation Plan](#10-implementation-plan)
11. [Future Work](#11-future-work)

---

## 1.
System Architecture ``` ┌─────────────────────────────────────────────────────────────────────┐ │ inou Frontend │ │ ┌──────────┐ ┌──────────────┐ ┌───────────────────────────────┐ │ │ │ DICOM │ │ DICOM │ │ AI Results Panel │ │ │ │ Upload │──│ Viewer │ │ • Diagnosis probabilities │ │ │ │ │ │ │ │ • Urgency / priority │ │ │ └────┬─────┘ └──────────────┘ │ • Specialist referral │ │ │ │ │ • Series-level annotations │ │ │ │ └──────────┬────────────────────┘ │ └───────┼─────────────────────────────────────┼─────────────────────┘ │ POST /api/studies │ GET /api/studies/:id/analysis ▼ ▲ ┌───────────────────────────────────────────────────────────────────┐ │ inou Backend (Go) │ │ │ │ ┌──────────┐ ┌─────────────┐ ┌──────────┐ ┌──────────────┐ │ │ │ DICOM │ │ Series │ │ Job │ │ Dossier │ │ │ │ Ingest │─▶│ Selector │─▶│ Queue │ │ Store │ │ │ │ + Parse │ │ (LLM/Rules)│ │ │ │ (SQLite+AES)│ │ │ └──────────┘ └─────────────┘ └────┬─────┘ └──────▲───────┘ │ │ │ │ │ └──────────────────────────────────────┼────────────────┼──────────┘ │ │ Job dispatch Results callback (HTTPS POST) (HTTPS POST + HMAC) │ │ ▼ │ ┌──────────────────────────────────────────┐ │ RunPod Serverless │ │ │ │ ┌────────────────────────────────────┐ │ │ │ Prima Worker (L40S 48GB) │ │ │ │ │ │ │ │ 1. Download DICOM from signed URL │ │ │ │ 2. Load VQ-VAE tokenizer │ │ │ │ 3. Tokenize selected series │ │ │ │ 4. Free VQ-VAE from VRAM │ │ │ │ 5. Load Prima VLM │ │ │ │ 6. Run inference (fp16) │ │ │ │ 7. POST results back to inou │ │ │ │ 8. 
Purge all DICOM data │ │ │ │ │ │ │ └────────────────────────────────────┘ │ │ │ │ GPU: L40S 48GB · Scales 0→N · Pay/sec │ └──────────────────────────────────────────┘ ``` ### Component Responsibilities | Component | Technology | Responsibility | |-----------|-----------|----------------| | **DICOM Ingest** | Go | Parse uploaded DICOM, extract metadata, store encrypted | | **Series Selector** | Go + LLM (Claude Haiku) | Analyze DICOM metadata, select diagnostically relevant series | | **Job Queue** | Go (in-process) | Manage analysis jobs, retry logic, status tracking | | **RunPod Worker** | Python 3.11, PyTorch 2.6 | Run Prima inference on selected series | | **Dossier Store** | SQLite + AES-256-GCM | Store results encrypted at rest | | **AI Results Panel** | Frontend (existing viewer) | Display diagnosis, urgency, referral | --- ## 2. Intelligent Series Selection ### The Problem A typical brain MRI study contains **8–15 series** with **100–500 slices each**, potentially 10,000+ total slices. Running every series through Prima wastes GPU time and money. Most clinical questions only need 2–4 specific sequence types. 
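To make the waste concrete, here is a minimal Go sketch of the per-study GPU cost, using the L40S serverless rate ($0.00073/sec) and the ~30–60s per-series timing from §7. The function and constant names are illustrative estimates for this document, not backend billing logic:

```go
package main

import "fmt"

// Illustrative cost model (assumptions from §7): L40S serverless at
// $0.00073/sec, and roughly 30–60 seconds of GPU time per series.
const (
	gpuCostPerSecond = 0.00073 // USD, L40S serverless
	minSecPerSeries  = 30.0
	maxSecPerSeries  = 60.0
)

// studyCostRange returns the estimated low/high GPU cost in USD for
// analyzing the given number of series.
func studyCostRange(seriesCount int) (low, high float64) {
	low = float64(seriesCount) * minSecPerSeries * gpuCostPerSecond
	high = float64(seriesCount) * maxSecPerSeries * gpuCostPerSecond
	return low, high
}

func main() {
	naiveLow, naiveHigh := studyCostRange(12) // every series in a typical study
	smartLow, smartHigh := studyCostRange(3)  // only the 2–4 relevant series
	fmt.Printf("naive:    $%.2f–$%.2f\n", naiveLow, naiveHigh)   // $0.26–$0.53
	fmt.Printf("selected: $%.2f–$%.2f\n", smartLow, smartHigh)   // $0.07–$0.13
}
```

Selecting 3 of 12 series cuts GPU spend by roughly 75%, which is exactly the savings claimed in the cost analysis (§7, Scenario A).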
### Selection Architecture ``` DICOM Study Metadata │ ▼ ┌───────────────────────────────────────────┐ │ Series Selection Pipeline │ │ │ │ Step 1: Extract metadata per series │ │ • SeriesDescription (0008,103E) │ │ • SequenceName (0018,0024) │ │ • MRAcquisitionType (0018,0023) │ │ • ScanningSequence (0018,0020) │ │ • ContrastBolusAgent (0018,0010) │ │ • SliceThickness (0018,0050) │ │ • NumberOfSlices │ │ • ImageType (0008,0008) │ │ • Modality (0008,0060) │ │ │ │ Step 2: Rule-based classification │ │ Map each series → sequence type: │ │ T1, T1+C, T2, T2-FLAIR, DWI, ADC, │ │ SWI, MRA, SCOUT, LOCALIZER, DERIVED │ │ │ │ Step 3: Clinical protocol matching │ │ Apply selection rules (see table) │ │ │ │ Step 4: LLM fallback for ambiguous cases │ │ If rule-based classification fails, │ │ send metadata to Claude Haiku │ │ │ │ Output: ordered list of series to analyze │ └───────────────────────────────────────────┘ ``` ### Rule-Based Classification Series descriptions are notoriously inconsistent across scanners (Siemens, GE, Philips all use different naming). 
The classifier uses a multi-signal approach: ```go type SeriesClassification struct { SeriesUID string SequenceType string // T1, T2, FLAIR, T1C, DWI, ADC, SWI, MRA, OTHER HasContrast bool IsLocalizer bool IsDerived bool SliceCount int Confidence float64 // 0.0-1.0, below 0.7 triggers LLM fallback } ``` **Classification rules (ordered by priority):** | Signal | Tag | Example Values | Maps To | |--------|-----|---------------|---------| | ImageType contains "LOCALIZER" | (0008,0008) | `ORIGINAL\PRIMARY\LOCALIZER` | SKIP | | ImageType contains "DERIVED" | (0008,0008) | `DERIVED\SECONDARY` | SKIP (usually) | | SliceCount < 10 | computed | — | SKIP (scout/cal) | | ContrastBolusAgent present | (0018,0010) | `Gadavist` | +contrast flag | | Description contains "flair" | (0008,103E) | `AX T2 FLAIR`, `FLAIR_DARK-FLUID` | T2-FLAIR | | Description contains "t2" (no flair) | (0008,103E) | `AX T2`, `T2_TSE_TRA` | T2 | | Description contains "t1" | (0008,103E) | `SAG T1`, `T1_MPRAGE` | T1 or T1+C | | Description contains "dwi"/"diffusion" | (0008,103E) | `DWI_b1000`, `EP2D_DIFF` | DWI | | Description contains "adc" | (0008,103E) | `ADC_MAP` | ADC | | Description contains "swi"/"suscept" | (0008,103E) | `SWI_mIP`, `SUSCEPTIBILITY` | SWI | | ScanningSequence = "EP" + b-value tag | (0018,0020) | — | DWI | | No match, confidence < 0.7 | — | — | → LLM fallback | ### Clinical Protocol Selection Given the classified series, select based on the clinical question (or use a general protocol if none specified): | Clinical Question | Required Series | Optional Series | Expected Count | |-------------------|-----------------|-----------------|----------------| | **General screening** | T1, T2, T2-FLAIR | DWI, T1+C | 3–5 | | **Hydrocephalus** | T2-FLAIR, T2 | T1, DWI | 2–3 | | **Tumor / mass** | T1+C, T2-FLAIR, T1 (pre-contrast) | DWI, SWI | 3–4 | | **Stroke / acute** | DWI, ADC, T2-FLAIR | MRA, SWI | 3–4 | | **MS / demyelination** | T2-FLAIR, T1+C, T2 | DWI | 3 | | **Infection** | T1+C, 
DWI, T2-FLAIR | T2 | 3 | **Default behavior (no clinical question):** Use "General screening" — select T1, T2, T2-FLAIR, and any contrast-enhanced series. Cap at 5 series maximum. ### LLM Fallback When rule-based classification confidence is below 0.7 for any series: ``` Prompt to Claude Haiku: You are a neuroradiology MRI series classifier. Given the following DICOM metadata for a single MRI series, classify it. SeriesDescription: {desc} SequenceName: {seq_name} ScanningSequence: {scan_seq} MRAcquisitionType: {acq_type} ImageType: {img_type} ContrastBolusAgent: {contrast} SliceCount: {count} Manufacturer: {mfr} ManufacturerModelName: {model} Respond with JSON: { "sequence_type": "T1|T2|T2-FLAIR|T1+C|DWI|ADC|SWI|MRA|SCOUT|OTHER", "has_contrast": true|false, "is_diagnostically_relevant": true|false, "confidence": 0.0-1.0, "reasoning": "brief explanation" } ``` **Cost of LLM fallback:** ~$0.001 per series (Haiku). Triggered for maybe 1–2 series per study. Negligible. --- ## 3. RunPod Serverless Worker ### Worker Architecture The RunPod serverless worker wraps Prima's `pipeline.py` with an HTTP handler: ```python # handler.py — RunPod serverless handler import runpod import torch import tempfile import requests import shutil import json import time import gc from pathlib import Path from pipeline import Pipeline # Prima's pipeline def handler(job): """ Input schema: { "input": { "dicom_url": "https://inou.../signed-download/...", "series_uids": ["1.2.840...", "1.2.840..."], "callback_url": "https://inou.../api/analysis/callback", "callback_hmac_key": "hex-encoded-key", "job_id": "uuid", "study_id": "uuid" } } """ input_data = job["input"] work_dir = Path(tempfile.mkdtemp()) try: # 1. Download DICOM (only selected series) t0 = time.time() dicom_path = download_dicom(input_data["dicom_url"], work_dir) # 2. Filter to selected series only filter_series(dicom_path, input_data["series_uids"]) # 3. 
Run Prima pipeline config = build_config(dicom_path, work_dir / "output") pipeline = Pipeline(config) # Load study pipeline.load_mri_study() # Tokenize (VQ-VAE) — loads tokenizer, runs, then frees pipeline.load_tokenizer_model() tokens = pipeline.tokenize_series() pipeline.free_tokenizer() # Free VRAM # Run Prima VLM pipeline.load_prima_model() results = pipeline.run_inference(tokens) pipeline.free_prima_model() elapsed = time.time() - t0 # 4. Build response response = { "job_id": input_data["job_id"], "study_id": input_data["study_id"], "elapsed_seconds": elapsed, "series_analyzed": len(input_data["series_uids"]), "results": { "diagnoses": results["diagnoses"], # list of {name, probability} "priority": results["priority"], # STAT / URGENT / ROUTINE "referral": results["referral"], # specialist recommendation "differential": results["differential"], # ranked differential diagnosis } } # 5. POST results back to inou post_callback(input_data["callback_url"], response, input_data["callback_hmac_key"]) return response finally: # 6. 
ALWAYS purge DICOM data — no PHI left on worker
        shutil.rmtree(work_dir, ignore_errors=True)
        torch.cuda.empty_cache()
        gc.collect()

runpod.serverless.start({"handler": handler})
```

### Worker Lifecycle

```
RunPod Serverless
─────────────────
Idle (no GPU) ──────────────────────── $0.00/sec
      │  Job arrives
      ▼
Cold Start (~15-25s) ────────────────── $0.00073/sec (L40S)
  • Container starts; billing begins
  • Model weights loaded from network volume (NFS, ~10s)
  • GPU warm
      │
      ▼
Inference (~30-60s) ─────────────────── $0.00073/sec
  • VQ-VAE tokenization (~10-20s)
  • Free VQ-VAE
  • Prima VLM inference (~20-40s)
  • Results callback
      │
      ▼
Idle timeout (5s) ───────────────────── $0.00073/sec (configurable)
      │
      ▼
Scale to 0 ──────────────────────────── $0.00/sec
```

### RunPod Configuration

```json
{
  "name": "inou-prima-worker",
  "gpu": "NVIDIA L40S",
  "gpuCount": 1,
  "volumeId": "vol_prima_weights",
  "volumeMountPath": "/models",
  "dockerImage": "inou/prima-worker:latest",
  "env": {
    "MODEL_DIR": "/models",
    "PRIMA_CKPT": "/models/primafullmodel107.pt",
    "VQVAE_CKPT": "/models/vqvae_model_step16799.pth"
  },
  "scalerType": "QUEUE_DELAY",
  "scalerValue": 4,
  "workersMin": 0,
  "workersMax": 3,
  "idleTimeout": 5,
  "executionTimeout": 300,
  "flashboot": true
}
```

**Network Volume:** Model weights (~4GB total) stored on a RunPod network volume. Persists across cold starts. Mounts as NFS — no download delay after first pull.

---

## 4.
API Design ### inou Backend Endpoints #### 4.1 Trigger Analysis ``` POST /api/studies/{studyID}/analyze Authorization: Bearer Request Body (optional): { "clinical_question": "evaluate for hydrocephalus", // optional "priority": "routine", // routine | urgent | stat "force_series": ["1.2.840..."] // optional: override selection } Response 202: { "job_id": "550e8400-e29b-41d4-a716-446655440000", "status": "queued", "selected_series": [ { "series_uid": "1.2.840.113619...", "description": "AX T2 FLAIR", "sequence_type": "T2-FLAIR", "slice_count": 28, "selection_reason": "Primary sequence for hydrocephalus evaluation" }, { "series_uid": "1.2.840.113619...", "description": "AX T2", "sequence_type": "T2", "slice_count": 24, "selection_reason": "Complementary T2-weighted for ventricular assessment" } ], "estimated_cost_usd": 0.066, "estimated_duration_seconds": 90 } ``` #### 4.2 Analysis Status ``` GET /api/studies/{studyID}/analysis Authorization: Bearer Response 200: { "job_id": "550e8400-...", "status": "completed", // queued | processing | completed | failed "created_at": "2026-02-14T20:15:00Z", "completed_at": "2026-02-14T20:16:32Z", "duration_seconds": 92, "cost_usd": 0.067, "series_analyzed": 2, "results": { ... 
} // see 4.3 } ``` #### 4.3 Analysis Results ``` GET /api/studies/{studyID}/analysis/results Authorization: Bearer Response 200: { "study_id": "...", "model": "prima-v1.07", "model_version": "primafullmodel107", "analysis_timestamp": "2026-02-14T20:16:32Z", "diagnoses": [ { "condition": "Normal pressure hydrocephalus", "icd10": "G91.2", "probability": 0.87, "category": "developmental" }, { "condition": "Cerebral atrophy", "icd10": "G31.9", "probability": 0.34, "category": "degenerative" }, { "condition": "Periventricular white matter changes", "icd10": "I67.3", "probability": 0.28, "category": "vascular" } ], "priority": { "level": "URGENT", "reasoning": "High probability hydrocephalus requiring neurosurgical evaluation" }, "referral": { "specialty": "Neurosurgery", "urgency": "within_1_week", "reasoning": "NPH with high probability — consider VP shunt evaluation" }, "differential": [ "Normal pressure hydrocephalus", "Communicating hydrocephalus", "Cerebral atrophy (ex vacuo ventriculomegaly)" ], "series_results": [ { "series_uid": "1.2.840...", "description": "AX T2 FLAIR", "findings": "Disproportionate ventriculomegaly relative to sulcal enlargement. Periventricular signal abnormality consistent with transependymal CSF flow." } ], "disclaimer": "AI-generated analysis for clinical decision support only. Not a substitute for radiologist interpretation." } ``` #### 4.4 Callback Endpoint (Worker → inou) ``` POST /api/analysis/callback X-HMAC-Signature: sha256= Content-Type: application/json { "job_id": "...", "study_id": "...", "status": "completed", "elapsed_seconds": 47.3, "series_analyzed": 2, "results": { ... } } ``` HMAC signature computed over the raw JSON body using a per-job secret. Prevents spoofed callbacks. #### 4.5 DICOM Download (Worker → inou) ``` GET /api/internal/dicom/{studyID}/download?token={signed_token}&series={uid1,uid2} → 200 application/zip (streaming) Token: time-limited (5 min), signed with HMAC, includes study ID and allowed series UIDs. 
Only selected series are included in the download — minimizes data transfer.
```

---

## 5. Docker Image

### Dockerfile

```dockerfile
FROM nvidia/cuda:12.4.0-devel-ubuntu22.04

# System deps
RUN apt-get update && apt-get install -y \
    python3.11 python3.11-dev python3.11-distutils python3-pip \
    libgl1-mesa-glx libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*
RUN ln -s /usr/bin/python3.11 /usr/bin/python

# Python deps (cached layer) — install via the 3.11 interpreter so
# packages land in Python 3.11, not the distro default 3.10
COPY requirements.txt /tmp/
RUN python -m pip install --no-cache-dir -r /tmp/requirements.txt

# flash-attn requires Ampere+ (sm_80+), build for L40S (sm_89)
RUN python -m pip install flash-attn==2.7.4.post1 --no-build-isolation

# RunPod SDK
RUN python -m pip install runpod==1.7.0

# Prima source code
COPY Prima/ /app/Prima/
COPY handler.py /app/handler.py
WORKDIR /app

# Model weights are on network volume, not baked into image
#   /models/primafullmodel107.pt (~3.5GB)
#   /models/vqvae_model_step16799.pth (~500MB)

CMD ["python", "handler.py"]
```

**Image size:** ~12GB (CUDA base + PyTorch + flash-attn + Prima)

### What's in the image vs network volume:

| Component | Location | Size | Rationale |
|-----------|----------|------|-----------|
| CUDA 12.4 + Ubuntu | Docker image | ~4GB | Cached, rarely changes |
| PyTorch 2.6 + deps | Docker image | ~6GB | Cached, rarely changes |
| flash-attn | Docker image | ~500MB | Must match CUDA version |
| Prima source | Docker image | ~50MB | Changes with updates |
| RunPod handler | Docker image | ~5KB | Our wrapper code |
| `primafullmodel107.pt` | Network volume | ~3.5GB | Too large for image, shared |
| `vqvae_model_step16799.pth` | Network volume | ~500MB | Shared across workers |

### Build & Deploy

```bash
# Build
docker build -t inou/prima-worker:latest .

# Push to RunPod's registry (or Docker Hub)
docker push inou/prima-worker:latest

# Network volume setup (one-time)
runpod volume create --name prima-weights --size 10
# Upload model weights to volume via RunPod console or SSH
```

---

## 6.
Data Flow & Security ### Data Flow Diagram ``` User uploads inou stores Worker downloads Worker processes DICOM study encrypted selected series and returns results at rest (signed URL) │ │ │ │ ▼ ▼ ▼ ▼ ┌─────────┐ ┌───────────┐ ┌──────────┐ ┌──────────┐ │ Browser │────▶│ inou │─────────▶│ RunPod │────────▶│ inou │ │ Upload │ │ Backend │ signed │ Worker │ HMAC │ Backend │ │ (TLS) │ │ (AES-GCM) │ URL │ (tmpfs) │ callback│ (store) │ └─────────┘ └───────────┘ └──────────┘ └──────────┘ │ On completion: shutil.rmtree() No PHI retained ``` ### Security Controls | Concern | Control | |---------|---------| | **DICOM at rest (inou)** | AES-256-GCM encryption in SQLite (existing) | | **DICOM in transit to worker** | TLS 1.3 + time-limited signed URL (5 min TTL) | | **DICOM on worker** | Stored in tmpfs; `shutil.rmtree()` in `finally` block; container destroyed after job | | **Results in transit** | TLS 1.3 + HMAC-SHA256 callback verification | | **Results at rest** | Encrypted in dossier (existing AES-256-GCM) | | **PHI in logs** | No DICOM pixel data or patient identifiers logged. 
Only series UIDs and job metadata | | **RunPod access** | API key stored as inou backend env var, never exposed to frontend | | **Worker isolation** | Each job runs in isolated container; no shared filesystem between jobs | ### HIPAA Considerations | HIPAA Requirement | Implementation | |-------------------|----------------| | **Access controls** | inou auth (existing); RunPod API key; signed URLs | | **Encryption** | AES-256-GCM at rest; TLS 1.3 in transit | | **Audit trail** | All analysis requests logged with timestamp, user, study ID | | **Minimum necessary** | Only selected series transmitted; no patient demographics sent to worker | | **BAA** | RunPod offers BAA for serverless — **must execute before production** | | **Data retention** | Zero retention on worker; configurable retention on inou | | **De-identification** | DICOM sent to worker can be stripped of patient name/DOB (tags 0010,0010 / 0010,0030) — Series UID sufficient for processing | ### PHI Minimization Pipeline Before sending DICOM to RunPod, inou strips the following tags: ```go var phiTagsToStrip = []dicom.Tag{ dicom.PatientName, // (0010,0010) dicom.PatientID, // (0010,0020) dicom.PatientBirthDate, // (0010,0030) dicom.PatientSex, // (0010,0040) — keep if Prima needs it dicom.PatientAddress, // (0010,1040) dicom.ReferringPhysician, // (0008,0090) dicom.InstitutionName, // (0008,0080) dicom.InstitutionAddress, // (0008,0081) dicom.AccessionNumber, // (0008,0050) } // Pixel data and series-level imaging tags are preserved — Prima needs those ``` --- ## 7. 
Cost Analysis ### RunPod Pricing (L40S Serverless) | Metric | Value | |--------|-------| | Per-second billing | $0.00073/sec | | Per-minute | $0.0438/min | | Per-hour | $2.628/hr | | Minimum charge per job | None (per-second) | | Network volume (10GB) | ~$1.00/month | | Idle workers | $0.00 (scale to 0) | ### Per-Series Timing Breakdown | Phase | Duration | Notes | |-------|----------|-------| | Cold start (first job) | 15–25s | Container + model load from volume | | Warm start (subsequent) | 0–2s | Worker already running | | DICOM download | 2–5s | Depends on series size (~50-200MB) | | VQ-VAE tokenization | 10–20s | Per series, depends on slice count | | VQ-VAE → Prima model swap | 2–3s | Free VQ-VAE, load Prima | | Prima inference | 15–30s | Depends on token count | | Results callback | <1s | Small JSON payload | | **Total per series** | **30–60s** | **Warm: 30–40s typical** | ### Cost Per Study: With vs Without Intelligent Selection #### Scenario A: Routine Brain MRI (12 series, ~3000 slices) | Approach | Series Processed | Est. Duration | GPU Cost | LLM Selection Cost | Total | |----------|-----------------|---------------|----------|-------------------|-------| | **Naive (all series)** | 12 | 360–720s | $0.26–$0.53 | $0.00 | **$0.26–$0.53** | | **Intelligent (selected)** | 3 | 90–180s | $0.066–$0.13 | ~$0.003 | **$0.07–$0.13** | | **Savings** | — | — | — | — | **73–75%** | #### Scenario B: Brain MRI with Contrast (15 series, ~5000 slices) | Approach | Series Processed | Est. Duration | GPU Cost | LLM Selection Cost | Total | |----------|-----------------|---------------|----------|-------------------|-------| | **Naive** | 15 | 450–900s | $0.33–$0.66 | $0.00 | **$0.33–$0.66** | | **Intelligent** | 4 | 120–240s | $0.088–$0.18 | ~$0.004 | **$0.09–$0.18** | | **Savings** | — | — | — | — | **73%** | #### Scenario C: Focused Study (6 series, ~1500 slices, e.g., stroke protocol) | Approach | Series Processed | Est. 
Duration | GPU Cost | Total | |----------|-----------------|---------------|----------|-------| | **Naive** | 6 | 180–360s | $0.13–$0.26 | **$0.13–$0.26** | | **Intelligent** | 3 | 90–180s | $0.066–$0.13 | **$0.07–$0.13** | ### Monthly Volume Projections | Volume | Naive Cost | Intelligent Cost | Monthly Savings | |--------|-----------|-----------------|-----------------| | 10 studies/month | $3.30–$6.60 | $0.90–$1.80 | $2.40–$4.80 | | 50 studies/month | $16.50–$33.00 | $4.50–$9.00 | $12.00–$24.00 | | 200 studies/month | $66.00–$132.00 | $18.00–$36.00 | $48.00–$96.00 | | 1000 studies/month | $330–$660 | $90–$180 | $240–$480 | **Fixed costs:** RunPod network volume ~$1/month. No other infrastructure costs — workers scale to zero. ### Break-Even vs Always-On An always-on L40S instance costs ~$0.69/hr = **$500/month**. Break-even point: $500 ÷ $0.13/study = **~3,850 studies/month** before always-on becomes cheaper. For inou's expected volumes (10–200/month), **serverless is 50–500× cheaper**. --- ## 8. User Experience ### Upload → Results Flow ``` ┌─────────────────────────────────────────────────────────────┐ │ 1. UPLOAD │ │ │ │ User uploads DICOM study (drag & drop or folder select) │ │ inou parses metadata, displays series list in viewer │ │ "AI Analysis Available" badge appears on brain MRI studies │ │ │ ├─────────────────────────────────────────────────────────────┤ │ 2. ANALYSIS INITIATED │ │ │ │ [Analyze with AI] button in viewer toolbar │ │ Optional: clinical question text field │ │ "Analyzing 3 of 12 series... Est. ~90 seconds" │ │ Progress indicator with phases: │ │ ◉ Series selected (3 of 12) │ │ ◉ Uploading to analysis engine... │ │ ○ AI processing... │ │ ○ Results ready │ │ │ ├─────────────────────────────────────────────────────────────┤ │ 3. 
RESULTS DISPLAYED │ │ │ │ ┌─────────────────────────────────────────────────────┐ │ │ │ AI Analysis Results Prima v1.07 │ │ │ │ │ │ │ │ ⚠️ URGENT — Neurosurgery referral recommended │ │ │ │ │ │ │ │ Diagnoses: │ │ │ │ ████████████████████░░ 87% NPH │ │ │ │ ██████░░░░░░░░░░░░░░░ 34% Cerebral atrophy │ │ │ │ █████░░░░░░░░░░░░░░░░ 28% WM changes │ │ │ │ │ │ │ │ Referral: Neurosurgery (within 1 week) │ │ │ │ "NPH with high probability — VP shunt evaluation" │ │ │ │ │ │ │ │ Series analyzed: T2 FLAIR, T2 (2 of 12) │ │ │ │ Analysis time: 47s · Cost: $0.03 │ │ │ │ │ │ │ │ ⚕️ AI-assisted analysis — not a radiologist report │ │ │ └─────────────────────────────────────────────────────┘ │ │ │ │ Results panel integrated into existing DICOM viewer │ │ Clicking a diagnosis highlights the relevant series │ │ Results persisted in patient dossier │ └─────────────────────────────────────────────────────────────┘ ``` ### Automatic vs Manual Analysis **Option A (default): Manual trigger** — User clicks "Analyze with AI" button. Best for initial launch, gives user control. **Option B (future): Auto-trigger** — Analysis starts automatically on upload for brain MRI studies. Configurable in settings. ### Viewer Integration The AI results panel is a new sidebar component in the existing DICOM viewer: ``` ┌──────────────────────────────────────────────────────────────┐ │ inou Viewer [≡] │ ├────────────────────────────────────┬─────────────────────────┤ │ │ 📋 Study Info │ │ │ Patient: [redacted] │ │ │ Date: 2026-02-14 │ │ DICOM Image Display │ Series: 12 │ │ │ │ │ (existing viewer canvas) ├─────────────────────────┤ │ │ 🤖 AI Analysis │ │ │ │ │ │ Status: ✅ Complete │ │ │ [Results panel above] │ │ │ │ │ │ [Re-analyze ▼] │ │ │ [Export Report] │ ├────────────────────────────────────┴─────────────────────────┤ │ Series: AX T2 FLAIR | Slice 14/28 | W:1500 L:450 │ └──────────────────────────────────────────────────────────────┘ ``` --- ## 9. 
Error Handling ### Error Categories & Recovery | Error | Detection | Recovery | User Impact | |-------|-----------|----------|-------------| | **RunPod cold start timeout** | No response in 60s | Retry once; if fails, queue for retry in 5 min | "Analysis queued — GPU starting up" | | **Worker execution timeout** | RunPod 300s timeout | Job marked failed; auto-retry with reduced series count | "Analysis timed out — retrying with fewer series" | | **Model inference error** | Exception in handler | Return error in callback; log stack trace | "Analysis failed — try again or contact support" | | **DICOM download failure** | HTTP error / timeout | Retry download 3× with exponential backoff | Transparent to user | | **Callback delivery failure** | HTTP error | Worker retries 3×; inou polls RunPod status API as fallback | Transparent — results arrive via polling | | **Invalid DICOM study** | Parse errors in ingest | Reject at upload time with specific error | "Unable to parse series — unsupported format" | | **No relevant series found** | Series selector returns empty | Skip analysis; inform user | "No brain MRI sequences detected in this study" | | **RunPod quota / billing** | 402/429 errors | Alert admin; queue jobs for later | "Analysis temporarily unavailable" | | **Network volume unavailable** | Model load failure | Worker retries; if persistent, alert | "Analysis service maintenance" | ### Retry Strategy ```go type RetryPolicy struct { MaxAttempts int // 3 InitialBackoff time.Duration // 10s MaxBackoff time.Duration // 5m BackoffFactor float64 // 2.0 RetryableErrors []string // timeout, 5xx, connection_refused } ``` ### Monitoring & Alerting | Metric | Alert Threshold | Channel | |--------|----------------|---------| | Job failure rate | >10% in 1 hour | Slack / Signal | | Median latency | >180s | Dashboard | | RunPod spend | >$50/day | Email | | Cold start frequency | >50% of jobs | Dashboard (indicates insufficient idle timeout) | | Callback failures | Any | 
Immediate alert | --- ## 10. Implementation Plan ### Phase 1: Foundation (2 weeks) - [ ] Fork Prima repo, create `inou-prima-worker` repo - [ ] Write RunPod handler (`handler.py`) - [ ] Build Docker image, test locally with `nvidia-docker` - [ ] Set up RunPod account, network volume, deploy worker - [ ] Test end-to-end with a sample DICOM study - [ ] Verify model outputs match Prima's reference results ### Phase 2: Series Selection (1 week) - [ ] Implement rule-based series classifier in Go - [ ] Build clinical protocol selection logic - [ ] Add LLM fallback (Claude Haiku integration) - [ ] Test with diverse DICOM studies (Siemens, GE, Philips naming conventions) - [ ] Validate selection accuracy: should match radiologist series choice >90% ### Phase 3: Backend Integration (2 weeks) - [ ] Add analysis endpoints to inou backend - [ ] Implement job queue with retry logic - [ ] Build DICOM download endpoint with signed URLs - [ ] Implement callback handler with HMAC verification - [ ] Add PHI stripping to DICOM export pipeline - [ ] Store results in dossier (encrypted) - [ ] Write integration tests ### Phase 4: Frontend Integration (1 week) - [ ] Add "Analyze with AI" button to viewer - [ ] Build AI results panel component - [ ] Implement progress indicator (WebSocket or polling) - [ ] Add results to dossier view - [ ] Export report functionality ### Phase 5: Hardening (1 week) - [ ] Load testing (concurrent jobs, cold starts) - [ ] Error handling for all failure modes - [ ] Monitoring dashboard - [ ] Cost tracking and alerting - [ ] Security review (PHI flow, access controls) - [ ] Documentation **Total: ~7 weeks** to production-ready MVP. --- ## 11. 
Future Work ### Near-Term (3–6 months) - **Auto-trigger analysis** on brain MRI upload (configurable) - **Batch processing** — analyze multiple studies in parallel - **Result caching** — skip re-analysis if study hasn't changed - **Comparison mode** — compare current vs prior study results ### Medium-Term (6–12 months) - **Fine-tuning on specific conditions** — Use inou's accumulated (de-identified) data to fine-tune Prima for conditions most relevant to inou's user base (e.g., pediatric hydrocephalus for Sophia's case) - **Radiologist feedback loop** — Allow radiologists to confirm/reject AI findings, feeding back into training data - **Multi-series reasoning** — Instead of per-series inference, develop cross-series reasoning (e.g., comparing pre- and post-contrast T1) ### Long-Term (12+ months) - **Multi-organ expansion** — As VLMs for spine, cardiac, abdominal MRI become available, integrate them using the same serverless architecture - **On-premise deployment** — For high-volume clients or regulatory requirements, offer Prima as an on-prem container (requires dedicated GPU) - **Real-time inference** — As models get faster and GPUs cheaper, offer analysis during the MRI scan itself (PACS integration) - **Clinical decision support** — Integrate Prima results with patient history, labs, and prior imaging for comprehensive clinical recommendations - **FDA 510(k) pathway** — If inou pursues clinical deployment, Prima results would need FDA clearance as a Computer-Aided Detection (CADe) or Computer-Aided Diagnosis (CADx) device ### Architecture Extensibility The serverless worker pattern is model-agnostic. To add a new model: 1. Build a new Docker image with the model 2. Deploy as a separate RunPod serverless endpoint 3. Add a new series selector profile 4. 
Route from inou backend based on study type / body part ``` inou Backend │ ┌──────────┼──────────┐ ▼ ▼ ▼ Prima SpineVLM CardiacAI (Brain) (Spine) (Heart) L40S L40S A100 ``` Same queue, same callback pattern, same dossier storage. Only the worker and selector change. --- ## Appendix A: Prima Model Details | Property | Value | |----------|-------| | **Paper** | "Learning neuroimaging models from health system-scale data" (arXiv:2509.18638) | | **Training data** | UM-220K: 220,000+ MRI studies, 5.6M 3D sequences, 362M 2D images | | **Architecture** | Hierarchical VLM: VQ-VAE tokenizer → Perceiver → Transformer | | **Diagnoses covered** | 52 radiologic diagnoses across neoplastic, inflammatory, infectious, developmental | | **Mean AUC** | 90.1 ± 5.0% (prospective 30K study validation) | | **License** | MIT | | **GPU requirement** | Ampere+ (flash-attn), 48GB VRAM recommended (L40S, A100) | | **Weights** | `primafullmodel107.pt` (~3.5GB) + `vqvae_model_step16799.pth` (~500MB) | | **Dependencies** | PyTorch 2.6, flash-attn 2.7.4, MONAI 1.5.1, transformers 4.49 | ## Appendix B: DICOM Tags Reference ``` (0008,0008) ImageType — ORIGINAL\PRIMARY vs DERIVED\SECONDARY (0008,0060) Modality — MR (0008,103E) SeriesDescription — Free text, scanner-dependent (0010,0010) PatientName — PHI — STRIP before sending to worker (0010,0020) PatientID — PHI — STRIP (0010,0030) PatientBirthDate — PHI — STRIP (0018,0010) ContrastBolusAgent — Present = contrast-enhanced (0018,0020) ScanningSequence — SE, GR, IR, EP (spin echo, gradient, inversion recovery, echo planar) (0018,0023) MRAcquisitionType — 2D, 3D (0018,0024) SequenceName — Scanner-specific pulse sequence name (0018,0050) SliceThickness — mm (0020,0011) SeriesNumber — Integer ordering (0020,0013) InstanceNumber — Slice index within series (0028,0010) Rows — Pixel rows (0028,0011) Columns — Pixel columns ``` --- *This specification is a living document. Update as implementation progresses and requirements evolve.*