# inou API Design Decisions

*January 2026*

## Overview

This document captures the API redesign decisions for inou, focusing on making health data AI-ready through progressive disclosure, efficient queries, and secure token-based access.

---

## Authentication

### Token-Based Access

```json
{
  "d": "dossier_id",
  "exp": 1736470800
}
```

- **d**: The authenticated dossier (who logged in)
- **exp**: Expiration as a Unix timestamp (a few hours for external LLMs such as Grok)
- The token is encrypted using the existing `lib.CryptoEncrypt`
- No raw dossier IDs in URLs that stay valid forever
- The backend looks up permissions in the `dossier_access` table (they are not stored in the token)

### Why Tokens?

- Grok/ChatGPT users were still querying with raw dossier IDs days later
- Tokens expire, preventing stale access
- Simpler than passing the authenticated dossier with every request

---

## API Endpoints

### REST Style with Versioning

```
GET /api/v1/dossiers
GET /api/v1/dossiers/{id}
GET /api/v1/dossiers/{id}/entries
GET /api/v1/dossiers/{id}/entries/{entry_id}
GET /api/v1/dossiers/{id}/entries/{entry_id}?detail=full
```

### Token in Header

```
Authorization: Bearer <token>
```

### Query Parameters

| Param | Purpose |
|-------|---------|
| `detail=full` | Return full image/text data |
| `search=pons` | Search summaries/tags |
| `category=imaging` | Filter by category (English) |
| `anatomy=hypothalamus` | Find slices by anatomy |
| `W=200&L=500` | DICOM windowing for images (W = window width, L = window level) |

### Dossier Always Explicit

Even if the user has only one dossier, its ID is in the URL. No special cases.

---

## Progressive Disclosure

The LLM gets everything it needs to **decide** in the first call. Full detail is fetched only when needed.
| Entry Type | Quick Glance | Full Detail |
|------------|--------------|-------------|
| Study | anatomy + summary | - |
| Series | slice count, orientation | - |
| Slice | thumbnail (150x150) | full image |
| Document | summary + tags | full text |
| Lab | values + ranges | historical trend |
| Genome | category + count | variant list |

### Fewer Round-Trips

- **Before**: the LLM guesses slices, fetches them one by one, and backtracks
- **After**: the LLM sees the anatomy index and requests exactly the slices it needs

---

## Thumbnails

### Specification

- Size: **150x150** max (longest side 150 px, aspect ratio preserved)
- Format: **PNG** (8-bit greyscale, lossless)
- Target: **~5KB** per thumbnail
- Storage: **Database** (in the `Data` JSON field or a BLOB column)

### Why DB, Not Filesystem?

- Batch queries: "get 50 slices with thumbnails" = 1 query
- Fewer IOPS (no 50 small file reads)
- The DB file stays hot in cache
- 4000 slices x 5KB = 20MB (trivial)

### Full Images

- Stay on the filesystem (`/tank/inou/objects/`)
- Fetched one at a time with `?detail=full`

---

## Anatomical Analysis

### Problem

The LLM spends many round-trips finding the right slice ("find the pons" → guess → wrong → try again).

### Solution

Analyze reference slices at ingest and store the anatomy with z-ranges.

### Approach

1. For each orientation (SAG, COR, AX) present in the study:
2. Pick the mid-slice from T2 (preferred) or T1
3. Run the vision API: "identify anatomical structures with z-ranges"
4. Store the result in the study entry

### Storage

```json
{
  "anatomy": {
    "pons": {"z_min": -30, "z_max": -20},
    "cerebellum": {"z_min": -45, "z_max": -25},
    "hypothalamus": {"z_min": 20, "z_max": 26}
  },
  "body_part": "brain"
}
```

### Query Time

- The LLM asks "find pons in this series"
- Lookup: pons at z=-30 to -20
- Find slices in the series where `slice_location BETWEEN -30 AND -20`
- Return matching slices with thumbnails

### Cost

- 1-3 vision API calls per study (~$0.01-0.03)
- Stored once, used forever
- No per-query cost

### Why Not Position-Based Lookup?

Tried it, deleted it.
Pediatric and adult brains place the same structures at different z-coordinates, so fixed positions don't transfer between patients.

### Generalization

- Brain: mid-sagittal reference
- Spine: mid-sagittal reference
- Knee: mid-sagittal reference
- Abdomen: may need coronal + axial references
- Animals: same principle; the vision model identifies the structures

---

## Categories

### Consolidated Structure

27 categories (down from 31), stored as integers internally.

| Int | Name | Notes |
|-----|------|-------|
| 0 | imaging | Unified: slice, series, study (Type differentiates) |
| 1 | document | |
| 2 | lab | |
| 3 | genome | Unified: tier, rsid, variant (Type differentiates) |
| ... | ... | ... |

### Translation

- **API Input**: Accept English OR any translated name
- **API Output**: Return the translated name (user's language)
- **Navigation**: By entry ID, not category name

```go
// Maps English or any translated category name to its integer, e.g. "genome", "Геном", "ゲノム" → 3
var categoryFromAny = map[string]int{}
```

---

## Database

### SQLite Stays

- Read-heavy workload (LLM queries)
- Occasional writes (imports)
- Single server, few users
- Millions of rows, no problem
- Single-file backup
- WAL mode for concurrency

### Alternatives Considered

- **FoundationDB**: Overkill for a single server
- **bbolt/BadgerDB**: Lose SQL convenience; indexes must be maintained manually
- **Postgres**: "Beautiful tech from 15 years ago" (user preference)

### What We Actually Use

Just key-value access with secondary indexes:

1. Get by ID
2. Get children (`parent_id = X`)
3. Filter by dossier
4. Filter by category
5. Order by ordinal/timestamp

SQLite handles this perfectly.

---

## RBAC (Future)

### Concept

Category-level permissions per relationship. Example: a trainer can see exercise, nutrition, and supplements, but NOT hospitalization or fertility.

### Implementation

- Presets by relationship type (trainer, doctor, family)
- User overrides per category
- Stored in the `dossier_access` table
- Backend lookup (not in the token)

### Not Priority Now

Token + expiration first. RBAC layers on later.
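Since RBAC is future work, the following is only a minimal Go sketch of how a preset-plus-override lookup could resolve a single category, assuming permissions are already loaded into memory. The `presets` map, the `allowed` function, and the string category names are hypothetical; the real design stores these in `dossier_access` and uses integer category IDs. The doctor preset is purely illustrative.

```go
package main

import "fmt"

// presets holds hypothetical default visible categories per relationship type.
// The trainer set follows the example in Concept; the doctor set is illustrative only.
var presets = map[string]map[string]bool{
	"trainer": {"exercise": true, "nutrition": true, "supplements": true},
	"doctor":  {"exercise": true, "nutrition": true, "supplements": true, "hospitalization": true, "fertility": true},
}

// allowed reports whether a grantee with the given relationship may see a category.
// overrides holds per-user decisions (true = grant, false = revoke) and wins over the preset.
// An unknown relationship or category falls through to false (indexing a nil map is safe).
func allowed(relationship, category string, overrides map[string]bool) bool {
	if v, ok := overrides[category]; ok {
		return v
	}
	return presets[relationship][category]
}

func main() {
	none := map[string]bool{}
	fmt.Println(allowed("trainer", "nutrition", none))       // true
	fmt.Println(allowed("trainer", "hospitalization", none)) // false

	// The user shares fertility data with this trainer but hides supplements.
	custom := map[string]bool{"fertility": true, "supplements": false}
	fmt.Println(allowed("trainer", "fertility", custom))   // true
	fmt.Println(allowed("trainer", "supplements", custom)) // false
}
```

Checking overrides before the preset keeps presets as plain defaults, so changing a preset never silently undoes a choice the user made explicitly.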
---

## Summary

| Decision | Choice |
|----------|--------|
| Auth | Token with dossier + expiration |
| API Style | REST, versioned (`/api/v1/`) |
| Endpoints | `/dossiers`, `/entries` (2 main) |
| Thumbnails | 150x150 PNG, ~5KB, in DB |
| Full images | Filesystem, on-demand |
| Anatomy | Reference slice analysis at ingest |
| Categories | Integers internally, translated strings externally |
| Database | SQLite |
| RBAC | Future work |

**Core Principle**: AI-ready health data through progressive disclosure. The LLM gets context upfront and fetches details only when needed.