inou API Design Decisions
January 2026
Overview
This document captures the API redesign decisions for inou, focusing on making health data AI-ready through progressive disclosure, efficient queries, and secure token-based access.
Authentication
Token-Based Access
{
"d": "dossier_id",
"exp": 1736470800
}
- d: The authenticated dossier (who logged in)
- exp: Unix timestamp expiration (a few hours for external LLMs like Grok)
- Token is encrypted using the existing lib.CryptoEncrypt
- No raw dossier IDs in URLs that live forever
- Backend looks up permissions from the dossier_access table (not stored in the token)
Why Tokens?
- Grok/ChatGPT users were querying with raw dossier IDs days later
- Tokens expire, preventing stale access
- Simpler than passing dossier in every request
API Endpoints
REST Style with Versioning
GET /api/v1/dossiers
GET /api/v1/dossiers/{id}
GET /api/v1/dossiers/{id}/entries
GET /api/v1/dossiers/{id}/entries/{entry_id}
GET /api/v1/dossiers/{id}/entries/{entry_id}?detail=full
Token in Header
Authorization: Bearer <token>
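Extracting the token from that header is a one-liner worth getting right (missing prefix, empty token). A minimal sketch, where `bearerToken` is a hypothetical helper a handler would call with `r.Header.Get("Authorization")`:

```go
package main

import (
	"errors"
	"strings"
)

// bearerToken extracts the token from an Authorization header value.
func bearerToken(header string) (string, error) {
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return "", errors.New("missing bearer token")
	}
	tok := strings.TrimSpace(strings.TrimPrefix(header, prefix))
	if tok == "" {
		return "", errors.New("empty bearer token")
	}
	return tok, nil
}
```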
Query Parameters
| Param | Purpose |
|---|---|
| detail=full | Return full image/text data |
| search=pons | Search summaries/tags |
| category=imaging | Filter by category (English) |
| anatomy=hypothalamus | Find slices by anatomy |
| W=200&L=500 | DICOM windowing for images |
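A handler might decode these parameters along the following lines. The `EntryQuery` struct and `parseEntryQuery` name are assumptions for illustration, not the actual inou code:

```go
package main

import (
	"net/url"
	"strconv"
)

// EntryQuery gathers the query parameters from the table above.
type EntryQuery struct {
	FullDetail bool
	Search     string
	Category   string
	Anatomy    string
	Window     int // W: DICOM window width
	Level      int // L: DICOM window level
}

// parseEntryQuery sketches decoding a raw query string such as
// "detail=full&W=200&L=500".
func parseEntryQuery(rawQuery string) (EntryQuery, error) {
	v, err := url.ParseQuery(rawQuery)
	if err != nil {
		return EntryQuery{}, err
	}
	q := EntryQuery{
		FullDetail: v.Get("detail") == "full",
		Search:     v.Get("search"),
		Category:   v.Get("category"),
		Anatomy:    v.Get("anatomy"),
	}
	q.Window, _ = strconv.Atoi(v.Get("W")) // zero means "no windowing requested"
	q.Level, _ = strconv.Atoi(v.Get("L"))
	return q, nil
}
```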
Dossier Always Explicit
Even if user has only one dossier, it's in the URL. No special cases.
Progressive Disclosure
The LLM gets everything it needs to decide in the first call. Full detail is fetched only when needed.
| Entry Type | Quick Glance | Full Detail |
|---|---|---|
| Study | anatomy + summary | - |
| Series | slice count, orientation | - |
| Slice | thumbnail (150x150) | full image |
| Document | summary + tags | full text |
| Lab | values + ranges | historical trend |
| Genome | category + count | variant list |
Fewer Round-Trips
- Before: LLM guesses slices, fetches one by one, backtracks
- After: LLM sees the anatomy index, requests exactly the slices needed
Thumbnails
Specification
- Size: 150x150 max dimension (preserve aspect ratio)
- Format: PNG (8-bit greyscale, lossless)
- Target: ~5KB per thumbnail
- Storage: Database (in Data JSON field or BLOB column)
Why DB Not Filesystem?
- Batch queries: "get 50 slices with thumbnails" = 1 query
- Fewer IOPS (no 50 small file reads)
- DB file stays hot in cache
- 4000 slices x 5KB = 20MB (trivial)
Full Images
- Stay on the filesystem (/tank/inou/objects/)
- Fetched one at a time with ?detail=full
Anatomical Analysis
Problem
LLM spends many round-trips finding the right slice ("find the pons" → guess → wrong → try again)
Solution
Analyze reference slices at ingest, store anatomy with z-ranges.
Approach
- For each orientation (SAG, COR, AX) present in study
- Pick mid-slice from T2 (preferred) or T1
- Run vision API: "identify anatomical structures with z-ranges"
- Store in study entry
Storage
{
"anatomy": {
"pons": {"z_min": -30, "z_max": -20},
"cerebellum": {"z_min": -45, "z_max": -25},
"hypothalamus": {"z_min": 20, "z_max": 26}
},
"body_part": "brain"
}
Query Time
- LLM asks "find pons in this series"
- Lookup: pons at z=-30 to -20
- Find slices in the series where slice_location BETWEEN -30 AND -20
- Return matching slices with thumbnails
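The steps above reduce to one range filter. A sketch of the in-memory equivalent of that BETWEEN query, with assumed type and function names:

```go
package main

// AnatomyRange mirrors one z-range entry from the study's anatomy map.
type AnatomyRange struct {
	ZMin, ZMax float64
}

// slicesFor returns the indices of slices whose slice_location falls
// within the named structure's z-range.
func slicesFor(anatomy map[string]AnatomyRange, structure string, locations []float64) []int {
	r, ok := anatomy[structure]
	if !ok {
		return nil // structure not identified in this study
	}
	var hits []int
	for i, z := range locations {
		if z >= r.ZMin && z <= r.ZMax {
			hits = append(hits, i)
		}
	}
	return hits
}
```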
Cost
- 1-3 vision API calls per study (~$0.01-0.03)
- Stored once, used forever
- No per-query cost
Why Not Position-Based Lookup?
Tried it, deleted it. Pediatric vs adult brains have different z-coordinates for same structures.
Generalization
- Brain: mid-sagittal reference
- Spine: mid-sagittal reference
- Knee: mid-sagittal reference
- Abdomen: may need coronal + axial references
- Animals: same principle, vision model identifies structures
Categories
Consolidated Structure
27 categories (down from 31), using integers internally.
| Int | Name | Notes |
|---|---|---|
| 0 | imaging | Unified: slice, series, study (Type differentiates) |
| 1 | document | |
| 2 | lab | |
| 3 | genome | Unified: tier, rsid, variant (Type differentiates) |
| ... | ... | ... |
Translation
- API Input: Accept English OR any translated name
- API Output: Return translated name (user's language)
- Navigation: By entry ID, not category name
var categoryFromAny = map[string]int{} // "genome", "Геном", "ゲノム" → 3
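Such a map could be built by flattening a per-language name table into one case-insensitive index. A sketch with sample entries only (the real table has 27 categories; `categoryNames` and `buildLookup` are assumed names):

```go
package main

import "strings"

// categoryNames holds per-language display names keyed by the internal
// integer. Sample entries only.
var categoryNames = map[int]map[string]string{
	0: {"en": "imaging"},
	3: {"en": "genome", "ru": "Геном", "ja": "ゲノム"},
}

// buildLookup flattens every translation into one case-insensitive
// index, so API input may be English or any translated name.
func buildLookup() map[string]int {
	lookup := make(map[string]int)
	for id, names := range categoryNames {
		for _, name := range names {
			lookup[strings.ToLower(name)] = id
		}
	}
	return lookup
}
```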
Database
SQLite Stays
- Read-heavy workload (LLM queries)
- Occasional writes (imports)
- Single server, few users
- Millions of rows, no problem
- Single file backup
- WAL mode for concurrency
Alternatives Considered
- FoundationDB: Overkill for single server
- bbolt/BadgerDB: Lose SQL convenience, maintain indexes manually
- Postgres: "Beautiful tech from 15 years ago" - user preference
What We Actually Use
Just key-value with secondary indexes:
- Get by ID
- Get children (parent_id = X)
- Filter by dossier
- Filter by category
- Order by ordinal/timestamp
SQLite handles this perfectly.
RBAC (Future)
Concept
Category-level permissions per relationship.
Example: Trainer can see exercise, nutrition, supplements but NOT hospitalization, fertility.
Implementation
- Presets by relationship type (trainer, doctor, family)
- User overrides per category
- Stored in the dossier_access table
- Backend lookup (not in token)
Not Priority Now
Token + expiration first. RBAC layers on later.
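When RBAC does land, the lookup could be as simple as a preset plus overrides. A sketch of the trainer example above, with illustrative category names and assumed function names:

```go
package main

// presets lists the categories each relationship type may see by default.
var presets = map[string]map[string]bool{
	"trainer": {"exercise": true, "nutrition": true, "supplements": true},
}

// canSee applies per-category user overrides on top of the relationship
// preset — a sketch of the dossier_access lookup the backend would do.
func canSee(relationship, category string, overrides map[string]bool) bool {
	if allowed, ok := overrides[category]; ok {
		return allowed
	}
	return presets[relationship][category]
}
```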
Summary
| Decision | Choice |
|---|---|
| Auth | Token with dossier + expiration |
| API Style | REST, versioned (/api/v1/) |
| Endpoints | /dossiers, /entries (2 main) |
| Thumbnails | 150x150 PNG, ~5KB, in DB |
| Full images | Filesystem, on-demand |
| Anatomy | Reference slice analysis at ingest |
| Categories | Integers internally, translated strings externally |
| Database | SQLite |
| RBAC | Future work |
Core Principle: AI-ready health data through progressive disclosure. LLM gets context upfront, fetches details only when needed.