# inou API Design Decisions
*January 2026*
## Overview
This document captures the API redesign decisions for inou, focusing on making health data AI-ready through progressive disclosure, efficient queries, and secure token-based access.
---
## Authentication
### Token-Based Access
```json
{
"d": "dossier_id",
"exp": 1736470800
}
```
- **d**: The authenticated dossier (who logged in)
- **exp**: Unix timestamp expiration (a few hours for external LLMs like Grok)
- The token is encrypted with the existing `lib.CryptoEncrypt`
- No long-lived URLs that grant access through a raw dossier ID
- Backend looks up permissions from `dossier_access` table (not in token)
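
To make the expiry rule concrete, here is a minimal sketch of validating the payload after decryption. The names `TokenClaims`, `ParseClaims`, and `ErrTokenExpired` are hypothetical, and decryption is assumed to have already happened, since the signature of `lib.CryptoEncrypt` and its counterpart is not shown in this document.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// TokenClaims mirrors the JSON payload shown above.
type TokenClaims struct {
	Dossier string `json:"d"`   // the authenticated dossier
	Exp     int64  `json:"exp"` // expiration as Unix seconds
}

// ErrTokenExpired is returned once the current time reaches Exp.
var ErrTokenExpired = errors.New("token expired")

// ParseClaims decodes an already-decrypted payload and rejects expired tokens.
// nowUnix is passed in (rather than read from the clock) to keep the check testable.
func ParseClaims(plaintext []byte, nowUnix int64) (TokenClaims, error) {
	var c TokenClaims
	if err := json.Unmarshal(plaintext, &c); err != nil {
		return TokenClaims{}, fmt.Errorf("decode token: %w", err)
	}
	if c.Dossier == "" {
		return TokenClaims{}, errors.New("token has no dossier")
	}
	if nowUnix >= c.Exp {
		return TokenClaims{}, ErrTokenExpired
	}
	return c, nil
}

func main() {
	payload := []byte(`{"d":"dossier_id","exp":1736470800}`)
	claims, err := ParseClaims(payload, 1736470800-3600) // one hour before expiry
	fmt.Println(claims.Dossier, err)
	_, err = ParseClaims(payload, 1736470800) // exactly at expiry
	fmt.Println(err)
}
```

Permissions are deliberately absent from the struct: per the list above, they come from the `dossier_access` table at request time.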
### Why Tokens?
- Grok/ChatGPT users were querying with raw dossier IDs days later
- Tokens expire, preventing stale access
- Simpler than passing the authenticated dossier as a separate parameter in every request
---
## API Endpoints
### REST Style with Versioning
```
GET /api/v1/dossiers
GET /api/v1/dossiers/{id}
GET /api/v1/dossiers/{id}/entries
GET /api/v1/dossiers/{id}/entries/{entry_id}
GET /api/v1/dossiers/{id}/entries/{entry_id}?detail=full
```
### Token in Header
```
Authorization: Bearer <token>
```
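
A small sketch of extracting the token from that header on the server side. `BearerToken` is a hypothetical helper name, not an existing inou function; in an `net/http` handler the input would come from `r.Header.Get("Authorization")`.

```go
package main

import (
	"fmt"
	"strings"
)

// BearerToken extracts the token from an Authorization header value.
// It returns false when the header is empty, uses another scheme,
// or carries no token after the scheme.
func BearerToken(header string) (string, bool) {
	scheme, token, found := strings.Cut(strings.TrimSpace(header), " ")
	if !found || !strings.EqualFold(scheme, "Bearer") {
		return "", false
	}
	token = strings.TrimSpace(token)
	return token, token != ""
}

func main() {
	fmt.Println(BearerToken("Bearer abc123"))
	fmt.Println(BearerToken("Basic abc123"))
}
```

The scheme comparison is case-insensitive, which matches how HTTP authentication schemes are defined.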
### Query Parameters
| Param | Purpose |
|-------|---------|
| `detail=full` | Return full image/text data |
| `search=pons` | Search summaries/tags |
| `category=imaging` | Filter by category (English) |
| `anatomy=hypothalamus` | Find slices by anatomy |
| `W=200&L=500` | DICOM windowing for images (window width and level) |
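
For readers unfamiliar with windowing, the sketch below shows what `W` and `L` could do to a raw pixel value, assuming `W` is the window width and `L` the window level (center), as is conventional in radiology. It uses the simplified linear mapping, not the exact formula from the DICOM standard, and `Window` is a hypothetical function name.

```go
package main

import (
	"fmt"
	"math"
)

// Window maps a raw pixel value to 8-bit greyscale using a linear window.
// Values at or below level-width/2 become 0, values at or above
// level+width/2 become 255, and values in between are scaled linearly.
func Window(v, width, level float64) uint8 {
	if width <= 0 {
		width = 1 // guard against division by zero on bad input
	}
	low := level - width/2
	high := level + width/2
	switch {
	case v <= low:
		return 0
	case v >= high:
		return 255
	}
	return uint8(math.Round((v - low) / width * 255))
}

func main() {
	// With W=200 and L=500 the visible range is 400..600.
	for _, v := range []float64{300, 400, 500, 600, 700} {
		fmt.Println(v, Window(v, 200, 500))
	}
}
```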
### Dossier Always Explicit
Even if a user has only one dossier, it appears in the URL. No special cases.
---
## Progressive Disclosure
The LLM gets everything it needs to **decide** in the first call. Full detail is fetched only when needed.
| Entry Type | Quick Glance | Full Detail |
|------------|--------------|-------------|
| Study | anatomy + summary | - |
| Series | slice count, orientation | - |
| Slice | thumbnail (150x150) | full image |
| Document | summary + tags | full text |
| Lab | values + ranges | historical trend |
| Genome | category + count | variant list |
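
One way the glance/detail split could look in a response type, shown here for a document entry. The `Entry` struct, its field names, and `Render` are assumptions for illustration; the actual response shape is not specified in this document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entry is a hypothetical response shape. Glance fields are always
// serialized; Detail is only included when the caller asks for it.
type Entry struct {
	ID      string   `json:"id"`
	Type    string   `json:"type"`
	Summary string   `json:"summary,omitempty"`
	Tags    []string `json:"tags,omitempty"`
	Detail  string   `json:"detail,omitempty"`
}

// Render serializes an entry. Unless full is true (e.g. ?detail=full),
// the Detail field is blanked so omitempty drops it from the output.
// e is a copy, so the caller's value is not modified.
func Render(e Entry, full bool) ([]byte, error) {
	if !full {
		e.Detail = ""
	}
	return json.Marshal(e)
}

func main() {
	e := Entry{ID: "e1", Type: "document", Summary: "MRI report", Tags: []string{"brain"}, Detail: "full text"}
	glance, _ := Render(e, false)
	full, _ := Render(e, true)
	fmt.Println(string(glance))
	fmt.Println(string(full))
}
```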
### Fewer Round-Trips
- **Before**: the LLM guesses slices, fetches them one by one, and backtracks
- **After**: the LLM sees the anatomy index and requests exactly the slices it needs
---
## Thumbnails
### Specification
- Size: **150x150** max dimension (preserve aspect ratio)
- Format: **PNG** (8-bit greyscale, lossless)
- Target: **~5KB** per thumbnail
- Storage: **Database** (in Data JSON field or BLOB column)
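
The "max dimension, preserve aspect ratio" rule can be sketched as a small size calculation. `ThumbSize` is a hypothetical helper, and leaving already-small images unscaled is an assumption; the spec above does not say whether small images are enlarged.

```go
package main

import "fmt"

// ThumbSize scales (w, h) so the longer side equals limit while
// preserving aspect ratio. Images already within the limit are
// returned unchanged (assumption: no upscaling).
func ThumbSize(w, h, limit int) (int, int) {
	if w <= 0 || h <= 0 || limit <= 0 {
		return 0, 0
	}
	if w <= limit && h <= limit {
		return w, h
	}
	if w >= h {
		return limit, atLeastOne(roundDiv(h*limit, w))
	}
	return atLeastOne(roundDiv(w*limit, h)), limit
}

// roundDiv divides a by b, rounding to the nearest integer.
func roundDiv(a, b int) int { return (a + b/2) / b }

// atLeastOne keeps very thin images from collapsing to zero pixels.
func atLeastOne(n int) int {
	if n < 1 {
		return 1
	}
	return n
}

func main() {
	fmt.Println(ThumbSize(512, 512, 150))
	fmt.Println(ThumbSize(512, 256, 150))
	fmt.Println(ThumbSize(100, 80, 150))
}
```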
### Why DB Not Filesystem?
- Batch queries: "get 50 slices with thumbnails" = 1 query
- Fewer IOPS (no 50 small file reads)
- DB file stays hot in cache
- 4000 slices x 5KB = 20MB (trivial)
### Full Images
- Stay on filesystem (`/tank/inou/objects/`)
- Fetched one at a time with `?detail=full`
---
## Anatomical Analysis
### Problem
The LLM spends many round-trips finding the right slice ("find the pons" → guess → wrong → try again).
### Solution
Analyze reference slices at ingest and store each identified structure with its z-range.
### Approach
For each orientation (SAG, COR, AX) present in the study:
1. Pick the mid-slice from the T2 series (preferred) or the T1 series
2. Run the vision API with a prompt such as "identify anatomical structures with z-ranges"
3. Store the result in the study entry
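
The slice-selection part of this approach could be sketched as follows. The `Series` type, its fields, and `ReferenceSlices` are hypothetical; the vision API call and the storage step are left out because their interfaces are not described here.

```go
package main

import "fmt"

// Series is a hypothetical view of one imaging series.
type Series struct {
	Orientation string   // "SAG", "COR", or "AX"
	Sequence    string   // e.g. "T2", "T1"
	SliceIDs    []string // ordered by slice position
}

// rank expresses the preference order: T2 first, then T1.
// Other sequences return 0 and are not used as references.
func rank(sequence string) int {
	switch sequence {
	case "T2":
		return 2
	case "T1":
		return 1
	}
	return 0
}

// ReferenceSlices returns, per orientation, the middle slice of the
// most preferred series. Orientations without a T2 or T1 series are absent.
func ReferenceSlices(series []Series) map[string]string {
	best := map[string]Series{}
	for _, s := range series {
		if len(s.SliceIDs) == 0 || rank(s.Sequence) == 0 {
			continue
		}
		cur, ok := best[s.Orientation]
		if !ok || rank(s.Sequence) > rank(cur.Sequence) {
			best[s.Orientation] = s
		}
	}
	refs := make(map[string]string, len(best))
	for orientation, s := range best {
		refs[orientation] = s.SliceIDs[len(s.SliceIDs)/2]
	}
	return refs
}

func main() {
	refs := ReferenceSlices([]Series{
		{Orientation: "SAG", Sequence: "T1", SliceIDs: []string{"s1", "s2", "s3"}},
		{Orientation: "SAG", Sequence: "T2", SliceIDs: []string{"a", "b", "c", "d"}},
		{Orientation: "AX", Sequence: "T1", SliceIDs: []string{"x", "y", "z"}},
	})
	fmt.Println(refs)
}
```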
### Storage
```json
{
"anatomy": {
"pons": {"z_min": -30, "z_max": -20},
"cerebellum": {"z_min": -45, "z_max": -25},
"hypothalamus": {"z_min": 20, "z_max": 26}
},
"body_part": "brain"
}
```
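
A sketch of decoding this stored JSON into typed values, with a basic sanity check on each range. `ZRange`, `StudyData`, and `ParseStudyData` are hypothetical names; only the JSON keys come from the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ZRange is the z-extent of one anatomical structure.
type ZRange struct {
	Min float64 `json:"z_min"`
	Max float64 `json:"z_max"`
}

// StudyData matches the stored JSON shown above.
type StudyData struct {
	Anatomy  map[string]ZRange `json:"anatomy"`
	BodyPart string            `json:"body_part"`
}

// ParseStudyData decodes the stored JSON and rejects inverted ranges,
// which would otherwise silently match no slices at query time.
func ParseStudyData(raw []byte) (StudyData, error) {
	var d StudyData
	if err := json.Unmarshal(raw, &d); err != nil {
		return StudyData{}, fmt.Errorf("decode study data: %w", err)
	}
	for name, r := range d.Anatomy {
		if r.Min > r.Max {
			return StudyData{}, fmt.Errorf("anatomy %q: z_min %v exceeds z_max %v", name, r.Min, r.Max)
		}
	}
	return d, nil
}

func main() {
	raw := []byte(`{"anatomy":{"pons":{"z_min":-30,"z_max":-20}},"body_part":"brain"}`)
	d, err := ParseStudyData(raw)
	fmt.Println(d.BodyPart, d.Anatomy["pons"], err)
}
```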
### Query Time
- LLM asks "find pons in this series"
- Lookup: pons at z=-30 to -20
- Find slices in series where `slice_location BETWEEN -30 AND -20`
- Return matching slices with thumbnails
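
The lookup above is an inclusive range filter. A minimal in-memory sketch follows, with hypothetical `ZRange`, `Slice`, and `SlicesFor` names; in practice the filter would more likely run as the SQL `BETWEEN` shown above.

```go
package main

import "fmt"

// ZRange is the stored z-extent of one structure.
type ZRange struct{ Min, Max float64 }

// Slice is a hypothetical minimal slice record.
type Slice struct {
	ID       string
	Location float64 // slice_location
}

// SlicesFor returns the slices whose location falls inside the stored
// range for the named structure. Bounds are inclusive, like SQL BETWEEN.
// An unknown structure yields no slices.
func SlicesFor(anatomy map[string]ZRange, slices []Slice, structure string) []Slice {
	r, ok := anatomy[structure]
	if !ok {
		return nil
	}
	var matches []Slice
	for _, s := range slices {
		if s.Location >= r.Min && s.Location <= r.Max {
			matches = append(matches, s)
		}
	}
	return matches
}

func main() {
	anatomy := map[string]ZRange{"pons": {Min: -30, Max: -20}}
	slices := []Slice{{"a", -35}, {"b", -30}, {"c", -25}, {"d", -20}, {"e", -10}}
	fmt.Println(SlicesFor(anatomy, slices, "pons"))
}
```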
### Cost
- 1-3 vision API calls per study (~$0.01-0.03)
- Stored once, used forever
- No per-query cost
### Why Not Position-Based Lookup?
Tried it, deleted it. Pediatric and adult brains place the same structures at different z-coordinates, so a fixed position table is unreliable; ranges have to be derived per study.
### Generalization
- Brain: mid-sagittal reference
- Spine: mid-sagittal reference
- Knee: mid-sagittal reference
- Abdomen: may need coronal + axial references
- Animals: same principle, vision model identifies structures
---
## Categories
### Consolidated Structure
27 categories (down from 31), using integers internally.
| Int | Name | Notes |
|-----|------|-------|
| 0 | imaging | Unified: slice, series, study (Type differentiates) |
| 1 | document | |
| 2 | lab | |
| 3 | genome | Unified: tier, rsid, variant (Type differentiates) |
| ... | ... | |
### Translation
- **API Input**: Accept English OR any translated name
- **API Output**: Return translated name (user's language)
- **Navigation**: By entry ID, not category name
```go
var categoryFromAny = map[string]int{} // "genome", "Геном", "ゲノム" → 3
```
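
One way the lookup map could be populated and queried is sketched below. Only the four categories and the two translations shown in this document are included; `categoryNames`, `buildCategoryIndex`, and `CategoryFromAny` are hypothetical names, and case-insensitive matching is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// categoryNames lists the accepted names per category integer.
// Only names that appear in this document are included.
var categoryNames = map[int][]string{
	0: {"imaging"},
	1: {"document"},
	2: {"lab"},
	3: {"genome", "Геном", "ゲノム"},
}

// categoryFromAny maps every accepted name (lowercased) to its integer.
var categoryFromAny = buildCategoryIndex(categoryNames)

func buildCategoryIndex(names map[int][]string) map[string]int {
	index := make(map[string]int)
	for id, aliases := range names {
		for _, alias := range aliases {
			index[strings.ToLower(alias)] = id
		}
	}
	return index
}

// CategoryFromAny resolves an English or translated name.
// The boolean distinguishes "imaging" (0) from an unknown name.
func CategoryFromAny(name string) (int, bool) {
	id, ok := categoryFromAny[strings.ToLower(strings.TrimSpace(name))]
	return id, ok
}

func main() {
	fmt.Println(CategoryFromAny("genome"))
	fmt.Println(CategoryFromAny("ゲノム"))
	fmt.Println(CategoryFromAny("unknown"))
}
```

Returning the `ok` flag matters here because category 0 is valid, so a bare zero value cannot signal "not found".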
---
## Database
### SQLite Stays
- Read-heavy workload (LLM queries)
- Occasional writes (imports)
- Single server, few users
- Millions of rows, no problem
- Single file backup
- WAL mode for concurrency
### Alternatives Considered
- **FoundationDB**: Overkill for single server
- **bbolt/BadgerDB**: Lose SQL convenience, maintain indexes manually
- **Postgres**: "Beautiful tech from 15 years ago" - user preference
### What We Actually Use
The access patterns are just key-value lookups with secondary indexes:
1. Get by ID
2. Get children (parent_id = X)
3. Filter by dossier
4. Filter by category
5. Order by ordinal/timestamp
SQLite handles all five without trouble.
---
## RBAC (Future)
### Concept
Category-level permissions per relationship.
Example: Trainer can see exercise, nutrition, supplements but NOT hospitalization, fertility.
### Implementation
- Presets by relationship type (trainer, doctor, family)
- User overrides per category
- Stored in `dossier_access` table
- Backend lookup (not in token)
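
Since RBAC is future work, the following is only a sketch of how presets and overrides might combine. The `Access` type, the `presets` map, and `CanSee` are hypothetical, categories are written as strings for readability (the document uses integers internally), and deny-by-default is an assumption. The trainer preset uses the example categories from the Concept section.

```go
package main

import "fmt"

// Access is a hypothetical row from dossier_access: a relationship
// type plus any per-category overrides set by the user.
type Access struct {
	Relationship string
	Overrides    map[string]bool
}

// presets holds the default visible categories per relationship type.
var presets = map[string]map[string]bool{
	"trainer": {"exercise": true, "nutrition": true, "supplements": true},
}

// CanSee applies the user's override when one exists and otherwise
// falls back to the relationship preset. Anything not listed is denied.
func CanSee(a Access, category string) bool {
	if allowed, ok := a.Overrides[category]; ok {
		return allowed
	}
	return presets[a.Relationship][category]
}

func main() {
	trainer := Access{Relationship: "trainer"}
	fmt.Println(CanSee(trainer, "nutrition"))
	fmt.Println(CanSee(trainer, "fertility"))
}
```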
### Not Priority Now
Token + expiration first. RBAC layers on later.
---
## Summary
| Decision | Choice |
|----------|--------|
| Auth | Token with dossier + expiration |
| API Style | REST, versioned (`/api/v1/`) |
| Endpoints | `/dossiers`, `/entries` (2 main) |
| Thumbnails | 150x150 PNG, ~5KB, in DB |
| Full images | Filesystem, on-demand |
| Anatomy | Reference slice analysis at ingest |
| Categories | Integers internally, translated strings externally |
| Database | SQLite |
| RBAC | Future work |
**Core Principle**: AI-ready health data through progressive disclosure. The LLM gets context upfront and fetches details only when needed.