# inou API Design Decisions
*January 2026*

## Overview

This document captures the API redesign decisions for inou, focusing on making health data AI-ready through progressive disclosure, efficient queries, and secure token-based access.

---

## Authentication

### Token-Based Access
```json
{
  "d": "dossier_id",
  "exp": 1736470800
}
```

- **d**: The authenticated dossier (who logged in)
- **exp**: Unix timestamp expiration (a few hours for external LLMs like Grok)
- Token is encrypted using the existing `lib.CryptoEncrypt`
- No raw dossier IDs in URLs, where they would stay valid forever
- Backend looks up permissions from the `dossier_access` table (not stored in the token)

### Why Tokens?
- Grok/ChatGPT users were querying with raw dossier IDs days later
- Tokens expire, preventing stale access
- Simpler than passing dossier in every request

---

## API Endpoints

### REST Style with Versioning
```
GET /api/v1/dossiers
GET /api/v1/dossiers/{id}
GET /api/v1/dossiers/{id}/entries
GET /api/v1/dossiers/{id}/entries/{entry_id}
GET /api/v1/dossiers/{id}/entries/{entry_id}?detail=full
```

### Token in Header
```
Authorization: Bearer <token>
```

### Query Parameters
| Param | Purpose |
|-------|---------|
| `detail=full` | Return full image/text data |
| `search=pons` | Search summaries/tags |
| `category=imaging` | Filter by category (English) |
| `anatomy=hypothalamus` | Find slices by anatomy |
| `W=200&L=500` | DICOM windowing for images |

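As a rough client-side illustration, the parameters above compose onto the versioned paths like this. The `entryURL` helper, the `https://inou.example` host, and the `d1`/`e42` IDs are made up for the example.

```go
package main

import (
	"fmt"
	"net/url"
)

// entryURL builds a request URL for a dossier's entries, or for a
// single entry when entryID is non-empty. Hypothetical helper.
func entryURL(base, dossierID, entryID string, params map[string]string) string {
	u := base + "/api/v1/dossiers/" + url.PathEscape(dossierID) + "/entries"
	if entryID != "" {
		u += "/" + url.PathEscape(entryID)
	}
	q := url.Values{}
	for k, v := range params {
		q.Set(k, v)
	}
	if len(q) > 0 {
		u += "?" + q.Encode() // Encode sorts parameters by key
	}
	return u
}

func main() {
	// Find slices by anatomy within the imaging category.
	fmt.Println(entryURL("https://inou.example", "d1", "",
		map[string]string{"category": "imaging", "anatomy": "hypothalamus"}))
	// prints https://inou.example/api/v1/dossiers/d1/entries?anatomy=hypothalamus&category=imaging

	// Fetch one full image with DICOM windowing applied.
	fmt.Println(entryURL("https://inou.example", "d1", "e42",
		map[string]string{"detail": "full", "W": "200", "L": "500"}))
	// prints https://inou.example/api/v1/dossiers/d1/entries/e42?L=500&W=200&detail=full
}
```
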
### Dossier Always Explicit
Even if a user has only one dossier, it is in the URL. No special cases.

---

## Progressive Disclosure

The LLM gets everything it needs to **decide** in the first call. Full detail is fetched only when needed.

| Entry Type | Quick Glance | Full Detail |
|------------|--------------|-------------|
| Study | anatomy + summary | - |
| Series | slice count, orientation | - |
| Slice | thumbnail (150x150) | full image |
| Document | summary + tags | full text |
| Lab | values + ranges | historical trend |
| Genome | category + count | variant list |

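One way to picture the Document row of the table is a response type whose heavy field is dropped unless `detail=full` was requested. The `DocumentEntry` struct, its field names, and the `render` helper are assumptions for illustration, not the actual response schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DocumentEntry is a hypothetical response shape for a document entry.
// Text is the "full detail" part and is omitted from JSON when empty.
type DocumentEntry struct {
	ID      string   `json:"id"`
	Summary string   `json:"summary"`
	Tags    []string `json:"tags"`
	Text    string   `json:"text,omitempty"`
}

// render returns the quick-glance form unless detail is "full".
// e is passed by value, so clearing Text does not affect the caller.
func render(e DocumentEntry, detail string) ([]byte, error) {
	if detail != "full" {
		e.Text = ""
	}
	return json.Marshal(e)
}

func main() {
	e := DocumentEntry{
		ID:      "e1",
		Summary: "MRI report",
		Tags:    []string{"brain", "mri"},
		Text:    "Full report text",
	}
	quick, _ := render(e, "")
	full, _ := render(e, "full")
	fmt.Println(string(quick))
	fmt.Println(string(full))
}
```
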
### Fewer Round-Trips
- **Before**: LLM guesses slices, fetches one by one, backtracks
- **After**: LLM sees anatomy index, requests exact slices needed

---

## Thumbnails

### Specification
- Size: **150x150** max dimension (preserve aspect ratio)
- Format: **PNG** (8-bit greyscale, lossless)
- Target: **~5KB** per thumbnail
- Storage: **Database** (in Data JSON field or BLOB column)

### Why DB Not Filesystem?
- Batch queries: "get 50 slices with thumbnails" = 1 query
- Fewer IOPS (no 50 small file reads)
- DB file stays hot in cache
- 4000 slices x 5KB = 20MB (trivial)

### Full Images
- Stay on filesystem (`/tank/inou/objects/`)
- Fetched one at a time with `?detail=full`

---

## Anatomical Analysis

### Problem
The LLM spends many round-trips finding the right slice ("find the pons" → guess → wrong → try again).

### Solution
Analyze reference slices at ingest and store anatomy with z-ranges.

### Approach
1. For each orientation (SAG, COR, AX) present in the study
2. Pick the mid-slice from T2 (preferred) or T1
3. Run the vision API: "identify anatomical structures with z-ranges"
4. Store the result in the study entry

### Storage
```json
{
  "anatomy": {
    "pons": {"z_min": -30, "z_max": -20},
    "cerebellum": {"z_min": -45, "z_max": -25},
    "hypothalamus": {"z_min": 20, "z_max": 26}
  },
  "body_part": "brain"
}
```

### Query Time
- LLM asks "find pons in this series"
- Lookup: pons at z=-30 to -20
- Find slices in series where `slice_location BETWEEN -30 AND -20`
- Return matching slices with thumbnails

### Cost
- 1-3 vision API calls per study (~$0.01-0.03)
- Stored once, used forever
- No per-query cost

### Why Not Position-Based Lookup?
Tried it, deleted it. Pediatric and adult brains have different z-coordinates for the same structures.

### Generalization
- Brain: mid-sagittal reference
- Spine: mid-sagittal reference
- Knee: mid-sagittal reference
- Abdomen: may need coronal + axial references
- Animals: same principle, vision model identifies structures

---

## Categories

### Consolidated Structure
27 categories (down from 31), using integers internally.

| Int | Name | Notes |
|-----|------|-------|
| 0 | imaging | Unified: slice, series, study (Type differentiates) |
| 1 | document | |
| 2 | lab | |
| 3 | genome | Unified: tier, rsid, variant (Type differentiates) |
| ... | ... | |

### Translation
- **API Input**: Accept English OR any translated name
- **API Output**: Return translated name (user's language)
- **Navigation**: By entry ID, not category name

```go
var categoryFromAny = map[string]int{} // "genome", "Геном", "ゲノム" → 3
```

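A sketch of how such a lookup map might be populated and used. Only the genome translations come from this document; the `categoryNames` table, the `parseCategory` and `categoryName` helpers, the language codes, and the case-insensitive matching are assumptions. Categories without a translation fall back to English here.

```go
package main

import (
	"fmt"
	"strings"
)

// categoryNames maps category integer to language code to display name.
// Hypothetical table; only the first four categories are listed.
var categoryNames = map[int]map[string]string{
	0: {"en": "imaging"},
	1: {"en": "document"},
	2: {"en": "lab"},
	3: {"en": "genome", "ru": "Геном", "ja": "ゲノム"},
}

// categoryFromAny resolves any known name, in any language, to its integer.
var categoryFromAny = map[string]int{}

func init() {
	for id, names := range categoryNames {
		for _, n := range names {
			categoryFromAny[strings.ToLower(n)] = id
		}
	}
}

// parseCategory handles API input: English or any translated name.
func parseCategory(s string) (int, bool) {
	id, ok := categoryFromAny[strings.ToLower(strings.TrimSpace(s))]
	return id, ok
}

// categoryName handles API output: the name in the user's language,
// falling back to English when no translation exists.
func categoryName(id int, lang string) string {
	if n, ok := categoryNames[id][lang]; ok {
		return n
	}
	return categoryNames[id]["en"]
}

func main() {
	for _, in := range []string{"genome", "Геном", "ゲノム"} {
		id, ok := parseCategory(in)
		fmt.Println(in, id, ok)
	}
	fmt.Println(categoryName(3, "ja"))
	fmt.Println(categoryName(2, "ru"))
}
```
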
---

## Database

### SQLite Stays
- Read-heavy workload (LLM queries)
- Occasional writes (imports)
- Single server, few users
- Millions of rows, no problem
- Single file backup
- WAL mode for concurrency

### Alternatives Considered
- **FoundationDB**: Overkill for single server
- **bbolt/BadgerDB**: Lose SQL convenience, maintain indexes manually
- **Postgres**: "Beautiful tech from 15 years ago" - user preference

### What We Actually Use
Just key-value with secondary indexes:
1. Get by ID
2. Get children (parent_id = X)
3. Filter by dossier
4. Filter by category
5. Order by ordinal/timestamp

SQLite handles this perfectly.

---

## RBAC (Future)

### Concept
Category-level permissions per relationship.

Example: Trainer can see exercise, nutrition, supplements but NOT hospitalization, fertility.

### Implementation
- Presets by relationship type (trainer, doctor, family)
- User overrides per category
- Stored in `dossier_access` table
- Backend lookup (not in token)

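Since RBAC is future work, the following is only a sketch of how presets and per-category overrides could combine. The `presets` map, the `Access` type, and the deny-by-default rule are assumptions; only the trainer example comes from this document.

```go
package main

import "fmt"

// presets lists the categories each relationship type can see by default.
// Hypothetical contents, based on the trainer example above.
var presets = map[string]map[string]bool{
	"trainer": {"exercise": true, "nutrition": true, "supplements": true},
}

// Access is a hypothetical view of one row in dossier_access.
type Access struct {
	Relationship string
	Overrides    map[string]bool // user overrides per category
}

// CanSee applies the user's override if one exists, otherwise the
// preset for the relationship. Anything unlisted is denied.
func (a Access) CanSee(category string) bool {
	if v, ok := a.Overrides[category]; ok {
		return v
	}
	return presets[a.Relationship][category]
}

func main() {
	trainer := Access{Relationship: "trainer"}
	fmt.Println(trainer.CanSee("nutrition"), trainer.CanSee("fertility"))

	// The user grants this trainer lab access and hides supplements.
	custom := Access{
		Relationship: "trainer",
		Overrides:    map[string]bool{"lab": true, "supplements": false},
	}
	fmt.Println(custom.CanSee("lab"), custom.CanSee("supplements"))
}
```
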
### Not Priority Now
Token + expiration first. RBAC layers on later.

---

## Summary

| Decision | Choice |
|----------|--------|
| Auth | Token with dossier + expiration |
| API Style | REST, versioned (`/api/v1/`) |
| Endpoints | `/dossiers`, `/entries` (2 main) |
| Thumbnails | 150x150 PNG, ~5KB, in DB |
| Full images | Filesystem, on-demand |
| Anatomy | Reference slice analysis at ingest |
| Categories | Integers internally, translated strings externally |
| Database | SQLite |
| RBAC | Future work |

**Core Principle**: AI-ready health data through progressive disclosure. LLM gets context upfront, fetches details only when needed.