inou API Design Decisions

January 2026

Overview

This document captures the API redesign decisions for inou, focusing on making health data AI-ready through progressive disclosure, efficient queries, and secure token-based access.


Authentication

Token-Based Access

Tokens carry a minimal payload, encrypted before it leaves the server:

{
  "d": "dossier_id",
  "exp": 1736470800
}
  • d: The authenticated dossier (who logged in)
  • exp: Unix timestamp expiration (a few hours for external LLMs like Grok)
  • Token is encrypted using existing lib.CryptoEncrypt
  • No raw dossier IDs in URLs that live forever
  • Backend looks up permissions from dossier_access table (not in token)

Why Tokens?

  • Grok/ChatGPT users were querying with raw dossier IDs days later
  • Tokens expire, preventing stale access
  • Simpler than passing dossier in every request

API Endpoints

REST Style with Versioning

GET /api/v1/dossiers
GET /api/v1/dossiers/{id}
GET /api/v1/dossiers/{id}/entries
GET /api/v1/dossiers/{id}/entries/{entry_id}
GET /api/v1/dossiers/{id}/entries/{entry_id}?detail=full

Token in Header

Authorization: Bearer <token>

Query Parameters

Param                 Purpose
detail=full           Return full image/text data
search=pons           Search summaries/tags
category=imaging      Filter by category (English)
anatomy=hypothalamus  Find slices by anatomy
W=200&L=500           DICOM windowing for images

Dossier Always Explicit

Even if a user has only one dossier, it appears in the URL. No special cases.


Progressive Disclosure

The LLM gets everything it needs to decide in the first call; full detail comes only when needed.

Entry Type  Quick Glance              Full Detail
Study       anatomy + summary         -
Series      slice count, orientation  -
Slice       thumbnail (150x150)       full image
Document    summary + tags            full text
Lab         values + ranges           historical trend
Genome      category + count          variant list

Fewer Round-Trips

Before: LLM guesses slices, fetches them one by one, backtracks.
After: LLM sees the anatomy index, requests the exact slices needed.


Thumbnails

Specification

  • Size: 150x150 max dimension (preserve aspect ratio)
  • Format: PNG (8-bit greyscale, lossless)
  • Target: ~5KB per thumbnail
  • Storage: Database (in Data JSON field or BLOB column)

Why DB Not Filesystem?

  • Batch queries: "get 50 slices with thumbnails" = 1 query
  • Fewer IOPS (no 50 small file reads)
  • DB file stays hot in cache
  • 4000 slices x 5KB = 20MB (trivial)

Full Images

  • Stay on filesystem (/tank/inou/objects/)
  • Fetched one at a time with ?detail=full

Anatomical Analysis

Problem

LLM spends many round-trips finding the right slice ("find the pons" → guess → wrong → try again)

Solution

Analyze reference slices at ingest, store anatomy with z-ranges.

Approach

  1. For each orientation (SAG, COR, AX) present in study
  2. Pick mid-slice from T2 (preferred) or T1
  3. Run vision API: "identify anatomical structures with z-ranges"
  4. Store in study entry

Storage

{
  "anatomy": {
    "pons": {"z_min": -30, "z_max": -20},
    "cerebellum": {"z_min": -45, "z_max": -25},
    "hypothalamus": {"z_min": 20, "z_max": 26}
  },
  "body_part": "brain"
}

Query Time

  • LLM asks "find pons in this series"
  • Lookup: pons at z=-30 to -20
  • Find slices in series where slice_location BETWEEN -30 AND -20
  • Return matching slices with thumbnails
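The lookup above is a simple range filter. Type and field names in this sketch are assumptions, not the actual entry schema:

```go
package main

// ZRange is the stored z-extent of a structure, as in the study
// entry's anatomy map.
type ZRange struct {
	Min, Max float64
}

// Slice is a minimal stand-in for a slice entry.
type Slice struct {
	ID       string
	Location float64 // DICOM SliceLocation
}

// slicesForStructure returns the slices whose location falls inside
// the stored z-range for the named structure (the BETWEEN filter above).
func slicesForStructure(anatomy map[string]ZRange, slices []Slice, name string) []Slice {
	r, ok := anatomy[name]
	if !ok {
		return nil
	}
	var out []Slice
	for _, s := range slices {
		if s.Location >= r.Min && s.Location <= r.Max {
			out = append(out, s)
		}
	}
	return out
}
```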

Cost

  • 1-3 vision API calls per study (~$0.01-0.03)
  • Stored once, used forever
  • No per-query cost

Why Not Position-Based Lookup?

Tried it, deleted it. Pediatric and adult brains have different z-coordinates for the same structures.

Generalization

  • Brain: mid-sagittal reference
  • Spine: mid-sagittal reference
  • Knee: mid-sagittal reference
  • Abdomen: may need coronal + axial references
  • Animals: same principle, vision model identifies structures

Categories

Consolidated Structure

27 categories (down from 31), using integers internally.

Int  Name      Notes
0    imaging   Unified: slice, series, study (Type differentiates)
1    document
2    lab
3    genome    Unified: tier, rsid, variant (Type differentiates)
...  ...

Translation

  • API Input: Accept English OR any translated name
  • API Output: Return translated name (user's language)
  • Navigation: By entry ID, not category name

var categoryFromAny = map[string]int{}  // "genome", "Геном", "ゲノム" → 3
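Expanded into a runnable sketch (the map entries are illustrative, not the full table; keys are stored lowercase so lookups can be case-insensitive):

```go
package main

import "strings"

// categoryFromAny maps every known name, English or translated, to the
// internal integer.
var categoryFromAny = map[string]int{
	"imaging": 0, "document": 1, "lab": 2,
	"genome": 3, "геном": 3, "ゲノム": 3,
}

// categoryID resolves a user-supplied category name case-insensitively.
func categoryID(name string) (int, bool) {
	id, ok := categoryFromAny[strings.ToLower(strings.TrimSpace(name))]
	return id, ok
}
```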

Database

SQLite Stays

  • Read-heavy workload (LLM queries)
  • Occasional writes (imports)
  • Single server, few users
  • Millions of rows, no problem
  • Single file backup
  • WAL mode for concurrency

Alternatives Considered

  • FoundationDB: Overkill for single server
  • bbolt/BadgerDB: Lose SQL convenience, maintain indexes manually
  • Postgres: "Beautiful tech from 15 years ago" - user preference

What We Actually Use

Just key-value with secondary indexes:

  1. Get by ID
  2. Get children (parent_id = X)
  3. Filter by dossier
  4. Filter by category
  5. Order by ordinal/timestamp

SQLite handles this perfectly.
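The five access patterns map onto a handful of indexes. This schema is a hypothetical sketch; column names are inferred from the list above, not the actual inou schema:

```sql
-- Hypothetical entries table supporting the five access patterns.
CREATE TABLE IF NOT EXISTS entries (
    id         INTEGER PRIMARY KEY,  -- 1. get by ID
    parent_id  INTEGER,
    dossier_id INTEGER NOT NULL,
    category   INTEGER NOT NULL,
    ordinal    INTEGER,
    created_at INTEGER,
    data       TEXT                  -- JSON payload, incl. thumbnail
);
CREATE INDEX IF NOT EXISTS idx_entries_parent  ON entries(parent_id);             -- 2. children
CREATE INDEX IF NOT EXISTS idx_entries_dossier ON entries(dossier_id, category);  -- 3 + 4. filters
CREATE INDEX IF NOT EXISTS idx_entries_ordinal ON entries(parent_id, ordinal);    -- 5. ordering
```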


RBAC (Future)

Concept

Category-level permissions per relationship.

Example: Trainer can see exercise, nutrition, supplements but NOT hospitalization, fertility.

Implementation

  • Presets by relationship type (trainer, doctor, family)
  • User overrides per category
  • Stored in dossier_access table
  • Backend lookup (not in token)
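A sketch of the lookup order (user override first, then the relationship preset). The category integers and preset contents here are placeholders, not real inou values:

```go
package main

// presets maps relationship type to the categories it may see by default.
var presets = map[string]map[int]bool{
	"trainer": {10: true, 11: true, 12: true}, // e.g. exercise, nutrition, supplements
}

// Access is a stand-in for a dossier_access row.
type Access struct {
	Relationship string       // e.g. "trainer"
	Overrides    map[int]bool // per-category user overrides
}

// canSee applies overrides first, then falls back to the preset;
// anything not granted anywhere is denied.
func canSee(a Access, category int) bool {
	if v, ok := a.Overrides[category]; ok {
		return v
	}
	return presets[a.Relationship][category]
}
```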

Not Priority Now

Token + expiration first. RBAC layers on later.


Summary

Decision     Choice
Auth         Token with dossier + expiration
API Style    REST, versioned (/api/v1/)
Endpoints    /dossiers, /entries (2 main)
Thumbnails   150x150 PNG, ~5KB, in DB
Full images  Filesystem, on-demand
Anatomy      Reference slice analysis at ingest
Categories   Integers internally, translated strings externally
Database     SQLite
RBAC         Future work

Core Principle: AI-ready health data through progressive disclosure. LLM gets context upfront, fetches details only when needed.