# inou
Medical imaging platform with AI-powered health data exploration via MCP integration.
## Ground Rules
**Read these first. They override everything else.**
Johan is a CTO with 40 years of engineering experience. Background in backup/storage infrastructure — RBAC, encryption, compression, deduplication. His instincts are: minimize work, minimize storage, minimize bandwidth, zero tolerance for defects. Now building medical software, where correctness matters even more. He's building inou with deliberate architectural discipline — not a startup rushing to ship. Treat him as the architect. You are the collaborator, not the decision-maker. He wants discussion, not implementation surprises. Match his pace, not yours.
1. **Discussion first.** When Johan raises a topic, the default is conversation. No code changes until explicitly asked ("do it", "go ahead", "implement it").
2. **Minimal diffs.** Change only what's requested. No neighboring cleanups, no "while we're here" improvements, no drive-by refactors.
3. **Less code, better architecture.** Don't solve problems by adding code. Flag opportunities to consolidate and reduce. If something needs a lot of code, the architecture is probably wrong — discuss that first. Fewer functions doing more, not more functions doing less.
4. **Ask, don't assume.** If a request is ambiguous, ask a question. Don't pick an interpretation and run with it.
5. **No unsolicited files.** No new files (docs, tests, helpers, scripts) unless explicitly asked.
6. **Respect Johan's decisions.** Don't argue against design choices. If there's a genuine technical concern, mention it once, briefly — then execute. Assume Johan has a reason.
7. **Never revert agreed work.** When something breaks after an agreed change, fix the bug. Don't undo the architectural decision. The decision was deliberate; the bug is incidental.
8. **Match effort to task size.** A simple lookup is a simple lookup. If something simple is taking long, stop and rethink. Don't fumble through trial-and-error iterations.
9. **Fix the tooling first.** When testing or workflow is painful and will be needed repeatedly, flag it and offer to fix that before continuing with the actual task.
10. **Capture architecture decisions.** After an architectural discussion reaches conclusions, offer to write it up in `docs/` so the next session has context. This prevents Johan from re-explaining everything from scratch.
## Starting a Session
Paste this at the start of a session or after compaction:
```
Read the Ground Rules in CLAUDE.md.
We're working on [topic]. Read docs/[relevant-doc].md for context.
Discuss before coding. Ask if unclear.
```
## Independent Analysis Protocol
**inou exists to give LLMs raw health data for INDEPENDENT analysis — not to parrot doctors' conclusions.** This is the core product principle. The MCP server enforces a two-tier data access model: raw data (imaging, labs, genome, vitals) is always available; medical opinions (diagnoses, assessments, consultation notes) are gated behind prior raw data review. This is a hard access control, not a suggestion — equivalent to RBAC. Full spec in `docs/entry-layout.md` under "Independent Analysis Protocol". **Do not bypass this.**
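As a rough illustration of the gating described above, the sketch below tracks whether raw data has been read for a dossier and refuses opinion-tier reads until it has. Everything in it is hypothetical: the `tier` type, the `session` struct, and the rule "one raw read unlocks opinions for that dossier" are assumptions made for the example. The actual criteria are whatever `docs/entry-layout.md` specifies.

```go
package main

import (
	"errors"
	"fmt"
)

// tier is a hypothetical label for the two access tiers.
type tier int

const (
	tierRaw     tier = iota // imaging, labs, genome, vitals
	tierOpinion             // diagnoses, assessments, consultation notes
)

var errGated = errors.New("opinion data is gated: review raw data first")

// session remembers, per dossier, whether raw data has been read.
// The real server may track this differently.
type session struct {
	rawReviewed map[string]bool
}

// read allows raw reads unconditionally and opinion reads only after
// a raw read on the same dossier.
func (s *session) read(dossierID string, t tier) error {
	if t == tierRaw {
		s.rawReviewed[dossierID] = true
		return nil
	}
	if !s.rawReviewed[dossierID] {
		return errGated
	}
	return nil
}

func main() {
	s := &session{rawReviewed: map[string]bool{}}
	fmt.Println(s.read("d1", tierOpinion)) // gated error
	fmt.Println(s.read("d1", tierRaw))     // <nil>
	fmt.Println(s.read("d1", tierOpinion)) // <nil>
}
```

The point of the sketch is that the gate is a hard refusal (an error), not a warning attached to the response.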
## Design Documentation
- **Entry Layout (Feb 2026)**: `docs/entry-layout.md`, the **AUTHORITATIVE** source for field definitions, hierarchy depths, examples for ALL 28 categories, and the Independent Analysis Protocol. Read this before touching entries, imports, or MCP tools. Update it if you change the schema.
- **API Design (Jan 2026)**: `docs/api-design-2026-01.md` — token auth, REST endpoints, progressive disclosure, thumbnails, anatomy analysis
## Architecture
```
┌─────────────┐     ┌─────────────┐
│   Portal    │     │     API     │
│  (Go/HTML)  │     │    (Go)     │
│  /mcp HTTP  │     │             │
└──────┬──────┘     └──────┬──────┘
       │                   │
       └───────────┬───────┘
            ┌──────┴──────┐
            │     lib     │
            │ (shared Go) │
            └──────┬──────┘
            ┌──────┴──────┐
            │   SQLite    │
            │ (encrypted) │
            └─────────────┘
```
## Data Access Architecture
Three choke points in `lib/dbcore.go` — ALL entry data goes through these:
- `EntryRead(accessorID, dossierID, *Filter)` — all reads
- `EntryWrite(accessorID, entries...)` — all writes (insert or update)
- `EntryDelete(accessorID, dossierID, *Filter)` — all deletes (cascading children + object file cleanup)
**RBAC**: Every choke point calls `CheckAccess()` before touching data.
- `accessorID=""` = system/internal operation (always granted)
- Self-access (accessorID == dossierID) = always granted
- Otherwise requires an access grant in the access table
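
The three rules above can be sketched as a single decision function. This is a minimal illustration, not the actual `CheckAccess()` in `lib`: the `grantKey` type, the in-memory `grants` map, and the name `checkAccessSketch` are hypothetical stand-ins for the real access-table lookup, and the sketch ignores the `Ops` and `EntryID` columns entirely.

```go
package main

import "fmt"

// grantKey is a hypothetical stand-in for one row of the access table:
// GranteeID may access DossierID.
type grantKey struct {
	GranteeID string
	DossierID string
}

// checkAccessSketch applies the three RBAC rules in order.
// Illustrative only; the real CheckAccess lives in lib.
func checkAccessSketch(grants map[grantKey]bool, accessorID, dossierID string) bool {
	if accessorID == "" {
		return true // system/internal operation
	}
	if accessorID == dossierID {
		return true // self-access
	}
	// Otherwise an explicit grant is required.
	return grants[grantKey{GranteeID: accessorID, DossierID: dossierID}]
}

func main() {
	grants := map[grantKey]bool{
		{GranteeID: "alice", DossierID: "bob"}: true,
	}
	fmt.Println(checkAccessSketch(grants, "", "bob"))      // true: system
	fmt.Println(checkAccessSketch(grants, "bob", "bob"))   // true: self
	fmt.Println(checkAccessSketch(grants, "alice", "bob")) // true: granted
	fmt.Println(checkAccessSketch(grants, "carol", "bob")) // false: no grant
}
```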
**No stubs, no wrappers for writes/deletes.** Callers use `EntryWrite`/`EntryDelete` directly.
Read-path convenience wrappers (`EntryGet`, `EntryQuery`, `EntryList`) are acceptable — they delegate to `EntryRead` with RBAC.
**Pack/Unpack** is the ONLY encryption/compression path. All string fields packed on write, unpacked on read. IDs and integers are never packed.
**ObjectWrite/ObjectRead** enforce RBAC and use Pack/Unpack for file I/O.
## Code Locations
| Component | Path | Purpose |
|-----------|------|---------|
| lib | `~/dev/inou/lib/` | Shared: crypto, db, types, translations |
| Portal | `~/dev/inou/portal/` | Web UI, DICOM import, genome processing, MCP server |
| API | `~/dev/inou/api/` | REST API for LLM access |
| Viewer | `~/dev/inou/viewer/` | DICOM viewer (Electron) |
| MCP Client | `~/dev/inou/mcp-client/` | **DEPRECATED** (Claude calls /mcp directly) |
| Doc Processor | `~/dev/inou/doc-processor/` | PDF extraction tool |
| Docs | `~/dev/inou/docs/` | Design documentation |
| Deployed | `/tank/inou/` | Runtime on both staging & prod |
## Environments
| Environment | Server | URL | Purpose |
|-------------|--------|-----|---------|
| **Production** | 192.168.100.2 | https://inou.com | Live users |
| **Staging** | 192.168.1.253 | https://dev.inou.com | Development/testing |
Production is on an isolated VLAN (192.168.100.0/24, VLAN 10) with firewall rules restricting access.
## Server Access
```bash
ssh johan@192.168.1.253 # Staging
ssh johan@192.168.100.2 # Production
```
## Database
**SQLite** with encrypted fields. Single file at `/tank/inou/data/inou.db`.
### ⛔ CRITICAL: Database Access Rules
**ALL database operations MUST go through `lib/db_queries.go` functions. NO EXCEPTIONS.**
Allowed functions (defined in db_queries.go):
- `Save(table, struct)` — INSERT/UPDATE
- `Load(table, id, struct)` — SELECT by primary key
- `Query(sql, args, &slice)` — SELECT with custom queries
- `Delete(table, pkCol, id)` — DELETE by primary key
- `Count(sql, args...)` — SELECT COUNT(*)
**FORBIDDEN (requires Johan's express consent):**
- Any use of `db.Exec()`, `db.Query()`, `db.QueryRow()` outside db_queries.go
- Any use of `lib.DB()` anywhere
- Any modifications to `lib/db_queries.go`
- Any new database wrapper functions
- Any direct SQL execution
**Before making ANY database-related changes:**
1. ASK Johan for permission
2. Explain what you need and why
3. Wait for explicit approval
**Verification:** Run `make check-db` to verify no direct DB access exists.
This check runs automatically on deploy.
Key tables (there is NO `dossiers` table — dossier profiles are entries with Category=0):
- `entries` — all data: dossier profiles (Cat=0), imaging, labs, genome, uploads, etc. Columns: EntryID, DossierID, ParentID, Category, Type, Value, Summary, Ordinal, Timestamp, TimestampEnd, Status, Tags, Data, SearchKey
- `access` — RBAC grants. Columns: AccessID, DossierID, GranteeID, EntryID, Relation, Ops, CreatedAt
- `audit` — activity log. Columns: AuditID, Actor1ID, Actor2ID, TargetID, Action, Details, RelationID, Timestamp
- `lab_test` / `lab_reference` — reference data, snake_case columns
### Querying Encrypted Data
**Encryption is DETERMINISTIC.** You CAN use encrypted columns in WHERE clauses by encrypting the search value: `WHERE col = lib.CryptoEncrypt("value")`. Do NOT pull large datasets and filter client-side when a precise query is possible.
Column names in `entries`/`access`/`audit` are **PascalCase** (EntryID, DossierID, Category, etc.). ID columns are never encrypted.
**IMPORTANT:** Do NOT write Python/Go scripts to decrypt database fields. Use the `decrypt` tool:
```bash
# Raw mode - decrypts inline
sqlite3 /tank/inou/data/inou.db "SELECT * FROM entries LIMIT 5" | /tank/inou/bin/decrypt
# JSON mode - pretty output
sqlite3 -json /tank/inou/data/inou.db "SELECT * FROM entries LIMIT 5" | /tank/inou/bin/decrypt -json
# Combine with grep
sqlite3 /tank/inou/data/inou.db "SELECT * FROM entries WHERE Category = 0" | /tank/inou/bin/decrypt | grep -i sophia
```
The tool automatically detects and decrypts base64-encoded encrypted fields while leaving IDs and other plain fields unchanged. Rebuild with `make decrypt` if needed.
## Categories & Entry Layout
28 categories (0-27), integer enums in `lib/types.go`. Full field definitions, hierarchy depths, and examples for every category: **`docs/entry-layout.md`**. That document is authoritative — read it before modifying entries or imports, update it when the schema changes.
Translations in `portal/lang/*.yaml` (15 languages). Use `bin/toolkit translate` to add/modify keys across all languages.
## API Endpoints (Current)
> Note: Being redesigned. See `docs/api-design-2026-01.md` for new design.
Current endpoints:
- `GET /api/dossiers` — list accessible dossiers
- `GET /api/studies?dossier=X` — list imaging studies
- `GET /api/series?dossier=X&study=Y` — list series
- `GET /api/slices?dossier=X&series=Y` — list slices
- `GET /image/{id}?token=X` — fetch image
- `GET /api/genome?dossier=X` — query variants
- `GET /api/entries` — generic entry CRUD
## Deploy
### Staging (default)
```bash
make deploy
```
Deploys to **staging** (192.168.1.253 / dev.inou.com). Use for development and testing.
### Release to Production
```bash
make deploy-prod
```
Deploys to **production** (192.168.100.2 / inou.com). Use when ready to release.
**"Release to prod"** = `make deploy-prod`
### What Deploy Does
Both commands:
1. Run `check-db` (blocks if violations found)
2. Build all binaries (FIPS 140-3 compliant)
3. Stop services on target
4. Copy binaries, templates, static, lang files
5. Start services
6. Show status
### Typical Workflow
```bash
# 1. Develop and test
make deploy # Deploy to staging
# Test at https://dev.inou.com
# 2. When ready for production
make deploy-prod # Release to prod
# Live at https://inou.com
```
## Testing
```bash
make test # Run integration tests (services must be running)
make check-db # Verify no direct DB access (runs automatically on deploy)
```
### Integration Tests (`make test`)
Runs 18 tests covering the full stack:
| Category | Tests |
|----------|-------|
| Service Health | Portal HTTPS, API HTTPS, internal ports |
| Login Flow | Login page, send verification code |
| Authenticated Routes | Dashboard, add/view/edit dossier |
| Dossier CRUD | Create dossier, verify via API, cleanup |
| API Data Access | List dossiers/studies/series/slices |
| Image Fetch | Fetch DICOM slice |
### DB Access Check (`make check-db`)
Scans core codebase for forbidden direct database access:
- `lib.DB()` usage
- `db.Exec/Query/QueryRow` outside db_queries.go/db_schema.go
- Deprecated wrappers (DBExec, DBInsert, etc.)
Runs automatically before every deploy. Blocks deployment if violations found.
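
The kind of scan described above can be sketched as follows. This is not the actual `make check-db` implementation, which is not shown here. The function name `findViolations` and the plain substring matching are assumptions, and the deprecated-wrapper check is left out because the full wrapper list is not given.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// dbCalls are the direct calls that are forbidden outside the exempt files.
var dbCalls = []string{"db.Exec(", "db.Query(", "db.QueryRow("}

// exempt lists the files allowed to call the database directly.
var exempt = map[string]bool{"db_queries.go": true, "db_schema.go": true}

// findViolations returns one "path:line: pattern" string per forbidden use.
// Substring matching is an assumption; it also flags matches inside comments.
func findViolations(path, src string) []string {
	isExempt := exempt[filepath.Base(path)]
	var out []string
	for i, line := range strings.Split(src, "\n") {
		// lib.DB() is forbidden everywhere.
		if strings.Contains(line, "lib.DB()") {
			out = append(out, fmt.Sprintf("%s:%d: lib.DB()", path, i+1))
		}
		if isExempt {
			continue
		}
		for _, pat := range dbCalls {
			if strings.Contains(line, pat) {
				out = append(out, fmt.Sprintf("%s:%d: %s", path, i+1, pat))
			}
		}
	}
	return out
}

func main() {
	src := "rows, _ := db.Query(q)\nconn := lib.DB()\n"
	for _, v := range findViolations("portal/handler.go", src) {
		fmt.Println(v)
	}
}
```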
## MCP Server
**Endpoint:** `https://inou.com/mcp` (production) or `https://dev.inou.com/mcp` (staging)
**No bridge required** - Claude Desktop/web calls the portal's `/mcp` endpoint directly via OAuth 2.0.
**Implementation:** `portal/mcp_http.go`, `portal/mcp_tools.go`
**Documentation:** `docs/mcp-server-setup.md`
## Test Dossiers
- **Sophia** — brain MRI, spine MRI, CT, X-rays (May 2022)
- **Johan** — genome data, labs
- **Alena** — genome data, labs
## URLs
**Production:**
- Portal: https://inou.com
- API: https://inou.com/api/*
- API Docs: https://inou.com/api/docs
**Staging:**
- Portal: https://dev.inou.com
- API: https://dev.inou.com/api/*
## Key Design Decisions
1. **No accounts table** — everyone is a dossier (for virality)
2. **Encrypted fields** — all PII encrypted at rest via `lib.CryptoEncrypt`
3. **Categories as integers** — scalable, translated at API boundary
4. **Type field** — differentiates within category (slice/series/study all CategoryImaging)
5. **ParentID** — hierarchy navigation, not category
6. **SQLite** — simple, single file, handles the scale
## File Editing
**Native (on Linux server):** Use standard tools — `sed`, Edit tool, etc. all work fine.
**Remote (from Mac via SSH):** Use `claude-edit` to avoid quote escaping issues:
```bash
cat /tmp/edit.json | ssh johan@192.168.1.253 "~/bin/claude-edit"
```
## Extraction Prompt Development
Prompts: `api/tracker_prompts/extract_*.md`, deployed to `/tank/inou/tracker_prompts/` (rsync, no restart needed).
LLM: Fireworks `accounts/fireworks/models/qwen3-vl-30b-a3b-instruct`, temp 0.1, max_tokens 4096.
Key: `FIREWORKS_API_KEY` in `/tank/inou/anthropic.env`.
**Do NOT use the upload-process-check cycle to iterate on prompts.** Curl the LLM directly:
1. Get source markdown from a document entry (dbquery the Data field)
2. Build prompt: `extractionPreamble()` (see `upload.go:790`) + template with `{{MARKDOWN}}` replaced
3. Curl Fireworks, inspect JSON, iterate until correct
4. Test neighboring prompts for false positives (e.g. symptom/nutrition/assessment/note should return `null` for a lab document)
5. Rsync to staging, do one real upload to verify end-to-end
## Known Limitations
- Large X-rays (2836x2336+) fail via MCP fetch
- See `~/dev/inou/TODO.md` for roadmap