docsys

Commit Graph

Author	SHA1	Message	Date
James	31d6cb6f86	fix: skip store copy if file already exists (reprocess safety). When reprocessing a document whose PDF is already in the store, copyFile() would fail with 'open /srv/docsys/inbox/...: no such file or directory' because the upload wrote to a temp inbox path that had already been cleaned up by the time async OCR completed. The store is keyed by content hash, so if the file is already there the copy is a no-op; skip it rather than error out.	2026-03-23 14:27:38 -04:00
James	883f118d66	fix: pdftoppm output filename glob instead of hardcoded page-1.png. pdftoppm zero-pads the page number based on total page count: <10 pages: page-1.png; <100 pages: page-01.png; <1000 pages: page-001.png. The code hardcoded 'page-1.png' and 'page-N.png', which fails for any document of 10 or more pages. Use filepath.Glob('page-*.png') to find the actual output regardless of padding width. Fixed in both ConvertToImage() (first-page preview) and the multi-page OCR loop in ProcessDocument().	2026-03-23 14:14:28 -04:00
James	63d4e5e5ca	chore: auto-commit uncommitted changes	2026-02-28 06:01:28 -05:00
James	2c91d5649e	chore: auto-commit uncommitted changes	2026-02-28 00:01:21 -05:00
James	83373885d4	Add vocabulary hints for handwriting: Jongsma, Johan, Tatyana, St. Petersburg FL	2026-02-25 14:24:04 -05:00
James	1b4c82ab83	Improve title prompt: require specific, identifying titles with sender+topic+date	2026-02-25 14:21:43 -05:00
James	4970157690	Switch vision model to qwen3-vl-30b-a3b-instruct. Replaces kimi-k2p5 for all vision tasks. K2.5 was outputting chain-of-thought reasoning instead of JSON for non-English docs, requiring a fallback path. qwen3-vl works first try, no retry needed, preserves original language correctly.	2026-02-25 14:17:54 -05:00
James	d962c9839d	Fix extraction: don't translate, fallback OCR+classify path for non-JSON responses. Add 'DO NOT translate, preserve original language' to vision prompts; shorter/tighter JSON prompt to reduce K2.5 reasoning verbosity; fallback: when AnalyzeWithVision returns no JSON, do AnalyzePageOnly (plain text) then AnalyzeText (classify); fallback to AnalyzePageOnly for single-page PDFs with empty/placeholder full_text; switch model back to kimi-k2p5 (only vision model on this Fireworks account); build with CGO_ENABLED=1 -tags fts5 (required for SQLite FTS5).	2026-02-25 14:01:59 -05:00
James	00d8f7c94a	chore: auto-commit uncommitted changes	2026-02-15 12:00:36 -05:00
James	00d0b0a0d7	Initial commit	2026-02-04 13:37:26 -05:00

10 Commits