docsys

Commit Graph

Author	SHA1	Message	Date
James	883f118d66	fix: pdftoppm output filename glob instead of hardcoded page-1.png. pdftoppm zero-pads the page number based on total page count (<10 pages: page-1.png; <100 pages: page-01.png; <1000 pages: page-001.png). The code hardcoded 'page-1.png' and 'page-N.png', which fails for any document of 10 or more pages. Use filepath.Glob('page-*.png') to find the actual output regardless of padding width. Fixed in both ConvertToImage() (first-page preview) and the multi-page OCR loop in ProcessDocument().	2026-03-23 14:14:28 -04:00
James	63d4e5e5ca	chore: auto-commit uncommitted changes	2026-02-28 06:01:28 -05:00
James	2c91d5649e	chore: auto-commit uncommitted changes	2026-02-28 00:01:21 -05:00
James	83373885d4	Add vocabulary hints for handwriting: Jongsma, Johan, Tatyana, St. Petersburg FL	2026-02-25 14:24:04 -05:00
James	1b4c82ab83	Improve title prompt: require specific, identifying titles with sender+topic+date	2026-02-25 14:21:43 -05:00
James	4970157690	Switch vision model to qwen3-vl-30b-a3b-instruct. Replaces kimi-k2p5 for all vision tasks. K2.5 was outputting chain-of-thought reasoning instead of JSON for non-English docs, requiring a fallback path. qwen3-vl works first try, no retry needed, and preserves the original language correctly.	2026-02-25 14:17:54 -05:00
James	d962c9839d	Fix extraction: don't translate, fallback OCR+classify path for non-JSON responses. Changes: add 'DO NOT translate, preserve original language' to vision prompts; shorter/tighter JSON prompt to reduce K2.5 reasoning verbosity; fallback: when AnalyzeWithVision returns no JSON, do AnalyzePageOnly (plain text) then AnalyzeText (classify); fallback to AnalyzePageOnly for single-page PDFs with empty/placeholder full_text; switch model back to kimi-k2p5 (only vision model on this Fireworks account); build with CGO_ENABLED=1 -tags fts5 (required for SQLite FTS5).	2026-02-25 14:01:59 -05:00
James	00d8f7c94a	chore: auto-commit uncommitted changes	2026-02-15 12:00:36 -05:00
James	00d0b0a0d7	Initial commit	2026-02-04 13:37:26 -05:00

9 Commits
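
The filename behaviour described in commit 883f118d66 can be sketched in Go as follows. This is a minimal illustration, not the repository's actual code: the helper names padWidth, expectedName and findPageImages are hypothetical, and the padding rule is taken from the commit message (digit count of the total page count) rather than verified against pdftoppm itself.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// padWidth returns the number of digits in totalPages. Per the commit
// message, this is the zero-padding width pdftoppm applies to page numbers
// (assumption: not checked against pdftoppm's source).
func padWidth(totalPages int) int {
	w := 1
	for totalPages >= 10 {
		totalPages /= 10
		w++
	}
	return w
}

// expectedName builds the file name a given page would get under that rule,
// e.g. page 1 of a 12-page document becomes "page-01.png".
func expectedName(prefix string, page, totalPages int) string {
	return fmt.Sprintf("%s-%0*d.png", prefix, padWidth(totalPages), page)
}

// findPageImages sidesteps the padding question entirely: it globs for
// prefix-*.png in dir and returns the matches in lexical order, which is
// page order because all files from one run share the same padding width.
func findPageImages(dir, prefix string) ([]string, error) {
	matches, err := filepath.Glob(filepath.Join(dir, prefix+"-*.png"))
	if err != nil {
		return nil, err
	}
	sort.Strings(matches)
	return matches, nil
}
```

Under these assumptions, expectedName("page", 1, 12) yields "page-01.png", which is why a hardcoded 'page-1.png' misses the first-page preview of a 12-page document, while findPageImages(dir, "page") finds the output whatever the width. Note that filepath.Glob treats metacharacters in dir as pattern syntax, so the sketch assumes a plain temporary directory path.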