9 Commits

Author SHA1 Message Date
James 883f118d66 fix: pdftoppm output filename glob instead of hardcoded page-1.png
pdftoppm zero-pads the page number to the number of digits in the
document's total page count:
- 1-9 pages: page-1.png
- 10-99 pages: page-01.png
- 100-999 pages: page-001.png

The code hardcoded 'page-1.png' and 'page-N.png', which fails for any
document with 10 or more pages. Use filepath.Glob("page-*.png") to find the
actual output regardless of padding width.

Fixed in both ConvertToImage() (first-page preview) and the multi-page
OCR loop in ProcessDocument().
2026-03-23 14:14:28 -04:00
James 63d4e5e5ca chore: auto-commit uncommitted changes 2026-02-28 06:01:28 -05:00
James 2c91d5649e chore: auto-commit uncommitted changes 2026-02-28 00:01:21 -05:00
James 83373885d4 Add vocabulary hints for handwriting: Jongsma, Johan, Tatyana, St. Petersburg FL 2026-02-25 14:24:04 -05:00
James 1b4c82ab83 Improve title prompt: require specific, identifying titles with sender+topic+date 2026-02-25 14:21:43 -05:00
James 4970157690 Switch vision model to qwen3-vl-30b-a3b-instruct
Replaces kimi-k2p5 for all vision tasks. K2.5 was outputting chain-of-thought
reasoning instead of JSON for non-English docs, requiring a fallback path.
qwen3-vl returns JSON on the first try with no retry needed, and correctly preserves the original language.
2026-02-25 14:17:54 -05:00
James d962c9839d Fix extraction: don't translate, fallback OCR+classify path for non-JSON responses
- Add 'DO NOT translate, preserve original language' to vision prompts
- Shorter/tighter JSON prompt to reduce K2.5 reasoning verbosity
- Fallback: when AnalyzeWithVision returns no JSON, do AnalyzePageOnly (plain text) then AnalyzeText (classify)
- Fallback to AnalyzePageOnly for single-page PDFs with empty/placeholder full_text
- Switch model back to kimi-k2p5 (the only vision model available on this Fireworks account)
- Build with CGO_ENABLED=1 -tags fts5 (required for SQLite FTS5)
2026-02-25 14:01:59 -05:00
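The fallback chain from this commit can be sketched in Go as follows. The Result fields, the Analyzer struct, and the signatures of AnalyzeWithVision, AnalyzePageOnly, and AnalyzeText are assumptions made for illustration, as are the brace-slicing JSON heuristic in parseJSONReply and the placeholder list in isPlaceholder.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Result is a hypothetical stand-in for the project's extraction result.
type Result struct {
	Title    string `json:"title"`
	Category string `json:"category"`
	FullText string `json:"full_text"`
}

// Analyzer holds the three calls named in the commit; these signatures are
// assumptions, injected as funcs so the control flow can be shown on its own.
type Analyzer struct {
	AnalyzeWithVision func(image []byte) (string, error) // raw reply, ideally JSON
	AnalyzePageOnly   func(image []byte) (string, error) // plain-text transcription
	AnalyzeText       func(text string) (Result, error)  // classify extracted text
}

// parseJSONReply decodes the span from the first '{' to the last '}', so a
// JSON object surrounded by extra prose still parses (an assumed heuristic).
func parseJSONReply(reply string) (Result, bool) {
	var r Result
	start, end := strings.Index(reply, "{"), strings.LastIndex(reply, "}")
	if start < 0 || end <= start {
		return r, false
	}
	if err := json.Unmarshal([]byte(reply[start:end+1]), &r); err != nil {
		return r, false
	}
	return r, true
}

// isPlaceholder treats blank or stub full_text values as missing (assumed list).
func isPlaceholder(s string) bool {
	t := strings.TrimSpace(s)
	return t == "" || t == "..." || strings.EqualFold(t, "n/a")
}

// Extract tries vision-to-JSON first and falls back as the commit describes.
func (a Analyzer) Extract(image []byte, pageCount int) (Result, error) {
	reply, err := a.AnalyzeWithVision(image)
	if err != nil {
		return Result{}, err
	}
	r, ok := parseJSONReply(reply)
	if ok && !(pageCount == 1 && isPlaceholder(r.FullText)) {
		return r, nil // happy path: usable JSON
	}
	// Either no JSON came back, or a single-page document has no real text:
	// transcribe the page as plain text first.
	text, err := a.AnalyzePageOnly(image)
	if err != nil {
		return Result{}, err
	}
	if strings.TrimSpace(text) == "" {
		return Result{}, errors.New("fallback OCR returned no text")
	}
	if ok {
		r.FullText = text // keep the vision metadata, fill in only the text
		return r, nil
	}
	// No JSON at all: classify the transcribed text in a separate call.
	r, err = a.AnalyzeText(text)
	if err != nil {
		return Result{}, err
	}
	r.FullText = text
	return r, nil
}

func main() {
	a := Analyzer{
		AnalyzeWithVision: func([]byte) (string, error) { return "Let me reason about this page first...", nil },
		AnalyzePageOnly:   func([]byte) (string, error) { return "Hola mundo", nil },
		AnalyzeText:       func(string) (Result, error) { return Result{Title: "Letter", Category: "letter"}, nil },
	}
	r, err := a.Extract(nil, 3)
	fmt.Printf("%+v %v\n", r, err)
}
```

Injecting the three calls as function values keeps the fallback logic testable with stubs, independent of any particular model endpoint.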
James 00d8f7c94a chore: auto-commit uncommitted changes 2026-02-15 12:00:36 -05:00
James 00d0b0a0d7 Initial commit 2026-02-04 13:37:26 -05:00