- FTS search results now ORDER BY processed_at DESC (was rank)
- Full search page shows processed timestamp with formatDateTime
- Quick search dropdown shows formatted date next to category
- Fixed formatDate to handle timezone-aware timestamps
When reprocessing a document whose PDF is already in the store,
copyFile() would fail with 'open /srv/docsys/inbox/...: no such file
or directory' because the upload wrote to a temp inbox path that was
already cleaned up by the time async OCR completed.
The store is keyed by content hash, so if the file is already there
the copy is a no-op. Skip it rather than error out.
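
A minimal sketch of the skip-if-present check described above. The name copyIfAbsent and its signature are hypothetical, not the project's actual copyFile(); the sketch assumes the destination path is derived from the content hash, so an existing destination already holds identical bytes.

```go
package main

import (
	"errors"
	"io"
	"io/fs"
	"os"
)

// copyIfAbsent copies src to dst unless dst already exists.
// It reports whether a copy was actually performed.
// Hypothetical helper: assumes dst is a content-addressed path,
// so an existing dst is treated as "already stored".
func copyIfAbsent(src, dst string) (bool, error) {
	if _, statErr := os.Stat(dst); statErr == nil {
		return false, nil // already in the store; src may be gone by now
	} else if !errors.Is(statErr, fs.ErrNotExist) {
		return false, statErr
	}

	in, err := os.Open(src)
	if err != nil {
		return false, err
	}
	defer in.Close()

	// O_EXCL guards against a concurrent writer creating dst first.
	out, err := os.OpenFile(dst, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o644)
	if err != nil {
		return false, err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		os.Remove(dst) // do not leave a partial file in the store
		return false, err
	}
	return true, out.Close()
}
```

Because the destination is checked before the source is opened, a missing inbox file no longer matters when the store already has the content.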
pdftoppm zero-pads the page number based on total page count:
- <10 pages: page-1.png
- <100 pages: page-01.png
- <1000 pages: page-001.png
The code hardcoded 'page-1.png' and 'page-N.png', which fails for any
document of 10 or more pages, where pdftoppm pads the number. Use
filepath.Glob('page-*.png') to find the actual output regardless of
padding width.
Fixed in both ConvertToImage() (first-page preview) and the multi-page
OCR loop in ProcessDocument().
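
A rough sketch of the glob-based lookup, assuming pdftoppm was run with the output prefix 'page' into a per-document directory. findPageImages and pageNumber are illustrative names, not the functions in the codebase.

```go
package main

import (
	"path/filepath"
	"sort"
	"strconv"
	"strings"
)

// findPageImages returns every page-*.png in dir, whatever the padding width.
// The result is sorted; since pdftoppm pads all names in one run to the same
// width, lexical order equals page order.
func findPageImages(dir string) ([]string, error) {
	// Note: dir itself must not contain glob metacharacters such as '*' or '['.
	pages, err := filepath.Glob(filepath.Join(dir, "page-*.png"))
	if err != nil {
		return nil, err
	}
	sort.Strings(pages)
	return pages, nil
}

// pageNumber extracts the numeric page index from a name such as
// "page-007.png", ignoring any zero padding.
func pageNumber(path string) (int, error) {
	base := filepath.Base(path)
	digits := strings.TrimSuffix(strings.TrimPrefix(base, "page-"), ".png")
	return strconv.Atoi(digits)
}
```

The first-page preview would take element 0 of the returned slice, and the OCR loop would range over all of it.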
SearchDocuments excludes full_text for performance. The MD endpoint
needs the actual OCR content, not just the summary.
Added SearchDocumentsWithFullText() and SearchDocumentsWithFullTextFallback()
that select full_text explicitly. apiSearchMDHandler now uses these,
so format=md returns the complete OCR/markdown text for each document.
New endpoint returns all matching documents as concatenated plain-text
markdown, one section per document separated by ---.
Format:
# Document: {title}
ID: {id} | Category: {category} | Date: {date} | Vendor: {vendor}
{full_text or summary}
---
Parameters:
q - search query (required)
format - must be 'md' (required; distinguishes from HTML search)
Uses same FTS5 search as existing endpoints, limit raised to 200.
Falls back to LIKE search if FTS5 fails. Returns text/markdown content type.
POST /api/search (HTML partial) unchanged.
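
The output format above could be rendered roughly like this. The Document struct and renderMarkdown are hypothetical stand-ins (field names and types are assumptions); the real handler also has to run the search and set the text/markdown content type.

```go
package main

import (
	"fmt"
	"strings"
)

// Document holds only the fields the markdown export needs.
// Hypothetical type: field names and types are assumptions.
type Document struct {
	ID, Title, Category, Date, Vendor string
	FullText, Summary                 string
}

// renderMarkdown writes one section per document, each terminated by "---".
// It prefers the full OCR text and falls back to the summary when the
// full text is empty or whitespace only.
func renderMarkdown(docs []Document) string {
	var b strings.Builder
	for _, d := range docs {
		body := d.FullText
		if strings.TrimSpace(body) == "" {
			body = d.Summary
		}
		fmt.Fprintf(&b, "# Document: %s\n", d.Title)
		fmt.Fprintf(&b, "ID: %s | Category: %s | Date: %s | Vendor: %s\n",
			d.ID, d.Category, d.Date, d.Vendor)
		fmt.Fprintf(&b, "%s\n---\n", body)
	}
	return b.String()
}
```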
qwen3-vl replaces kimi-k2p5 for all vision tasks. K2.5 was outputting
chain-of-thought reasoning instead of JSON for non-English docs, requiring
a fallback path. qwen3-vl works on the first try with no retry and preserves
the original language correctly.
- Add 'DO NOT translate, preserve original language' to vision prompts
- Shorter/tighter JSON prompt to reduce K2.5 reasoning verbosity
- Fallback: when AnalyzeWithVision returns no JSON, do AnalyzePageOnly (plain text) then AnalyzeText (classify)
- Fallback to AnalyzePageOnly for single-page PDFs with empty/placeholder full_text
- Switch model back to kimi-k2p5 (only vision model on this Fireworks account)
- Build with CGO_ENABLED=1 -tags fts5 (required for SQLite FTS5)
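
One way the "returns no JSON" condition that triggers the fallback could be detected, as a sketch only. extractJSON is a hypothetical helper, not necessarily how AnalyzeWithVision decides; it assumes the model reply is a string that may wrap a single JSON object in reasoning text.

```go
package main

import (
	"encoding/json"
	"strings"
)

// extractJSON pulls the outermost {...} span out of a model reply and
// reports whether it is valid JSON. A false result is the signal to fall
// back to the plain-text path (AnalyzePageOnly, then AnalyzeText).
func extractJSON(reply string) (string, bool) {
	start := strings.Index(reply, "{")
	end := strings.LastIndex(reply, "}")
	if start < 0 || end < start {
		return "", false // no braces at all: pure prose or reasoning
	}
	candidate := reply[start : end+1]
	if !json.Valid([]byte(candidate)) {
		return "", false // braces present but not parseable JSON
	}
	return candidate, true
}
```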
- servePDF now supports ?download=1 query param
- Looks up document title and uses it as the Content-Disposition filename
- Download button on document page triggers actual download (not tab open)
- Added sanitizeFilename helper for safe Content-Disposition values
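
A sketch of what a sanitizeFilename helper along these lines might look like. The exact rules in the real helper are not shown in this log, so the character set and the 'document' fallback below are assumptions; contentDisposition is a hypothetical wrapper.

```go
package main

import (
	"mime"
	"strings"
)

// sanitizeFilename turns a document title into a safe PDF filename:
// control characters are dropped, path and shell-hostile characters
// become '_', and an empty result falls back to "document".
func sanitizeFilename(title string) string {
	var b strings.Builder
	for _, r := range title {
		switch {
		case r < 0x20 || r == 0x7f:
			// drop control characters (prevents header injection via CR/LF)
		case strings.ContainsRune(`/\:*?"<>|`, r):
			b.WriteRune('_')
		default:
			b.WriteRune(r)
		}
	}
	name := strings.Trim(b.String(), ". ")
	if name == "" {
		name = "document"
	}
	if !strings.HasSuffix(strings.ToLower(name), ".pdf") {
		name += ".pdf"
	}
	return name
}

// contentDisposition builds the header value; download=true corresponds
// to the ?download=1 query parameter.
func contentDisposition(title string, download bool) string {
	disp := "inline"
	if download {
		disp = "attachment"
	}
	// FormatMediaType quotes or encodes the filename parameter as needed.
	return mime.FormatMediaType(disp, map[string]string{
		"filename": sanitizeFilename(title),
	})
}
```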
- Share table with random tokens and optional expiry (default 7 days)
- Public /s/{token} endpoint serves PDF directly
- Share/revoke UI on document page with copy-to-clipboard
- Caddy reverse proxy configured at docs.jongsma.me
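
A minimal sketch of random share tokens with a default 7-day expiry. The Share struct, its fields, and the token length are assumptions, not the actual table schema; here a ttl of 0 means "use the default", and never-expiring shares are not modelled.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"time"
)

const defaultShareTTL = 7 * 24 * time.Hour

// Share mirrors one row of a hypothetical share table.
type Share struct {
	Token      string
	DocumentID string
	CreatedAt  time.Time
	ExpiresAt  time.Time
}

// newToken returns 32 random bytes encoded as URL-safe base64 without
// padding, so it can be used directly in /s/{token}.
func newToken() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// newShare creates a share for docID; ttl <= 0 selects the 7-day default.
func newShare(docID string, ttl time.Duration) (Share, error) {
	if ttl <= 0 {
		ttl = defaultShareTTL
	}
	token, err := newToken()
	if err != nil {
		return Share{}, err
	}
	now := time.Now()
	return Share{Token: token, DocumentID: docID, CreatedAt: now, ExpiresAt: now.Add(ttl)}, nil
}

// valid reports whether the share can still be served at time now.
func (s Share) valid(now time.Time) bool {
	return now.Before(s.ExpiresAt)
}
```

Revoking a share would then amount to deleting its row, after which the token lookup fails before the expiry check is reached.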