
31 Commits

Author SHA1 Message Date
James a1d2546117 Search: sort by timestamp desc, display dates in results
- FTS search results now ORDER BY processed_at DESC (was rank)
- Full search page shows processed timestamp with formatDateTime
- Quick search dropdown shows formatted date next to category
- Fixed formatDate to handle timezone-aware timestamps
2026-03-31 13:34:54 -04:00
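A minimal sketch of a date formatter that accepts timezone-aware timestamps, in the spirit of the formatDate fix above. The list of accepted layouts and the output format are assumptions for illustration, not taken from the repository.

```go
package main

import (
	"fmt"
	"time"
)

// timestampLayouts lists the stored formats this sketch accepts (assumed):
// RFC 3339 with a UTC offset first, then two naive layouts without a zone.
var timestampLayouts = []string{
	time.RFC3339Nano,
	"2006-01-02 15:04:05",
	"2006-01-02",
}

// formatDate parses raw with the first layout that matches and renders it as
// "Jan 02, 2006" in the timestamp's own offset. Input that matches no layout
// is returned unchanged so the page still shows something.
func formatDate(raw string) string {
	for _, layout := range timestampLayouts {
		if t, err := time.Parse(layout, raw); err == nil {
			return t.Format("Jan 02, 2006")
		}
	}
	return raw
}

func main() {
	fmt.Println(formatDate("2026-03-31T13:34:54-04:00"))
	fmt.Println(formatDate("2026-03-31 13:34:54"))
}
```

Trying the offset-bearing layout first means a value such as 2026-03-31T13:34:54-04:00 is no longer rejected by a parser that only knows the naive form.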
James 31d6cb6f86 fix: skip store copy if file already exists (reprocess safety)
When reprocessing a document whose PDF is already in the store,
copyFile() would fail with 'open /srv/docsys/inbox/...: no such file
or directory' because the upload wrote to a temp inbox path that was
already cleaned up by the time async OCR completed.

The store is keyed by content hash so if the file is already there,
the copy is a no-op — skip it rather than error out.
2026-03-23 14:27:38 -04:00
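The skip-if-present behavior described above could look roughly like the following. copyIfAbsent is a hypothetical helper name, and the real copyFile() in the repository may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// copyIfAbsent copies src to dst unless dst already exists. It reports
// whether a copy was made. Because the store is keyed by content hash, an
// existing dst is assumed to hold identical bytes, so src is never opened.
func copyIfAbsent(src, dst string) (bool, error) {
	if _, err := os.Stat(dst); err == nil {
		return false, nil // already stored: nothing to do
	} else if !errors.Is(err, fs.ErrNotExist) {
		return false, err
	}
	in, err := os.Open(src)
	if err != nil {
		return false, err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return false, err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return false, err
	}
	return true, out.Close()
}

func main() {
	dir, err := os.MkdirTemp("", "store-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	src := filepath.Join(dir, "inbox.pdf")
	dst := filepath.Join(dir, "stored.pdf")
	if err := os.WriteFile(src, []byte("%PDF-"), 0o644); err != nil {
		panic(err)
	}
	fmt.Println(copyIfAbsent(src, dst)) // first pass copies the file
	os.Remove(src)                      // simulate the inbox cleanup
	fmt.Println(copyIfAbsent(src, dst)) // reprocess: copy is skipped
}
```

Checking the destination before opening the source is what makes reprocessing safe once the temp inbox path is gone.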
James 883f118d66 fix: pdftoppm output filename glob instead of hardcoded page-1.png
pdftoppm zero-pads the page number based on total page count:
- <10 pages: page-1.png
- <100 pages: page-01.png
- <1000 pages: page-001.png

The code hardcoded 'page-1.png' and 'page-N.png', which fails for any
document of 10 or more pages, where the names are zero-padded. Use
filepath.Glob("page-*.png") to find the actual output regardless of
padding width.

Fixed in both ConvertToImage() (first-page preview) and the multi-page
OCR loop in ProcessDocument().
2026-03-23 14:14:28 -04:00
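A small sketch of the glob approach described above. findPageImages and paddedName are hypothetical names; paddedName only restates the padding rule listed in the commit message to show why a hardcoded name breaks.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strconv"
)

// paddedName reproduces the naming rule from the commit message: the page
// number is zero-padded to the digit count of the total page count.
func paddedName(prefix string, page, total int) string {
	width := len(strconv.Itoa(total))
	return fmt.Sprintf("%s-%0*d.png", prefix, width, page)
}

// findPageImages returns every "<prefix>-*.png" file in dir. Since all names
// for one document share the same padding width, lexical order is page order.
func findPageImages(dir, prefix string) ([]string, error) {
	matches, err := filepath.Glob(filepath.Join(dir, prefix+"-*.png"))
	if err != nil {
		return nil, err
	}
	sort.Strings(matches)
	return matches, nil
}

func main() {
	fmt.Println(paddedName("page", 1, 9))   // single-digit documents
	fmt.Println(paddedName("page", 1, 12))  // two-digit documents
	fmt.Println(paddedName("page", 1, 250)) // three-digit documents
	pages, err := findPageImages(".", "page")
	fmt.Println(len(pages), err)
}
```

For the first-page preview, taking the first element of the sorted result avoids guessing the padding width at all.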
James 9622ab9390 fix: format=md endpoint now returns full OCR text (full_text field)
SearchDocuments excludes full_text for performance. The MD endpoint
needs the actual OCR content, not just the summary.

Added SearchDocumentsWithFullText() and SearchDocumentsWithFullTextFallback()
that select full_text explicitly. apiSearchMDHandler now uses these,
so format=md returns the complete OCR/markdown text for each document.
2026-03-23 14:07:20 -04:00
James 405a6f697f feat: add GET /api/search?q=...&format=md for AI/LLM consumption
New endpoint returns all matching documents as concatenated plain-text
markdown, one section per document separated by ---.

Format:
  # Document: {title}
  ID: {id} | Category: {category} | Date: {date} | Vendor: {vendor}

  {full_text or summary}

  ---

Parameters:
  q      - search query (required)
  format - must be 'md' (required; distinguishes from HTML search)

Uses same FTS5 search as existing endpoints, limit raised to 200.
Falls back to LIKE search if FTS5 fails. Returns text/markdown content type.
POST /api/search (HTML partial) unchanged.
2026-03-23 13:58:47 -04:00
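The output format above can be sketched as a plain string builder. The Document struct and renderMarkdown are hypothetical names chosen for illustration; only the layout itself comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// Document holds only the fields the markdown layout needs (assumed shape).
type Document struct {
	ID, Title, Category, Date, Vendor string
	FullText, Summary                 string
}

// renderMarkdown emits one section per document, separated by "---".
// The body is the full text when present, otherwise the summary.
func renderMarkdown(docs []Document) string {
	var b strings.Builder
	for _, d := range docs {
		body := d.FullText
		if strings.TrimSpace(body) == "" {
			body = d.Summary
		}
		fmt.Fprintf(&b, "# Document: %s\n", d.Title)
		fmt.Fprintf(&b, "ID: %s | Category: %s | Date: %s | Vendor: %s\n\n",
			d.ID, d.Category, d.Date, d.Vendor)
		fmt.Fprintf(&b, "%s\n\n---\n\n", body)
	}
	return b.String()
}

func main() {
	fmt.Print(renderMarkdown([]Document{
		{ID: "2026-02-10-invoice", Title: "Invoice", Category: "bills",
			Date: "2026-02-10", Vendor: "Acme", Summary: "Monthly invoice."},
	}))
}
```

A handler would write this string with a text/markdown content type, as the commit message states.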
James 63d4e5e5ca chore: auto-commit uncommitted changes 2026-02-28 06:01:28 -05:00
James 2c91d5649e chore: auto-commit uncommitted changes 2026-02-28 00:01:21 -05:00
James bbc029196a chore: auto-commit uncommitted changes 2026-02-25 18:01:27 -05:00
James 83373885d4 Add vocabulary hints for handwriting: Jongsma, Johan, Tatyana, St. Petersburg FL 2026-02-25 14:24:04 -05:00
James 1b4c82ab83 Improve title prompt: require specific, identifying titles with sender+topic+date 2026-02-25 14:21:43 -05:00
James 4970157690 Switch vision model to qwen3-vl-30b-a3b-instruct
Replaces kimi-k2p5 for all vision tasks. K2.5 was outputting chain-of-thought
reasoning instead of JSON for non-English docs, requiring a fallback path.
qwen3-vl works first try, no retry needed, preserves original language correctly.
2026-02-25 14:17:54 -05:00
James 193d88afef Add delete button to category list view 2026-02-25 14:09:05 -05:00
James d962c9839d Fix extraction: don't translate, fallback OCR+classify path for non-JSON responses
- Add 'DO NOT translate, preserve original language' to vision prompts
- Shorter/tighter JSON prompt to reduce K2.5 reasoning verbosity
- Fallback: when AnalyzeWithVision returns no JSON, do AnalyzePageOnly (plain text) then AnalyzeText (classify)
- Fallback to AnalyzePageOnly for single-page PDFs with empty/placeholder full_text
- Switch model back to kimi-k2p5 (only vision model on this Fireworks account)
- Build with CGO_ENABLED=1 -tags fts5 (required for SQLite FTS5)
2026-02-25 14:01:59 -05:00
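The fallback path above might be sketched like this. parseAnalysis, analyze, and the title and category fields are hypothetical; the ocr and classify arguments stand in for AnalyzePageOnly and AnalyzeText, whose real signatures are not shown in the log.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// analysis is an assumed result shape; only full_text is named in the log.
type analysis struct {
	Title    string `json:"title"`
	Category string `json:"category"`
	FullText string `json:"full_text"`
}

// parseAnalysis pulls the outermost {...} span out of a model reply and
// decodes it. It reports false when the reply holds no usable JSON, for
// example when the model answered with free-form reasoning instead.
func parseAnalysis(reply string) (analysis, bool) {
	start := strings.Index(reply, "{")
	end := strings.LastIndex(reply, "}")
	if start < 0 || end <= start {
		return analysis{}, false
	}
	var a analysis
	if err := json.Unmarshal([]byte(reply[start:end+1]), &a); err != nil {
		return analysis{}, false
	}
	return a, true
}

// analyze prefers the structured reply and otherwise falls back to the
// two-step path: plain-text OCR first, then classification of that text.
func analyze(reply string, ocr func() string, classify func(string) analysis) analysis {
	if a, ok := parseAnalysis(reply); ok {
		return a
	}
	text := ocr()
	a := classify(text)
	a.FullText = text
	return a
}

func main() {
	ocr := func() string { return "Factuur 2026-02" }
	classify := func(string) analysis { return analysis{Title: "Factuur", Category: "bills"} }
	fmt.Printf("%+v\n", analyze("Let me think about this document...", ocr, classify))
}
```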
James 00d8f7c94a chore: auto-commit uncommitted changes 2026-02-15 12:00:36 -05:00
James 9d6ad09b53 Use proper content-type for downloads to avoid Chrome insecure download block 2026-02-12 17:53:51 -05:00
James a1d156bbd5 Fix download: serve file manually to avoid http.ServeFile header conflicts 2026-02-12 17:50:12 -05:00
James 99b39ee737 Add download attribute to download link to prevent inline viewing 2026-02-12 17:29:48 -05:00
James f59c12e25c Add download link with pretty filename from document title
- servePDF now supports ?download=1 query param
- Looks up document title and uses it as the Content-Disposition filename
- Download button on document page triggers actual download (not tab open)
- Added sanitizeFilename helper for safe Content-Disposition values
2026-02-12 17:27:12 -05:00
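One way a sanitizeFilename helper and the matching header could be written, as a sketch only. The replacement rules, the "document" fallback, and the contentDisposition wrapper are assumptions rather than the repository's actual code.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// sanitizeFilename turns a document title into a download name: control
// characters are dropped, path and shell-hostile characters become "_",
// surrounding dots and spaces are trimmed, and ".pdf" is appended.
func sanitizeFilename(title string) string {
	var b strings.Builder
	for _, r := range title {
		switch {
		case r < 0x20 || r == 0x7f:
			// drop control characters, including CR and LF
		case strings.ContainsRune(`/\:*?"<>|`, r):
			b.WriteRune('_')
		default:
			b.WriteRune(r)
		}
	}
	name := strings.Trim(b.String(), ". ")
	if name == "" {
		name = "document" // assumed fallback for empty titles
	}
	return name + ".pdf"
}

// contentDisposition builds the header value with the standard library so
// that quoting and non-ASCII encoding are handled by mime.FormatMediaType.
func contentDisposition(title string) string {
	return mime.FormatMediaType("attachment",
		map[string]string{"filename": sanitizeFilename(title)})
}

func main() {
	fmt.Println(sanitizeFilename("Acme/Invoice: March"))
	fmt.Println(contentDisposition("Acme/Invoice: March"))
}
```

Dropping CR and LF matters most here, since a title containing a line break could otherwise split the response header.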
James 1b49dac87f Document page: two-row layout - details|summary+notes top, OCR|PDF bottom 2026-02-10 04:14:51 -05:00
James 6adfefff7a Change processed date format to 'Jan 02, 2006 3:04 PM EST' 2026-02-10 04:09:35 -05:00
James a52ab6e20d Document details: two-column layout (category/vendor/amount | date/processed/filename) 2026-02-10 04:06:59 -05:00
James 9c9bd5e881 Document details: inline category dropdown, formatted processed_at timestamp 2026-02-10 04:00:11 -05:00
James 4a0e9648ac Move category edit to document page (inline dropdown), revert dashboard to static badges 2026-02-10 03:55:27 -05:00
James dabd97e13c Dashboard: formatted timestamps (MM/DD/YYYY HH:MM TZ), inline category edit dropdown 2026-02-10 03:52:46 -05:00
James 3a6aa8cbda Dashboard: narrower categories column, show scan time in recent docs, add inou/sophia categories 2026-02-10 03:48:20 -05:00
James b3bb615075 Add migration script from hash-based IDs to date-slug format (31 docs migrated) 2026-02-10 00:34:44 -05:00
James 5445b294cb Share links now use .pdf extension and Content-Disposition header for Android compatibility 2026-02-10 00:33:13 -05:00
James a77a31f4c9 Fix share links: use external URL (docs.jongsma.me), fix copy button, add copy on existing shares 2026-02-09 12:09:16 -05:00
James 9f0bac5783 Add document sharing with expiring links
- Share table with random tokens and optional expiry (default 7 days)
- Public /s/{token} endpoint serves PDF directly
- Share/revoke UI on document page with copy-to-clipboard
- Caddy reverse proxy configured at docs.jongsma.me
2026-02-09 11:28:21 -05:00
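The share-token scheme above can be sketched as follows. The Share struct, newShare, the 24-byte token length, and the rule that a zero TTL means "no expiry" are assumptions for illustration; only the random token, the optional expiry, and the 7-day default come from the commit message.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"time"
)

// defaultShareTTL mirrors the 7-day default mentioned in the commit message.
const defaultShareTTL = 7 * 24 * time.Hour

// Share is an assumed row shape for the share table.
type Share struct {
	Token      string
	DocumentID string
	ExpiresAt  time.Time // zero value means the link never expires
}

// newShare creates a URL-safe random token for docID. A ttl of zero yields a
// link without expiry; any other ttl is added to the current time.
func newShare(docID string, ttl time.Duration) (Share, error) {
	buf := make([]byte, 24)
	if _, err := rand.Read(buf); err != nil {
		return Share{}, err
	}
	s := Share{
		Token:      base64.RawURLEncoding.EncodeToString(buf),
		DocumentID: docID,
	}
	if ttl != 0 {
		s.ExpiresAt = time.Now().Add(ttl)
	}
	return s, nil
}

// Valid reports whether the link may still be served.
func (s Share) Valid() bool {
	return s.ExpiresAt.IsZero() || time.Now().Before(s.ExpiresAt)
}

func main() {
	s, err := newShare("2026-02-09-example", defaultShareTTL)
	if err != nil {
		panic(err)
	}
	fmt.Printf("/s/%s valid=%v\n", s.Token, s.Valid())
}
```

A handler for /s/{token} would look up the row by token and serve the PDF only when Valid() is true.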
James a73ae5c03e fix: delete document cleans up store files and embeddings 2026-02-08 03:55:42 -05:00
James 00d0b0a0d7 Initial commit 2026-02-04 13:37:26 -05:00