docsys

Commit Graph

Author	SHA1	Message	Date
James	883f118d66	fix: pdftoppm output filename glob instead of hardcoded page-1.png pdftoppm zero-pads the page number based on total page count: - <10 pages: page-1.png - <100 pages: page-01.png - <1000 pages: page-001.png The code hardcoded 'page-1.png' and 'page-N.png', which fails for any multi-page document. Use filepath.Glob('page-*.png') to find the actual output regardless of padding width. Fixed in both ConvertToImage() (first-page preview) and the multi-page OCR loop in ProcessDocument().	2026-03-23 14:14:28 -04:00
James	9622ab9390	fix: format=md endpoint now returns full OCR text (full_text field) SearchDocuments excludes full_text for performance. The MD endpoint needs the actual OCR content, not just the summary. Added SearchDocumentsWithFullText() and SearchDocumentsWithFullTextFallback() that select full_text explicitly. apiSearchMDHandler now uses these, so format=md returns the complete OCR/markdown text for each document.	2026-03-23 14:07:20 -04:00
James	405a6f697f	feat: add GET /api/search?q=...&format=md for AI/LLM consumption New endpoint returns all matching documents as concatenated plain-text markdown, one section per document separated by ---. Format: # Document: {title} ID: {id} | Category: {category} | Date: {date} | Vendor: {vendor} {full_text or summary} --- Parameters: q - search query (required) format - must be 'md' (required; distinguishes from HTML search) Uses same FTS5 search as existing endpoints, limit raised to 200. Falls back to LIKE search if FTS5 fails. Returns text/markdown content type. POST /api/search (HTML partial) unchanged.	2026-03-23 13:58:47 -04:00
James	63d4e5e5ca	chore: auto-commit uncommitted changes	2026-02-28 06:01:28 -05:00
James	2c91d5649e	chore: auto-commit uncommitted changes	2026-02-28 00:01:21 -05:00
James	bbc029196a	chore: auto-commit uncommitted changes	2026-02-25 18:01:27 -05:00
James	83373885d4	Add vocabulary hints for handwriting: Jongsma, Johan, Tatyana, St. Petersburg FL	2026-02-25 14:24:04 -05:00
James	1b4c82ab83	Improve title prompt: require specific, identifying titles with sender+topic+date	2026-02-25 14:21:43 -05:00
James	4970157690	Switch vision model to qwen3-vl-30b-a3b-instruct Replaces kimi-k2p5 for all vision tasks. K2.5 was outputting chain-of-thought reasoning instead of JSON for non-English docs, requiring a fallback path. qwen3-vl works first try, no retry needed, preserves original language correctly.	2026-02-25 14:17:54 -05:00
James	193d88afef	Add delete button to category list view	2026-02-25 14:09:05 -05:00
James	d962c9839d	Fix extraction: don't translate, fallback OCR+classify path for non-JSON responses - Add 'DO NOT translate, preserve original language' to vision prompts - Shorter/tighter JSON prompt to reduce K2.5 reasoning verbosity - Fallback: when AnalyzeWithVision returns no JSON, do AnalyzePageOnly (plain text) then AnalyzeText (classify) - Fallback to AnalyzePageOnly for single-page PDFs with empty/placeholder full_text - Switch model back to kimi-k2p5 (only vision model on this Fireworks account) - Build with CGO_ENABLED=1 -tags fts5 (required for SQLite FTS5)	2026-02-25 14:01:59 -05:00
James	00d8f7c94a	chore: auto-commit uncommitted changes	2026-02-15 12:00:36 -05:00
James	9d6ad09b53	Use proper content-type for downloads to avoid Chrome insecure download block	2026-02-12 17:53:51 -05:00
James	a1d156bbd5	Fix download: serve file manually to avoid http.ServeFile header conflicts	2026-02-12 17:50:12 -05:00
James	99b39ee737	Add download attribute to download link to prevent inline viewing	2026-02-12 17:29:48 -05:00
James	f59c12e25c	Add download link with pretty filename from document title - servePDF now supports ?download=1 query param - Looks up document title and uses it as the Content-Disposition filename - Download button on document page triggers actual download (not tab open) - Added sanitizeFilename helper for safe Content-Disposition values	2026-02-12 17:27:12 -05:00
James	1b49dac87f	Document page: two-row layout - details|summary+notes top, OCR|PDF bottom	2026-02-10 04:14:51 -05:00
James	6adfefff7a	Change processed date format to 'Jan 02, 2006 3:04 PM EST'	2026-02-10 04:09:35 -05:00
James	a52ab6e20d	Document details: two-column layout (category/vendor/amount | date/processed/filename)	2026-02-10 04:06:59 -05:00
James	9c9bd5e881	Document details: inline category dropdown, formatted processed_at timestamp	2026-02-10 04:00:11 -05:00
James	4a0e9648ac	Move category edit to document page (inline dropdown), revert dashboard to static badges	2026-02-10 03:55:27 -05:00
James	dabd97e13c	Dashboard: formatted timestamps (MM/DD/YYYY HH:MM TZ), inline category edit dropdown	2026-02-10 03:52:46 -05:00
James	3a6aa8cbda	Dashboard: narrower categories column, show scan time in recent docs, add inou/sophia categories	2026-02-10 03:48:20 -05:00
James	b3bb615075	Add migration script for hash-based IDs to date-slug format (31 docs migrated)	2026-02-10 00:34:44 -05:00
James	5445b294cb	Share links now use .pdf extension and Content-Disposition header for Android compatibility	2026-02-10 00:33:13 -05:00
James	a77a31f4c9	Fix share links: use external URL (docs.jongsma.me), fix copy button, add copy on existing shares	2026-02-09 12:09:16 -05:00
James	9f0bac5783	Add document sharing with expiring links - Share table with random tokens and optional expiry (default 7 days) - Public /s/{token} endpoint serves PDF directly - Share/revoke UI on document page with copy-to-clipboard - Caddy reverse proxy configured at docs.jongsma.me	2026-02-09 11:28:21 -05:00
James	a73ae5c03e	fix: delete document cleans up store files and embeddings	2026-02-08 03:55:42 -05:00
James	00d0b0a0d7	Initial commit	2026-02-04 13:37:26 -05:00

29 Commits (All Branches)