# DocMan - Document Management System

AI-powered document scanning, OCR, classification, and search.

## Quick Start

```bash
# Run directly (dev mode)
cd ~/dev/docman
make dev

# Or run the installed binary
~/bin/docman -port 8200
```

Open http://localhost:8200

## Features

- **Auto-processing**: Drop PDFs/images into `~/documents/inbox/` → auto-classified and indexed
- **PDF Preview**: Built-in PDF viewer
- **Full-text Search**: FTS5 + semantic search
- **Categories**: taxes, expenses, bills, medical, contacts, legal, insurance, banking, receipts
- **Expense Tracking**: Filter by date, export to CSV, track tax-deductible items
- **Markdown Records**: Each document gets a searchable markdown file

## Scanner Setup

1. **Set scanner to save to SMB share:**
   - Share: `\\192.168.1.16\documents\inbox` (or wherever james is)
   - Or use your scanner's app to save to `~/documents/inbox/`

2. **Workflow:**
   - Scan document → lands in inbox
   - DocMan auto-processes (OCR → classify → store)
   - View/search in web UI

## Directory Structure

```
~/documents/
├── inbox/          # Drop scans here (auto-processed)
├── store/          # PDF storage (by checksum)
├── records/        # Markdown records by category
│   ├── taxes/
│   ├── expenses/
│   ├── bills/
│   └── ...
└── index/          # SQLite database
```

## Configuration

### Environment Variables

```bash
FIREWORKS_API_KEY=fw_xxx  # Required for AI classification
```

### Command Line Options

```
-port         HTTP port (default: 8200)
-data         Data directory (default: ~/documents)
-ai-endpoint  AI API endpoint (default: Fireworks)
-ai-key       AI API key
-ai-model     Classification model
-embed-model  Embedding model
-watch        Only watch inbox, don't start web server
```

## Systemd Service

```bash
# Edit to add your Fireworks API key
nano ~/.config/systemd/user/docman.service

# Enable and start
systemctl --user daemon-reload
systemctl --user enable docman
systemctl --user start docman

# Check status
systemctl --user status docman
journalctl --user -u docman -f
```

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/` | Dashboard |
| GET | `/browse` | Browse documents |
| GET | `/doc/:id` | View document |
| GET | `/search` | Search page |
| GET | `/expenses` | Expenses tracker |
| GET | `/upload` | Upload page |
| POST | `/api/upload` | Upload document |
| GET | `/api/documents/:id` | Get document JSON |
| PATCH | `/api/documents/:id` | Update document |
| DELETE | `/api/documents/:id` | Delete document |
| GET | `/api/search?q=` | Search API |
| GET | `/api/expenses/export` | Export CSV |
| GET | `/api/stats` | Dashboard stats |

## Dependencies

System packages (already installed):

- `poppler-utils` (pdftotext, pdfinfo, pdftoppm)
- `tesseract-ocr` (OCR for scanned images)

## Notes

- Without FIREWORKS_API_KEY, documents will be categorized as "uncategorized"
- The inbox watcher runs continuously, processing new files automatically
- Markdown files are searchable even without embeddings
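
## API Usage Sketch

The endpoints listed under API Endpoints can be scripted with `curl`. The helpers below are a minimal sketch, not part of DocMan itself: the function names, the `DOCMAN_URL` variable, and especially the multipart field name `file` for uploads are assumptions for illustration, since this README does not document the request format. Adjust them to match the actual server.

```bash
#!/usr/bin/env bash
# Hypothetical helper functions around the endpoints in the API table.
# Assumptions (not confirmed by this README): the upload form field is
# named "file", and DocMan listens on the default port 8200.

DOCMAN_URL="${DOCMAN_URL:-http://localhost:8200}"

# Search via GET /api/search?q=...; curl URL-encodes the query text.
docman_search() {
  curl -fsS --get --data-urlencode "q=$1" "$DOCMAN_URL/api/search"
}

# Upload a PDF or image via POST /api/upload as multipart form data.
docman_upload() {
  curl -fsS -F "file=@$1" "$DOCMAN_URL/api/upload"
}

# Fetch one document's JSON via GET /api/documents/:id.
docman_get() {
  curl -fsS "$DOCMAN_URL/api/documents/$1"
}

# Save the expenses CSV via GET /api/expenses/export.
# The output filename defaults to expenses.csv.
docman_export_expenses() {
  curl -fsS -o "${1:-expenses.csv}" "$DOCMAN_URL/api/expenses/export"
}
```

After sourcing the file, a call such as `docman_search "electric bill"` or `docman_upload ~/scans/receipt.pdf` (an example path) runs against the local server. The `-f` flag makes `curl` return a non-zero exit code on HTTP errors, which keeps the helpers usable in scripts.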