DocMan - Document Management System
AI-powered document scanning, OCR, classification, and search.
Quick Start
# Run directly (dev mode)
cd ~/dev/docman
make dev
# Or run the installed binary
~/bin/docman -port 8200
Features
- Auto-processing: Drop PDFs/images into ~/documents/inbox/ → auto-classified and indexed
- PDF Preview: Built-in PDF viewer
- Full-text Search: SQLite FTS5 keyword search plus semantic (embedding-based) search
- Categories: taxes, expenses, bills, medical, contacts, legal, insurance, banking, receipts
- Expense Tracking: Filter by date, export to CSV, track tax-deductible items
- Markdown Records: Each document gets a searchable markdown file
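The keyword half of search can be illustrated with SQLite's FTS5 extension through Python's standard `sqlite3` module. This is a minimal sketch, not DocMan's schema: the `docs_fts` table and its columns are hypothetical, the semantic (embedding) half is not shown, and it assumes the local SQLite build has FTS5 enabled (most distribution builds do).

```python
import sqlite3

def create_index(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: doc_id is stored but not tokenized; category and body are searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS docs_fts "
        "USING fts5(doc_id UNINDEXED, category, body)"
    )

def index_document(conn: sqlite3.Connection, doc_id: str, category: str, body: str) -> None:
    conn.execute(
        "INSERT INTO docs_fts (doc_id, category, body) VALUES (?, ?, ?)",
        (doc_id, category, body),
    )

def keyword_search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[tuple[str, str]]:
    # bm25() gives better matches lower scores, so sort ascending.
    return conn.execute(
        "SELECT doc_id, category FROM docs_fts WHERE docs_fts MATCH ? "
        "ORDER BY bm25(docs_fts) LIMIT ?",
        (query, limit),
    ).fetchall()

conn = sqlite3.connect(":memory:")
create_index(conn)
index_document(conn, "doc-1", "bills", "Electric company invoice, amount due 84.20")
index_document(conn, "doc-2", "medical", "Dental visit receipt")
print(keyword_search(conn, "invoice"))  # [('doc-1', 'bills')]
```

FTS5's default tokenizer folds case, so a query for `dental` also matches "Dental".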
Scanner Setup
- Set the scanner to save to the SMB share:
  - Share: \\192.168.1.16\documents\inbox (or wherever james is)
  - Or use your scanner's app to save to ~/documents/inbox/
- Workflow:
  - Scan document → lands in inbox
  - DocMan auto-processes (OCR → classify → store)
  - View/search in web UI
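The OCR step can be approximated with the two tools listed under Dependencies. This sketch assumes a simple dispatch rule (PDFs through `pdftotext`, images through `tesseract`); how DocMan actually chooses is not documented here. A scanned PDF with no text layer would first need rasterizing (for example with `pdftoppm`) before OCR, which this sketch omits.

```python
import subprocess
from pathlib import Path

# Assumed set of image types sent straight to tesseract.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}

def ocr_command(path: Path) -> list[str]:
    """Pick a text-extraction command for one inbox file (assumed dispatch rule)."""
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        # pdftotext (poppler-utils) writes to stdout when the output file is "-".
        return ["pdftotext", "-layout", str(path), "-"]
    if suffix in IMAGE_SUFFIXES:
        # tesseract writes recognized text to stdout when the output base is "stdout".
        return ["tesseract", str(path), "stdout"]
    raise ValueError(f"unsupported file type: {path.name}")

def extract_text(path: Path) -> str:
    """Run the chosen tool and return its text output; raises if the tool fails."""
    result = subprocess.run(ocr_command(path), capture_output=True, text=True, check=True)
    return result.stdout
```

Keeping command selection separate from execution makes the dispatch rule easy to check without the tools installed.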
Directory Structure
~/documents/
├── inbox/ # Drop scans here (auto-processed)
├── store/ # PDF storage (by checksum)
├── records/ # Markdown records by category
│ ├── taxes/
│ ├── expenses/
│ ├── bills/
│ └── ...
└── index/ # SQLite database
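The `store/` and `records/` layout can be illustrated as follows. This is a sketch under assumptions: the README says only that PDFs are stored "by checksum", so the choice of SHA-256, the `<digest><suffix>` file name, and the `records/<category>/<digest>.md` record name are all hypothetical.

```python
import hashlib
import shutil
from pathlib import Path

def file_checksum(path: Path, chunk_size: int = 1 << 16) -> str:
    """SHA-256 of a file, read in chunks so large scans need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def store_file(src: Path, data_dir: Path) -> Path:
    """Copy src into data_dir/store/ named by checksum; identical scans map to one file."""
    store = data_dir / "store"
    store.mkdir(parents=True, exist_ok=True)
    dest = store / f"{file_checksum(src)}{src.suffix.lower()}"
    if not dest.exists():
        shutil.copy2(src, dest)
    return dest

def record_path(data_dir: Path, category: str, checksum: str) -> Path:
    """Hypothetical markdown record location: records/<category>/<checksum>.md."""
    return data_dir / "records" / category / f"{checksum}.md"
```

Naming by content hash means a document scanned twice is stored once, which is one plausible reason for checksum-based storage.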
Configuration
Environment Variables
FIREWORKS_API_KEY=fw_xxx # Required for AI classification
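The Notes section states that documents fall back to "uncategorized" without `FIREWORKS_API_KEY`. A minimal sketch of that rule follows. The `classify` name and the `ask_model(api_key, text)` callable are hypothetical stand-ins for the real AI call, and treating any model answer outside the documented category list as "uncategorized" is an assumption.

```python
import os
from typing import Callable, Mapping, Optional

# The categories listed under Features.
CATEGORIES = {
    "taxes", "expenses", "bills", "medical", "contacts",
    "legal", "insurance", "banking", "receipts",
}

def classify(
    text: str,
    ask_model: Callable[[str, str], str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return a category for text, or "uncategorized" when AI classification is unavailable."""
    env = os.environ if env is None else env
    api_key = env.get("FIREWORKS_API_KEY")
    if not api_key:
        return "uncategorized"  # documented behavior without an API key
    label = ask_model(api_key, text).strip().lower()
    # Assumption: labels outside the known list are also treated as uncategorized.
    return label if label in CATEGORIES else "uncategorized"
```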
Command Line Options
-port HTTP port (default: 8200)
-data Data directory (default: ~/documents)
-ai-endpoint AI API endpoint (default: Fireworks)
-ai-key AI API key
-ai-model Classification model
-embed-model Embedding model
-watch Only watch inbox, don't start web server
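How the documented defaults and the two API key sources might fit together is sketched below as a small Python parser mirroring some of the flags above. It is illustrative only, since DocMan's own flag handling is not shown in this README. The precedence of `-ai-key` over `FIREWORKS_API_KEY` is an assumption, and `-ai-endpoint`, `-ai-model`, and `-embed-model` are left out because their default values are not documented.

```python
import argparse
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence

def parse_config(argv: Sequence[str], env: Optional[Mapping[str, str]] = None) -> argparse.Namespace:
    env = os.environ if env is None else env
    p = argparse.ArgumentParser(prog="docman")
    # Single-dash long options, matching the documented flag style.
    p.add_argument("-port", dest="port", type=int, default=8200, help="HTTP port")
    p.add_argument("-data", dest="data", default="~/documents", help="Data directory")
    p.add_argument("-ai-key", dest="ai_key", default=None, help="AI API key")
    p.add_argument("-watch", dest="watch", action="store_true",
                   help="Only watch inbox, don't start web server")
    args = p.parse_args(list(argv))
    args.data = Path(args.data).expanduser()
    # Assumption: an explicit -ai-key wins; otherwise fall back to the environment.
    if not args.ai_key:
        args.ai_key = env.get("FIREWORKS_API_KEY")
    return args
```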
Systemd Service
# Edit to add your Fireworks API key
nano ~/.config/systemd/user/docman.service
# Enable and start
systemctl --user daemon-reload
systemctl --user enable docman
systemctl --user start docman
# Check status
systemctl --user status docman
journalctl --user -u docman -f
API Endpoints
| Method | Path | Description |
|---|---|---|
| GET | / | Dashboard |
| GET | /browse | Browse documents |
| GET | /doc/:id | View document |
| GET | /search | Search page |
| GET | /expenses | Expenses tracker |
| GET | /upload | Upload page |
| POST | /api/upload | Upload document |
| GET | /api/documents/:id | Get document JSON |
| PATCH | /api/documents/:id | Update document |
| DELETE | /api/documents/:id | Delete document |
| GET | /api/search?q= | Search API |
| GET | /api/expenses/export | Export CSV |
| GET | /api/stats | Dashboard stats |
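A small client for a few of the endpoints above can be written with the standard library alone. This is a sketch under assumptions: it targets the default port on localhost, the JSON shape of search results is not documented here, the PATCH body (`{"category": "bills"}`, for example) is hypothetical, and the CSV export is assumed to be UTF-8 with a header row.

```python
import csv
import io
import json
import urllib.parse
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8200"  # default -port; change host/port as needed

def search_url(query: str, base: str = BASE_URL) -> str:
    """Build the /api/search URL with the query safely encoded."""
    return f"{base}/api/search?{urllib.parse.urlencode({'q': query})}"

def search(query: str, base: str = BASE_URL) -> Any:
    # Assumption: the endpoint returns JSON; its exact shape is not documented.
    with urllib.request.urlopen(search_url(query, base)) as resp:
        return json.load(resp)

def update_document(doc_id: str, fields: dict, base: str = BASE_URL) -> int:
    # Hypothetical body: a JSON object of fields to change.
    req = urllib.request.Request(
        f"{base}/api/documents/{urllib.parse.quote(doc_id)}",
        data=json.dumps(fields).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

def export_expenses(base: str = BASE_URL) -> list[dict[str, str]]:
    """Download the CSV export and return one dict per row, keyed by the CSV header."""
    with urllib.request.urlopen(f"{base}/api/expenses/export") as resp:
        text = resp.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

Reading the export with `csv.DictReader` avoids hard-coding column names, which this README does not list.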
Dependencies
System packages (already installed):
- poppler-utils (pdftotext, pdfinfo, pdftoppm)
- tesseract-ocr (OCR for scanned images)
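One way to confirm these tools are on `PATH` on a new machine is sketched below. It only checks for the executables named above and does not verify versions.

```python
import shutil
from typing import Sequence

# Executables provided by poppler-utils and tesseract-ocr.
REQUIRED_TOOLS = ("pdftotext", "pdfinfo", "pdftoppm", "tesseract")

def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

if __name__ == "__main__":
    missing = missing_tools()
    print("all dependencies found" if not missing else "missing: " + ", ".join(missing))
```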
Notes
- Without FIREWORKS_API_KEY, documents will be categorized as "uncategorized"
- The inbox watcher runs continuously, processing new files automatically
- Markdown files are searchable even without embeddings
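The continuous inbox watcher can be approximated with a polling loop. This is a sketch, not DocMan's mechanism (which may use filesystem notifications instead). The list of supported extensions and the `min_age` guard, which skips files a scanner may still be writing over SMB, are added assumptions.

```python
import time
from pathlib import Path
from typing import Callable, Set

# Assumed set of file types the inbox accepts.
SUPPORTED = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}

def scan_inbox(inbox: Path, seen: Set[Path], min_age: float = 2.0) -> list[Path]:
    """List supported, not-yet-handled files last written at least min_age seconds ago."""
    now = time.time()
    ready = []
    for path in sorted(inbox.iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED or path in seen:
            continue
        if now - path.stat().st_mtime < min_age:
            continue  # probably still being written; retry on the next poll
        ready.append(path)
    return ready

def watch(inbox: Path, handle: Callable[[Path], None], interval: float = 5.0) -> None:
    """Poll forever, calling handle(path) once per new file."""
    seen: Set[Path] = set()
    while True:
        for path in scan_inbox(inbox, seen):
            handle(path)
            seen.add(path)
        time.sleep(interval)
```

A long-running watcher would normally also move processed files out of the inbox so the `seen` set does not grow without bound.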