# DocMan - Document Management System
AI-powered document scanning, OCR, classification, and search.
## Quick Start
```bash
# Run directly (dev mode)
cd ~/dev/docman
make dev
# Or run the installed binary
~/bin/docman -port 8200
```
Open http://localhost:8200
## Features
- **Auto-processing**: Drop PDFs/images into `~/documents/inbox/` → auto-classified and indexed
- **PDF Preview**: Built-in PDF viewer
- **Full-text Search**: SQLite FTS5 full-text search plus embedding-based semantic search
- **Categories**: taxes, expenses, bills, medical, contacts, legal, insurance, banking, receipts
- **Expense Tracking**: Filter by date, export to CSV, track tax-deductible items
- **Markdown Records**: Each document gets a searchable markdown file
## Scanner Setup
1. **Set the scanner to save to an SMB share:**
   - Share: `\\192.168.1.16\documents\inbox` (or whichever address the DocMan host has)
   - Or use your scanner's app to save to `~/documents/inbox/`
2. **Workflow:**
   - Scan document → lands in inbox
   - DocMan auto-processes (OCR → classify → store)
   - View/search in web UI
## Directory Structure
```
~/documents/
├── inbox/       # Drop scans here (auto-processed)
├── store/       # PDF storage (by checksum)
├── records/     # Markdown records by category
│   ├── taxes/
│   ├── expenses/
│   ├── bills/
│   └── ...
└── index/       # SQLite database
```
## Configuration
### Environment Variables
```bash
FIREWORKS_API_KEY=fw_xxx # Required for AI classification
```
### Command Line Options
```
-port         HTTP port (default: 8200)
-data         Data directory (default: ~/documents)
-ai-endpoint  AI API endpoint (default: Fireworks)
-ai-key       AI API key
-ai-model     Classification model
-embed-model  Embedding model
-watch        Only watch inbox, don't start web server
```
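The options above can be combined. A minimal sketch, assuming the binary lives at `~/bin/docman` as in Quick Start; the `DOCMAN_BIN` variable and the two function names are hypothetical helpers, not part of DocMan:

```bash
# Hypothetical helper variable: path to the docman binary.
DOCMAN_BIN="${DOCMAN_BIN:-$HOME/bin/docman}"

# Start DocMan on a given port and data directory (defaults match the README).
docman_serve() {
    "$DOCMAN_BIN" -port "${1:-8200}" -data "${2:-$HOME/documents}"
}

# Watch the inbox only, without starting the web server.
docman_watch_only() {
    "$DOCMAN_BIN" -data "${1:-$HOME/documents}" -watch
}
```

For example, `docman_serve 8300 /srv/documents` would serve a data directory at `/srv/documents` on port 8300 (both values are placeholders).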
## Systemd Service
```bash
# Edit to add your Fireworks API key
nano ~/.config/systemd/user/docman.service
# Enable and start
systemctl --user daemon-reload
systemctl --user enable docman
systemctl --user start docman
# Check status
systemctl --user status docman
journalctl --user -u docman -f
```
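If `docman.service` does not exist yet, something like the following could serve as a starting point. This is a sketch under assumptions, not the unit shipped with DocMan: the description, the `Restart` policy, and the `write_docman_unit` helper name are illustrative, and `fw_xxx` is a placeholder key.

```bash
# Write an example user unit to the path given as $1 (sketch; review before use).
write_docman_unit() {
    cat > "$1" <<'EOF'
[Unit]
Description=DocMan document management

[Service]
# Placeholder: replace fw_xxx with your real Fireworks API key.
Environment=FIREWORKS_API_KEY=fw_xxx
# %h expands to the user's home directory in user units.
ExecStart=%h/bin/docman -port 8200
Restart=on-failure

[Install]
WantedBy=default.target
EOF
}
```

Calling `write_docman_unit ~/.config/systemd/user/docman.service` overwrites any existing file at that path, so it may be safer to write to a scratch file first and compare.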
## API Endpoints
| Method | Path | Description |
|--------|------|-------------|
| GET | `/` | Dashboard |
| GET | `/browse` | Browse documents |
| GET | `/doc/:id` | View document |
| GET | `/search` | Search page |
| GET | `/expenses` | Expenses tracker |
| GET | `/upload` | Upload page |
| POST | `/api/upload` | Upload document |
| GET | `/api/documents/:id` | Get document JSON |
| PATCH | `/api/documents/:id` | Update document |
| DELETE | `/api/documents/:id` | Delete document |
| GET | `/api/search?q=` | Search API |
| GET | `/api/expenses/export` | Export CSV |
| GET | `/api/stats` | Dashboard stats |
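
The JSON and CSV endpoints in the table can be called from the command line. The wrappers below are a sketch using `curl`; the function names and the `DOCMAN_URL` variable are hypothetical, response formats are not documented here, and the multipart field name used for upload is an assumption that should be checked against the server code.

```bash
# Base URL from Quick Start; override if DocMan runs elsewhere.
DOCMAN_URL="${DOCMAN_URL:-http://localhost:8200}"

# Search: -G with --data-urlencode sends the query as an encoded ?q= parameter.
docman_search() {
    curl -fsS -G --data-urlencode "q=$1" "$DOCMAN_URL/api/search"
}

# Fetch one document's JSON by id.
docman_get() {
    curl -fsS "$DOCMAN_URL/api/documents/$1"
}

# Delete a document by id.
docman_delete() {
    curl -fsS -X DELETE "$DOCMAN_URL/api/documents/$1"
}

# Save the expenses CSV export to a file (default: expenses.csv).
docman_export_expenses() {
    curl -fsS -o "${1:-expenses.csv}" "$DOCMAN_URL/api/expenses/export"
}

# Upload a file. ASSUMPTION: the form field name "file" is a guess.
docman_upload() {
    curl -fsS -F "file=@$1" "$DOCMAN_URL/api/upload"
}
```

For example, `docman_search "water bill"` queries `/api/search` with the phrase URL-encoded, so spaces and punctuation in the query are safe.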
## Dependencies
System packages (already installed):
- `poppler-utils` (pdftotext, pdfinfo, pdftoppm)
- `tesseract-ocr` (OCR for scanned images)
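
On a fresh machine it may be worth confirming that the tools from these packages are on `PATH` before starting DocMan. A small sketch (`missing_tools` is a hypothetical helper, not part of DocMan):

```bash
# Print each named command that is not found on PATH.
# Returns non-zero if at least one is missing.
missing_tools() {
    local rc=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

Running `missing_tools pdftotext pdfinfo pdftoppm tesseract` prints nothing when every tool is available.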
## Notes
- Without `FIREWORKS_API_KEY`, documents are filed as `uncategorized`
- The inbox watcher runs continuously, processing new files automatically
- Markdown files are searchable even without embeddings
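
Because each record is a plain markdown file under `~/documents/records/`, the records can also be searched outside DocMan with standard tools. A minimal sketch using `grep`; this is not how DocMan's own search works, and `grep_records` is a hypothetical helper name:

```bash
# List markdown records containing a term (case-insensitive), across all categories.
# $1 = search term, $2 = records directory (defaults to the README's layout).
grep_records() {
    grep -ril --include='*.md' -- "$1" "${2:-$HOME/documents/records}"
}
```

For example, `grep_records invoice` lists every record file that mentions "invoice".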