# DocMan - Document Management System

AI-powered document scanning, OCR, classification, and search.

## Quick Start

```bash
# Run directly (dev mode)
cd ~/dev/docman
make dev

# Or run the installed binary
~/bin/docman -port 8200
```

Open http://localhost:8200

## Features

- **Auto-processing**: Drop PDFs/images into `~/documents/inbox/` → auto-classified and indexed
- **PDF Preview**: Built-in PDF viewer
- **Full-text Search**: SQLite FTS5 keyword search + semantic search
- **Categories**: taxes, expenses, bills, medical, contacts, legal, insurance, banking, receipts
- **Expense Tracking**: Filter by date, export to CSV, track tax-deductible items
- **Markdown Records**: Each document gets a searchable markdown file

## Scanner Setup

1. **Set the scanner to save to the SMB share:**
   - Share: `\\192.168.1.16\documents\inbox` (or whatever address the `james` host currently has)
   - Or use your scanner's app to save to `~/documents/inbox/`

2. **Workflow:**
   - Scan document → lands in inbox
   - DocMan auto-processes (OCR → classify → store)
   - View/search in web UI

## Directory Structure

```
~/documents/
├── inbox/      # Drop scans here (auto-processed)
├── store/      # PDF storage (by checksum)
├── records/    # Markdown records by category
│   ├── taxes/
│   ├── expenses/
│   ├── bills/
│   └── ...
└── index/      # SQLite database
```

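
The `store/` directory above is described as keyed by checksum. The sketch below shows one way such a content-addressed path could be derived. The digest algorithm (SHA-256), the flat layout, and the `.pdf` suffix are all assumptions made for illustration, and `store_path` is a hypothetical helper, not part of DocMan.

```bash
# Print the content-addressed path a file would get under <data-dir>/store/.
# ASSUMPTIONS (not taken from DocMan): SHA-256 digest, flat layout, .pdf suffix.
# $1: file to hash; $2: data directory (defaults to ~/documents).
store_path() {
    file=$1
    data_dir=${2:-"$HOME/documents"}
    [ -r "$file" ] || { echo "cannot read: $file" >&2; return 1; }
    # Reading from stdin makes sha256sum print "<hex digest>  -"; keep the digest.
    digest=$(sha256sum < "$file" | cut -d' ' -f1)
    printf '%s/store/%s.pdf\n' "$data_dir" "$digest"
}
```

Because the path depends only on the file's bytes, scanning the same document twice would map to the same location under this scheme, which is the usual reason for storing by checksum.
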
## Configuration

### Environment Variables

```bash
FIREWORKS_API_KEY=fw_xxx  # Required for AI classification
```

### Command Line Options

```
-port          HTTP port (default: 8200)
-data          Data directory (default: ~/documents)
-ai-endpoint   AI API endpoint (default: Fireworks)
-ai-key        AI API key
-ai-model      Classification model
-embed-model   Embedding model
-watch         Only watch inbox, don't start web server
```

## Systemd Service

```bash
# Edit to add your Fireworks API key
nano ~/.config/systemd/user/docman.service

# Enable and start
systemctl --user daemon-reload
systemctl --user enable docman
systemctl --user start docman

# Check status
systemctl --user status docman
journalctl --user -u docman -f
```

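
As an alternative to editing the unit file in place, the key could be kept in a systemd drop-in. This is only a sketch: it assumes the service reads `FIREWORKS_API_KEY` from its environment, and both the helper `write_docman_dropin` and the file name `api-key.conf` are hypothetical.

```bash
# Write a drop-in that sets FIREWORKS_API_KEY for the docman user unit.
# $1: API key; $2: user unit directory (defaults to ~/.config/systemd/user).
# Prints the path of the file it wrote.
write_docman_dropin() {
    key=$1
    unit_dir=${2:-"$HOME/.config/systemd/user"}
    dropin_dir="$unit_dir/docman.service.d"
    mkdir -p "$dropin_dir"
    # The file holds a secret, so create it readable by the owner only.
    ( umask 077
      printf '[Service]\nEnvironment=FIREWORKS_API_KEY=%s\n' "$key" \
          > "$dropin_dir/api-key.conf" )
    printf '%s\n' "$dropin_dir/api-key.conf"
}
```

After calling `write_docman_dropin fw_xxx`, run `systemctl --user daemon-reload` and restart the service so the new environment takes effect.
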
## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/` | Dashboard |
| GET | `/browse` | Browse documents |
| GET | `/doc/:id` | View document |
| GET | `/search` | Search page |
| GET | `/expenses` | Expenses tracker |
| GET | `/upload` | Upload page |
| POST | `/api/upload` | Upload document |
| GET | `/api/documents/:id` | Get document JSON |
| PATCH | `/api/documents/:id` | Update document |
| DELETE | `/api/documents/:id` | Delete document |
| GET | `/api/search?q=` | Search API |
| GET | `/api/expenses/export` | Export CSV |
| GET | `/api/stats` | Dashboard stats |

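
A few of the JSON and CSV endpoints in the table can be wrapped in small `curl` helpers, sketched below. The function names and the `DOCMAN_URL` variable are hypothetical conveniences, and the base URL assumes the default port. Upload and PATCH are left out because their request formats are not documented here.

```bash
# Base URL of a running DocMan instance; set DOCMAN_URL if the port differs.
docman_base() { printf '%s' "${DOCMAN_URL:-http://localhost:8200}"; }

# GET /api/search?q=<query>; -G with --data-urlencode lets curl encode the query.
docman_search() { curl -fsS -G --data-urlencode "q=$1" "$(docman_base)/api/search"; }

# GET /api/documents/:id
docman_get() { curl -fsS "$(docman_base)/api/documents/$1"; }

# DELETE /api/documents/:id
docman_delete() { curl -fsS -X DELETE "$(docman_base)/api/documents/$1"; }

# GET /api/expenses/export, saved to the file named in $1 (default: expenses.csv)
docman_export_expenses() {
    curl -fsS -o "${1:-expenses.csv}" "$(docman_base)/api/expenses/export"
}
```

For example, `docman_search "dentist invoice"` queries the search API, and `docman_export_expenses 2024.csv` saves the CSV export locally. The `-f` flag makes `curl` exit non-zero on HTTP errors, so the helpers can be chained in scripts.
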
## Dependencies

System packages (already installed):
- `poppler-utils` (pdftotext, pdfinfo, pdftoppm)
- `tesseract-ocr` (OCR for scanned images)

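
On a fresh machine, a quick check that the tools listed above are on `PATH` can save some debugging. The helper below is a hypothetical sketch, not something DocMan ships.

```bash
# Print each command from the argument list that is not on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Running `missing_tools pdftotext pdfinfo pdftoppm tesseract` prints nothing when everything is installed, and otherwise lists the commands that still need their package.
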
## Notes

- Without `FIREWORKS_API_KEY`, documents are filed as "uncategorized"
- The inbox watcher runs continuously, processing new files automatically
- Markdown files are searchable even without embeddings