# Document Processor

Go service that watches `~/documents/inbox/` for PDFs and images, uses Kimi K2.5 (via Fireworks API) for OCR and classification, then stores and indexes them.

## Features

- **File watcher**: Monitors inbox for new documents
- **OCR + Classification**: Kimi K2.5 extracts text and categorizes documents
- **Storage**: PDFs stored in `~/documents/store/`
- **Records**: Markdown records in `~/documents/records/{category}/`
- **Index**: JSON index at `~/documents/index/master.json`
- **Expense export**: Auto-exports expenses to `~/documents/exports/expenses.csv`
- **HTTP API**: REST endpoints for manual ingestion and search

## Setup

1. Set your Fireworks API key:

   ```bash
   export FIREWORKS_API_KEY=your_key_here
   ```

2. Run the service:

   ```bash
   ./docproc
   ```

3. Or install as systemd service:

   ```bash
   sudo cp docproc.service /etc/systemd/system/
   # Edit /etc/systemd/system/docproc.service to add your API key
   sudo systemctl daemon-reload
   sudo systemctl enable --now docproc
   ```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/ingest` | POST | Upload and process a document (multipart form, field: `file`) |
| `/search?q=query` | GET | Search documents by content |
| `/docs` | GET | List all documents |
| `/doc/{id}` | GET | Get single document by ID |

## Directory Structure

```
~/documents/
├── inbox/           # Drop files here for processing
├── store/           # Processed PDFs stored by hash
├── records/         # Markdown records by category
│   ├── tax/
│   ├── expense/
│   ├── medical/
│   └── ...
├── index/
│   └── master.json  # Document index
└── exports/
    └── expenses.csv # Expense export
```

## Categories

Documents are classified into:

- tax
- expense
- bill
- invoice
- medical
- receipt
- bank
- insurance
- legal
- correspondence
- other

## Usage

Drop a PDF or image into `~/documents/inbox/` and the service will:

1. OCR and classify it
2. Store the original in `store/`
3. Create a markdown record in `records/{category}/`
4. Update the master index
5. Export to CSV if it's an expense
6. Delete from inbox

Or POST to `/ingest`:

```bash
curl -X POST http://localhost:9900/ingest -F "file=@receipt.pdf"
```

Search documents:

```bash
curl "http://localhost:9900/search?q=amazon"
```