Imported from bare git on Zurich
James 880f9dab9d Initial commit 2026-02-04 13:35:03 -05:00
build Initial commit 2026-02-04 13:35:03 -05:00
cmd/docman Initial commit 2026-02-04 13:35:03 -05:00
deploy Initial commit 2026-02-04 13:35:03 -05:00
internal Initial commit 2026-02-04 13:35:03 -05:00
.gitignore Initial commit 2026-02-04 13:35:03 -05:00
Makefile Initial commit 2026-02-04 13:35:03 -05:00
README.md Initial commit 2026-02-04 13:35:03 -05:00
go.mod Initial commit 2026-02-04 13:35:03 -05:00
go.sum Initial commit 2026-02-04 13:35:03 -05:00


DocMan - Document Management System

AI-powered document scanning, OCR, classification, and search.

Quick Start

# Run directly (dev mode)
cd ~/dev/docman
make dev

# Or run the installed binary
~/bin/docman -port 8200

Open http://localhost:8200

Features

  • Auto-processing: PDFs and images dropped into ~/documents/inbox/ are automatically OCR'd, classified, and indexed
  • PDF Preview: Built-in PDF viewer
  • Full-text Search: SQLite FTS5 keyword search plus semantic (embedding-based) search
  • Categories: taxes, expenses, bills, medical, contacts, legal, insurance, banking, receipts
  • Expense Tracking: Filter by date, export to CSV, track tax-deductible items
  • Markdown Records: Each document gets a searchable markdown file

Scanner Setup

  1. Configure the scanner to save to the SMB share:

    • Share: \\192.168.1.16\documents\inbox (adjust the address to the machine running DocMan)
    • Alternatively, use the scanner's app to save directly to ~/documents/inbox/
  2. Workflow:

    • Scan document → lands in inbox
    • DocMan auto-processes (OCR → classify → store)
    • View/search in web UI

Directory Structure

~/documents/
├── inbox/      # Drop scans here (auto-processed)
├── store/      # PDF storage (by checksum)
├── records/    # Markdown records by category
│   ├── taxes/
│   ├── expenses/
│   ├── bills/
│   └── ...
└── index/      # SQLite database

Configuration

Environment Variables

FIREWORKS_API_KEY=fw_xxx    # Required for AI classification

Command Line Options

-port        HTTP port (default: 8200)
-data        Data directory (default: ~/documents)
-ai-endpoint AI API endpoint (default: Fireworks)
-ai-key      AI API key
-ai-model    Classification model
-embed-model Embedding model
-watch       Only watch inbox, don't start web server

Systemd Service

# Edit to add your Fireworks API key
nano ~/.config/systemd/user/docman.service

# Enable and start
systemctl --user daemon-reload
systemctl --user enable docman
systemctl --user start docman

# Check status
systemctl --user status docman
journalctl --user -u docman -f

API Endpoints

Method  Path                    Description
------  ----------------------  -----------------
GET     /                       Dashboard
GET     /browse                 Browse documents
GET     /doc/:id                View document
GET     /search                 Search page
GET     /expenses               Expenses tracker
GET     /upload                 Upload page
POST    /api/upload             Upload document
GET     /api/documents/:id      Get document JSON
PATCH   /api/documents/:id      Update document
DELETE  /api/documents/:id      Delete document
GET     /api/search?q=          Search API
GET     /api/expenses/export    Export CSV
GET     /api/stats              Dashboard stats

Dependencies

Required system packages (already installed on the current host):

  • poppler-utils (pdftotext, pdfinfo, pdftoppm)
  • tesseract-ocr (OCR for scanned images)

Notes

  • Without FIREWORKS_API_KEY, documents will be categorized as "uncategorized"
  • The inbox watcher runs continuously, processing new files automatically
  • Markdown files are searchable even without embeddings