# Filescanner
Cross-platform file inventory scanner with a ClickHouse backend.
## Quick Start
```bash
# Get dependencies
go mod tidy
# Build all platforms
make all
# Or build current platform only
make build
```
## Usage
### Scan files
```bash
# Scan with dry-run (no DB)
./filescan -server myserver -path /home -dry-run
# Scan to ClickHouse
./filescan -server myserver -path /home -ch 192.168.1.253:9000
# Verbose
./filescan -server myserver -path /home -v
```
### Add hashes for duplicate detection
```bash
# Hashes only files whose size is shared with at least one other file
# (a file with a unique size cannot have a duplicate)
./hashupdate -server myserver -ch 192.168.1.253:9000
```
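The size pre-filter can be sketched in Go as follows. This is only an illustration of the idea, not the actual `hashupdate` code: the `FileInfo` type and the `hashCandidates` function are hypothetical names, and the real tool presumably works against the ClickHouse table rather than an in-memory slice.

```go
package main

import "fmt"

// FileInfo is a hypothetical record for one scanned file; the real
// filescanner types may differ.
type FileInfo struct {
	Path string
	Size int64
}

// hashCandidates returns only the files whose size is shared with at
// least one other file. A file with a unique size cannot have a
// byte-identical duplicate, so hashing it would be wasted work.
func hashCandidates(files []FileInfo) []FileInfo {
	// First pass: count how many files have each size.
	countBySize := make(map[int64]int)
	for _, f := range files {
		countBySize[f.Size]++
	}

	// Second pass: keep files whose size occurs more than once,
	// preserving the input order.
	var out []FileInfo
	for _, f := range files {
		if countBySize[f.Size] > 1 {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	files := []FileInfo{
		{Path: "/home/a.txt", Size: 100},
		{Path: "/home/b.txt", Size: 100},
		{Path: "/home/c.txt", Size: 250},
	}
	// Only a.txt and b.txt share a size, so only they need hashing.
	for _, f := range hashCandidates(files) {
		fmt.Println(f.Path)
	}
}
```

Two files of equal size are only candidates; they still need a content hash to confirm they are duplicates.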
### Find duplicates
```sql
SELECT hash, count(*) as cnt,
groupArray(concat(server, ':', folder, '/', filename)) as files
FROM files.inventory
WHERE hash != ''
GROUP BY hash
HAVING cnt > 1
ORDER BY any(size) DESC;
```
## Binaries
After `make all`:
- `bin/filescan-mac-arm64` - macOS, Apple Silicon (M1/M2/M3)
- `bin/filescan-mac-amd64` - macOS, Intel
- `bin/filescan-linux` - Linux
- `bin/filescan.exe` - Windows
## Excluded Directories
Automatically skips:
- Windows: `$RECYCLE.BIN`, `Windows`, `Program Files`, `AppData`, etc.
- macOS: `.Trash`, `Library`, `.Spotlight-V100`, etc.
- Linux: `/proc`, `/sys`, `/dev`, `/run`, etc.
- Common: `node_modules`, `.git`, `__pycache__`
## ClickHouse Schema
See `queries.sql` for schema and useful queries.