Filescanner

Cross-platform file inventory scanner with ClickHouse backend.

Quick Start

# Get dependencies
go mod tidy

# Build all platforms
make all

# Or build current platform only
make build

Usage

Scan files

# Scan with dry-run (no DB)
./filescan -server myserver -path /home -dry-run

# Scan to ClickHouse
./filescan -server myserver -path /home -ch 192.168.1.253:9000

# Verbose
./filescan -server myserver -path /home -v
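
The single-dash long flags above match what Go's standard flag package accepts. As a rough sketch only (the options struct, the parseArgs helper, the help strings, and the empty defaults are all assumptions, not taken from the actual source), such flags could be declared like this:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options is a hypothetical container for the flags shown in the usage
// examples; the real tool may organize them differently.
type options struct {
	server  string
	path    string
	ch      string
	dryRun  bool
	verbose bool
}

// parseArgs parses command-line arguments (without the program name).
// Defaults are left empty here because the README does not state them.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("filescan", flag.ContinueOnError)
	fs.StringVar(&o.server, "server", "", "server name recorded with each file")
	fs.StringVar(&o.path, "path", "", "root path to scan")
	fs.StringVar(&o.ch, "ch", "", "ClickHouse address as host:port")
	fs.BoolVar(&o.dryRun, "dry-run", false, "scan without writing to the database")
	fs.BoolVar(&o.verbose, "v", false, "verbose output")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	return o, nil
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("server=%s path=%s ch=%s dry-run=%t verbose=%t\n",
		o.server, o.path, o.ch, o.dryRun, o.verbose)
}
```

The flag package treats -dry-run and --dry-run the same way, so either spelling would parse in this sketch.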

Add hashes for duplicate detection

# Hashes only files whose size is shared with at least one other file (a unique size cannot have a duplicate)
./hashupdate -server myserver -ch 192.168.1.253:9000
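
The size-first filter can be illustrated with a minimal sketch. It assumes an in-memory list of files, whereas hashupdate presumably selects candidates from ClickHouse; the FileInfo type, the function names, and the choice of SHA-256 are illustrative assumptions, not the tool's actual implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// FileInfo is a hypothetical record; the real inventory row may differ.
type FileInfo struct {
	Path string
	Size int64
}

// hashCandidates returns the files whose size occurs more than once.
// Files with a unique size cannot have a duplicate, so they are skipped.
func hashCandidates(files []FileInfo) []FileInfo {
	counts := make(map[int64]int, len(files))
	for _, f := range files {
		counts[f.Size]++
	}
	var out []FileInfo
	for _, f := range files {
		if counts[f.Size] > 1 {
			out = append(out, f)
		}
	}
	return out
}

// hashFile streams a file through SHA-256 and returns the hex digest.
// The hash algorithm is an assumption; the README does not name one.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	files := []FileInfo{{"a.txt", 10}, {"b.txt", 10}, {"c.txt", 7}}
	for _, f := range hashCandidates(files) {
		// A real run would call hashFile(f.Path) and store the digest.
		fmt.Println("would hash:", f.Path)
	}
}
```

Filtering by size first avoids reading most files at all, which is usually the dominant cost on large inventories.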

Find duplicates

SELECT hash, count(*) as cnt, 
       groupArray(concat(server, ':', folder, '/', filename)) as files
FROM files.inventory 
WHERE hash != '' 
GROUP BY hash 
HAVING cnt > 1 
ORDER BY any(size) DESC;

Binaries

After running make all, the following binaries are available:

  • bin/filescan-mac-arm64 - Mac M1/M2/M3
  • bin/filescan-mac-amd64 - Mac Intel
  • bin/filescan-linux - Linux
  • bin/filescan.exe - Windows

Excluded Directories

Automatically skips:

  • Windows: $RECYCLE.BIN, Windows, Program Files, AppData, etc.
  • macOS: .Trash, Library, .Spotlight-V100, etc.
  • Linux: /proc, /sys, /dev, /run, etc.
  • Common: node_modules, .git, __pycache__
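
Directory exclusion of this kind is commonly done by returning fs.SkipDir from a filepath.WalkDir callback. The sketch below shows the idea using only the "Common" names listed above; the skipNames map and the shouldSkip and walk helpers are hypothetical, and the scanner's real matching rules (for example for absolute paths such as /proc) are not shown here.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// skipNames is an illustrative subset of the exclusion list,
// matched against the directory's base name.
var skipNames = map[string]bool{
	"node_modules": true,
	".git":         true,
	"__pycache__":  true,
}

// shouldSkip reports whether a directory with this base name is excluded.
func shouldSkip(name string) bool {
	return skipNames[name]
}

// walk calls visit for every non-directory entry under root,
// pruning excluded directories without descending into them.
func walk(root string, visit func(path string)) error {
	return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // abort on the first error; a scanner might log and continue instead
		}
		if d.IsDir() {
			if path != root && shouldSkip(d.Name()) {
				return fs.SkipDir
			}
			return nil
		}
		visit(path)
		return nil
	})
}

func main() {
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	if err := walk(root, func(p string) { fmt.Println(p) }); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Returning fs.SkipDir for a directory prunes its whole subtree, so large trees such as node_modules are never read.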

ClickHouse Schema

See queries.sql for schema and useful queries.