fix: resolve reconnect storm and improve Ubuntu deployment

Fix WebSocket reconnect storm (issue #53) caused by a stale closure
reading connection.reconnectAttempts from Zustand state. Use a ref
to track attempts, avoiding the closure capture problem entirely.
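
The stale-closure problem described above can be sketched outside React. This is a minimal illustration, not the project's hook: `makeStaleHandler`, `makeRefHandler`, and `attemptsRef` are hypothetical names, and the only thing taken from the change itself is the backoff formula (2^attempts seconds, capped at 30 s).

```typescript
// Hypothetical names throughout; this is not the project's useWebSocket hook.
const MAX_DELAY_MS = 30000

// Same backoff formula as the diff: 2^attempts seconds, capped at 30 s.
function backoffDelay(attempts: number): number {
  return Math.min(Math.pow(2, attempts) * 1000, MAX_DELAY_MS)
}

// Stale variant: the handler captures `attempts` by value when it is created,
// much like a memoized callback capturing a state snapshot from one render.
function makeStaleHandler(attempts: number): () => number {
  return () => backoffDelay(attempts) // always sees the captured value
}

// Ref variant: the handler reads and updates a mutable box on every call.
function makeRefHandler(ref: { current: number }): () => number {
  return () => {
    const delay = backoffDelay(ref.current)
    ref.current += 1
    return delay
  }
}

const staleHandler = makeStaleHandler(0)
const staleDelays = [staleHandler(), staleHandler(), staleHandler()]

const attemptsRef = { current: 0 }
const refHandler = makeRefHandler(attemptsRef)
const refDelays = [refHandler(), refHandler(), refHandler()]

console.log(staleDelays) // delays: 1000, 1000, 1000 (backoff never grows)
console.log(refDelays) // delays: 1000, 2000, 4000
```

With the captured value the delay never grows, so every close event schedules another fast retry. Reading the counter from a ref lets the backoff advance without the callback having to depend on state.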

Improve Dockerfile: create .data directory with correct ownership for
SQLite, set PORT/HOSTNAME env vars explicitly.

Add deployment guide documenting Ubuntu prerequisites (python3, make,
g++ for better-sqlite3 native compilation) and platform-specific
build constraints.
Nyk 2026-03-02 12:15:19 +07:00
parent 8510ee5f2c
commit ebdc8de8b9
3 changed files with 127 additions and 6 deletions

@@ -21,6 +21,10 @@ COPY --from=build /app/.next/standalone ./
COPY --from=build /app/.next/static ./.next/static
# Copy public directory if it exists (may not exist in all setups)
COPY --from=build /app/public* ./public/
# Create data directory with correct ownership for SQLite
RUN mkdir -p .data && chown nextjs:nodejs .data
USER nextjs
EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME=0.0.0.0
CMD ["node", "server.js"]

docs/deployment.md Normal file
@@ -0,0 +1,113 @@
# Deployment Guide
## Prerequisites
- **Node.js** >= 20 (LTS recommended)
- **pnpm** (installed via corepack: `corepack enable && corepack prepare pnpm@latest --activate`)
### Ubuntu / Debian
`better-sqlite3` requires native compilation tools:
```bash
sudo apt-get update
sudo apt-get install -y python3 make g++
```
### macOS
Xcode command line tools are required:
```bash
xcode-select --install
```
## Quick Start (Development)
```bash
cp .env.example .env.local
pnpm install
pnpm dev
```
Open http://localhost:3000. Log in with `AUTH_USER` / `AUTH_PASS` from your `.env.local`.
## Production (Direct)
```bash
pnpm install --frozen-lockfile
pnpm build
pnpm start
```
The `pnpm start` script binds to `0.0.0.0:3005`. Override with:
```bash
PORT=3000 pnpm start
```
**Important:** The production build bundles platform-specific native binaries. You must run `pnpm install` and `pnpm build` on the same OS and architecture as the target server. A build created on macOS will not work on Linux.
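One way to catch a mismatched build early is to compare the platform the build was made on with the platform it runs on. The sketch below is purely illustrative: `BuildTarget` and `describeMismatch` are hypothetical and not part of this repository. The field values mirror what Node.js reports in `process.platform` and `process.arch`.

```typescript
// Hypothetical helper; nothing like this is claimed to exist in the repository.
interface BuildTarget {
  platform: string // e.g. "linux", "darwin" (values of Node's process.platform)
  arch: string // e.g. "x64", "arm64" (values of Node's process.arch)
}

// Returns null when the targets match, otherwise a human-readable reason.
function describeMismatch(build: BuildTarget, runtime: BuildTarget): string | null {
  if (build.platform === runtime.platform && build.arch === runtime.arch) {
    return null
  }
  return `built for ${build.platform}/${build.arch}, running on ${runtime.platform}/${runtime.arch}`
}

const macBuild: BuildTarget = { platform: 'darwin', arch: 'arm64' }
const linuxServer: BuildTarget = { platform: 'linux', arch: 'x64' }

const mismatchReason = describeMismatch(macBuild, linuxServer)
const sameTarget = describeMismatch(linuxServer, linuxServer)

console.log(mismatchReason) // built for darwin/arm64, running on linux/x64
console.log(sameTarget) // null
```

In a real setup the build target would have to be recorded at build time (for example in a small JSON file) and the runtime target read from `process.platform` and `process.arch`. Both steps are left out here as assumptions.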
## Production (Docker)
```bash
docker build -t mission-control .
docker run -p 3000:3000 \
-v mission-control-data:/app/.data \
-e AUTH_USER=admin \
-e AUTH_PASS=your-secure-password \
-e API_KEY=your-api-key \
mission-control
```
The Docker image:
- Builds from `node:20-slim` with multi-stage build
- Compiles `better-sqlite3` natively inside the container (Linux x64)
- Uses Next.js standalone output for minimal image size
- Runs as non-root user `nextjs`
- Exposes port 3000 (override with `-e PORT=8080` and publish the matching port, e.g. `-p 8080:8080`)
### Persistent Data
The SQLite database is stored in `/app/.data/` inside the container. Mount a volume to persist data across restarts:
```bash
docker run -v /path/to/data:/app/.data ...
```
## Environment Variables
See `.env.example` for the full list. Key variables:
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `AUTH_USER` | Yes | `admin` | Admin username (seeded on first run) |
| `AUTH_PASS` | Yes | - | Admin password |
| `API_KEY` | Yes | - | API key for headless access |
| `PORT` | No | `3005` (direct) / `3000` (Docker) | Server port |
| `OPENCLAW_HOME` | No | - | Path to OpenClaw installation |
| `MC_ALLOWED_HOSTS` | No | `localhost,127.0.0.1` | Allowed hosts in production |
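The table above can be read as a small validation routine. The following is a sketch under stated assumptions, not the application's actual configuration code: `loadConfig`, `DeployConfig`, and `Env` are hypothetical names, `AUTH_USER` is assumed to fall back to its listed default, and the port range check is an addition for illustration.

```typescript
// Hypothetical loader; the application's real config handling may differ.
interface DeployConfig {
  authUser: string
  authPass: string
  apiKey: string
  port: number
  allowedHosts: string[]
}

type Env = Record<string, string | undefined>

// defaultPort is 3005 for a direct start; the Docker image sets PORT=3000.
function loadConfig(env: Env, defaultPort = 3005): DeployConfig {
  // Variables the table lists as required and without a default.
  const missing = ['AUTH_PASS', 'API_KEY'].filter((name) => !env[name])
  if (missing.length > 0) {
    throw new Error(`Missing required variables: ${missing.join(', ')}`)
  }

  const port = Number(env.PORT ?? defaultPort)
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`)
  }

  return {
    authUser: env.AUTH_USER ?? 'admin', // assumption: default applies when unset
    authPass: env.AUTH_PASS as string,
    apiKey: env.API_KEY as string,
    port,
    allowedHosts: (env.MC_ALLOWED_HOSTS ?? 'localhost,127.0.0.1')
      .split(',')
      .map((host) => host.trim()),
  }
}

const exampleConfig = loadConfig({ AUTH_PASS: 's3cret', API_KEY: 'key-123', PORT: '3000' })
console.log(exampleConfig.port, exampleConfig.allowedHosts)
```

Failing fast on missing values keeps a misconfigured deployment from starting with an empty password or API key.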
## Troubleshooting
### "Module not found: better-sqlite3"
Native compilation failed. On Ubuntu/Debian:
```bash
sudo apt-get install -y python3 make g++
rm -rf node_modules
pnpm install
```
### "Invalid ELF header" or "Mach-O" errors
The native binary was compiled on a different platform. Rebuild:
```bash
rm -rf node_modules .next
pnpm install
pnpm build
```
### Database locked errors
Ensure only one instance is running against the same `.data/` directory. The database runs in WAL mode, which allows concurrent readers but only one writer at a time.

@@ -38,6 +38,7 @@ export function useWebSocket() {
const authTokenRef = useRef<string>('')
const requestIdRef = useRef<number>(0)
const handshakeCompleteRef = useRef<boolean>(false)
const reconnectAttemptsRef = useRef<number>(0)
// Heartbeat tracking
const pingCounterRef = useRef<number>(0)
@@ -249,6 +250,7 @@ export function useWebSocket() {
if (frame.type === 'res' && frame.ok && !handshakeCompleteRef.current) {
console.log('Handshake complete!')
handshakeCompleteRef.current = true
reconnectAttemptsRef.current = 0
setConnection({
isConnected: true,
lastConnected: new Date(),
@@ -410,13 +412,15 @@ export function useWebSocket() {
handshakeCompleteRef.current = false
stopHeartbeat()
-// Auto-reconnect logic with exponential backoff
-if (connection.reconnectAttempts < maxReconnectAttempts) {
-const timeout = Math.min(Math.pow(2, connection.reconnectAttempts) * 1000, 30000)
-console.log(`Reconnecting in ${timeout}ms... (attempt ${connection.reconnectAttempts + 1}/${maxReconnectAttempts})`)
+// Auto-reconnect logic with exponential backoff (uses ref to avoid stale closure)
+const attempts = reconnectAttemptsRef.current
+if (attempts < maxReconnectAttempts) {
+const timeout = Math.min(Math.pow(2, attempts) * 1000, 30000)
+console.log(`Reconnecting in ${timeout}ms... (attempt ${attempts + 1}/${maxReconnectAttempts})`)
+reconnectAttemptsRef.current = attempts + 1
+setConnection({ reconnectAttempts: attempts + 1 })
reconnectTimeoutRef.current = setTimeout(() => {
-setConnection({ reconnectAttempts: connection.reconnectAttempts + 1 })
connect(url, authTokenRef.current)
}, timeout)
} else {
@@ -446,7 +450,7 @@ export function useWebSocket() {
console.error('Failed to connect to WebSocket:', error)
setConnection({ isConnected: false })
}
-}, [connection.reconnectAttempts, setConnection, handleGatewayFrame, addLog, stopHeartbeat])
+}, [setConnection, handleGatewayFrame, addLog, stopHeartbeat])
const disconnect = useCallback(() => {
if (reconnectTimeoutRef.current) {