2026-02-13 (Friday)
GPU Research Session (4:14-4:48 AM)
Johan explored GPU options for local AI/CoS. Key progression:
- Started with RTX A6000 48GB ($4,599) — too expensive for the value
- RTX 8000 48GB (~$2,000) — same VRAM, older/slower, better price
- RTX 3090 24GB (~$850) — faster than RTX 8000 but only 24GB
- Tradeoff crystallized: 3090 = fast but limited VRAM, RTX 8000 = slower but can run 70B models
- Johan's concern: slow assistant = "I'll do it myself" — speed matters for adoption
- Real motivation revealed: NOT cost savings — he wants persistent memory/consistent CoS. Tired of amnesia.
- Cloud GPU rental (RunPod, Vast.ai) works for periodic LoRA training without buying hardware
- Conclusion direction: Better memory pipeline (RAG + nightly distillation) > buying GPU hardware
- Distillation/memory work is cheap model work (Qwen, K2.5, Gemini Flash)
- Opus stays for live conversation judgment
- No hardware purchase needed — fix the software/memory problem instead
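The "RAG + nightly distillation" direction above can be sketched as a two-pass map-reduce summary over the day's log. This is a hedged sketch, not the actual pipeline: `nightly_distill` and `toy_summarize` are hypothetical names, and the real version would call a cheap hosted model (Qwen, K2.5, Gemini Flash) instead of the toy summarizer.

```python
from typing import Callable

def nightly_distill(raw_log: str, summarize: Callable[[str], str],
                    max_chunk_chars: int = 4000) -> str:
    # Split the day's raw log into model-sized chunks.
    chunks = [raw_log[i:i + max_chunk_chars]
              for i in range(0, len(raw_log), max_chunk_chars)]
    # First pass: distill each chunk with a cheap model.
    partials = [summarize(chunk) for chunk in chunks]
    # Second pass: merge the partial summaries into one durable memory
    # entry, which would then be embedded and indexed for RAG retrieval.
    return summarize("\n\n".join(partials))

# Toy summarizer standing in for a cheap-model API call:
# keep the first line of the text, capped at 80 characters.
def toy_summarize(text: str) -> str:
    return text.splitlines()[0][:80]
```

The point of the two-pass shape is that only short summaries ever reach the merge step, so the whole day fits regardless of log size.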
Key insight from Johan
"It's not about money! It's about consistent memory" — the amnesia problem is his #1 frustration with AI assistants.
Qwen2.5-32B Assessment
- Compared to Opus 4: B/B+ vs A+ (solid mid-level vs senior engineer)
- Compared to Opus 3.5: closer but still noticeable gap
- No local model today is good enough for full autonomous CoS role
- 6 months from now: maybe (open-source improving fast)
Alex Finn Post (@AlexFinn/2021992770370764878)
- Guide on running local models via LM Studio — 1,891 likes
- Good for basics but focused on cost savings, not memory persistence
Cloudflare Agent Content Negotiation
- Cloudflare adding Accept: text/markdown content negotiation at the edge for AI agents
- Added to inou TODO (/home/johan/dev/inou/docs/TODO.md)
- Relevant: inou should be agent-first, serve structured markdown to AI assistants
- Competitive differentiator vs anti-bot health platforms
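On inou's side, honoring that header reduces to a small Accept-header parser that prefers markdown when an agent asks for it. A minimal sketch, assuming inou offers exactly two representations; the `negotiate` function is illustrative, not Cloudflare's actual edge logic.

```python
def negotiate(accept_header: str) -> str:
    """Pick a response content type from an HTTP Accept header.
    Agent-first: serve text/markdown whenever the client prefers it."""
    offered = ("text/markdown", "text/html")
    prefs = []
    # Parse "type;q=0.8" entries, defaulting the quality value to 1.0.
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        mtype = fields[0].strip()
        q = 1.0
        for field in fields[1:]:
            field = field.strip()
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        prefs.append((mtype, q))
    # Highest-q offered type wins; fall back to HTML for everyone else.
    best = max(
        ((mtype, q) for mtype, q in prefs if mtype in offered),
        key=lambda pair: pair[1],
        default=("text/html", 0.0),
    )
    return best[0]
```

So a browser sending `text/html,text/markdown;q=0.5` still gets HTML, while an agent sending `Accept: text/markdown` gets markdown.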
Email Triage
- 1 new email: Amazon shipping (EZVALO motion sensor night lights, $20.32, arriving today)
- Updated delivery tracker, trashed email
- MC performance issue: queries taking 15-16 seconds consistently — needs investigation
RTX 5090 Scam
- Johan found $299 "RTX 5090" on eBay — zero feedback seller, obvious scam. Warned him off.
Webchat Bug
- Johan's message got swallowed (NO_REPLY triggered incorrectly), he had to resend
Cron Jobs → Kimi K2.5 on Fireworks
- Switched 7 cron jobs from Opus to Kimi K2.5 (fireworks/accounts/fireworks/models/kimi-k2p5)
- Jobs: K2.5 Watchdog, claude-usage-hourly, git-audit-hourly, dashboard usage, git-audit-daily, update check, weekly memory synthesis
- Qwen 2.5 32B deprecated/removed from Fireworks — only Qwen3 models remain
- Qwen3 235B MoE had cold-start 503s (serverless scaling to zero) — unreliable
- K2.5 stays warm (popular model), ~9s runs, proven in browser agent
- Fireworks provider registered in OpenClaw config with two models: K2.5 (primary) + Qwen3 235B (backup)
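The primary-plus-backup arrangement amounts to retry-then-fallback logic around the cold-start 503s. A sketch under stated assumptions: `call_with_fallback` and `TransientError` are hypothetical names, the short model ids stand in for the full Fireworks model paths, and the real caller would map HTTP 503 responses to the transient error.

```python
import time

class TransientError(Exception):
    """Stand-in for an HTTP 503 from a serverless endpoint scaling from zero."""

def call_with_fallback(prompt, call_model,
                       models=("kimi-k2p5", "qwen3-235b"),
                       retries_per_model=2, backoff_s=1.0):
    # Try each model in order; retry transient failures with linear backoff
    # before falling through to the next (backup) model.
    last_err = None
    for model in models:
        for attempt in range(retries_per_model):
            try:
                return call_model(model, prompt)
            except TransientError as err:
                last_err = err
                time.sleep(backoff_s * (attempt + 1))
    raise last_err
```

With K2.5 kept warm the primary call normally succeeds on the first try; the Qwen3 235B backup only absorbs the rare case where K2.5 itself returns a transient error.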
Fireworks Blog Post
- Fireworks published blog about OpenClaw + Fireworks integration
- Pitch: use open models for routine tasks (10x cheaper), Opus for judgment
- Validates our exact setup
Shannon VPS — New Credentials (from Hostkey/Maxim)
- IP: 82.24.174.112, root / K_cX1aFThB
- DO NOT disable password login until Johan confirms SSH key access (lesson learned from Feb 11 lockout)
- Task: Install Shannon (KeygraphHQ/shannon) and test against inou portal ONLY
- Server ID: 53643, HostKey panel: https://panel.hostkey.com/controlpanel.html?key=639551e73029b90f-c061af4412951b2e
Fire Tablet Alert Dashboard (new project)
- Johan doesn't see Signal alerts reliably — wants a spare Fire tablet (Fully Kiosk) as alert display
- Requirements: clock, calendar, notification push with sound ("modest pling")
- Two approaches discussed: standalone web page (preferred) vs Home Assistant integration
- Johan OK with me coding it or using HA
- Plan: simple HTML dashboard on forge, SSE for push alerts, Fully Kiosk loads URL
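The SSE push in that plan reduces to framing each alert as a text/event-stream message. A minimal sketch (the `sse_event` helper is hypothetical; the tablet side would consume the stream with `new EventSource(...)` and play the "modest pling" on each message):

```python
import json

def sse_event(alert: dict, event: str = "alert") -> str:
    # Server-Sent Events framing: named event, JSON payload on the data
    # line, and a blank line to terminate the message.
    data = json.dumps(alert)
    return f"event: {event}\ndata: {data}\n\n"
```

Each connected client just receives these strings over a long-lived HTTP response with Content-Type text/event-stream, which is why a plain HTML page in Fully Kiosk is enough on the display side.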
GPU Purchase Decision
- No GPU purchase yet — persistent memory problem better solved with software (RAG + nightly distillation)
- If buying: RTX 8000 48GB (~$2K) best option for fine-tuning/70B models
- Cloud GPU (RunPod/Vast.ai) viable for periodic LoRA training
Alert Dashboard — Port Conflict Fixed
- Subagent built alert-dashboard (Node.js/Express, SSE, analog clock, calendar, alert feed)
- Initially deployed on port 9201 — WRONG, that's DocSys's port
- Moved to port 9202, restored DocSys on 9201
- Service: alert-dashboard.service (systemd user, enabled)
- Source: /home/johan/dev/alert-dashboard/
- API: GET /, GET /api/alerts, POST /api/alerts, GET /api/alerts/stream (SSE)
- Fully Kiosk URL: http://192.168.1.16:9202
Shannon VPS — Setup Progress
- SSH key from forge works ✅ (root@82.24.174.112)
- Password login: root / K_cX1aFThB — LEFT ENABLED per instructions
- Repo cloned to /opt/shannon
- Docker build started (still building when subagent finished)
- TODO: Check build completion, run portal test against inou.com
Kaseya Device Policy Change (IMPORTANT)
- CISO Jason Manar announced: only Kaseya-issued IT-managed devices on corporate network
- Personal/BYO devices → BYO network only, no VPN access
- Rolling out "starting tomorrow" (Feb 14) over coming weeks
- Johan currently uses personal Mac Mini for EVERYTHING (Kaseya + inou)
- Has a Kaseya XPS14 laptop he hates
- Recommended: Request a MacBook Pro (CTO-level ask), keep Mac Mini for inou on BYO network
- Johan is upset about this — impacts his entire workflow
Cron Job Fixes
- git-audit-hourly timeout bumped 60s → 120s (K2.5 needs more time for git operations)
- claude-usage-hourly had stale Qwen3 235B session — will self-correct on next run
- K2.5 Watchdog hit session lock error — transient from concurrent subagent spawns