clawd/memory/2026-02-13.md

2026-02-13 (Friday)

GPU Research Session (4:14-4:48 AM)

Johan explored GPU options for local AI/CoS. Key progression:

  1. Started with RTX A6000 48GB ($4,599) — too expensive for the value
  2. RTX 8000 48GB (~$2,000) — same VRAM, older/slower, better price
  3. RTX 3090 24GB (~$850) — faster than RTX 8000 but only 24GB
  4. Tradeoff crystallized: 3090 = fast but limited VRAM, RTX 8000 = slower but can run 70B models
  5. Johan's concern: slow assistant = "I'll do it myself" — speed matters for adoption
  6. Real motivation revealed: NOT cost savings — he wants persistent memory/consistent CoS. Tired of amnesia.
  7. Cloud GPU rental (RunPod, Vast.ai) works for periodic LoRA training without buying hardware
  8. Conclusion direction: Better memory pipeline (RAG + nightly distillation) > buying GPU hardware
    • Distillation/memory work is cheap model work (Qwen, K2.5, Gemini Flash)
    • Opus stays for live conversation judgment
    • No hardware purchase needed — fix the software/memory problem instead
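The "fix the software" conclusion above can be sketched as a nightly distillation job: a cheap model condenses the day's raw log into a durable memory entry that a RAG layer can index later. A minimal sketch only — the file layout and the `summarize` callback are hypothetical stand-ins, not the actual pipeline:

```python
from pathlib import Path
from typing import Callable

def distill_day(raw_log: str, summarize: Callable[[str], str]) -> str:
    """Condense one day's raw log into a durable memory entry.

    `summarize` is a pluggable call to a cheap model (Qwen, K2.5,
    Gemini Flash); the pipeline itself stays model-agnostic.
    """
    prompt = (
        "Distill the following assistant log into short bullet facts "
        "worth remembering long-term:\n\n" + raw_log
    )
    return summarize(prompt)

def nightly_run(log_path: Path, memory_path: Path,
                summarize: Callable[[str], str]) -> None:
    """Append the distilled entry; a RAG indexer can pick it up later."""
    raw = log_path.read_text()
    entry = distill_day(raw, summarize)
    with memory_path.open("a") as f:
        f.write(entry + "\n")
```

Live conversation stays on Opus; only this offline step runs on the cheap model, which is why no GPU purchase is needed for it.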

Key insight from Johan

"It's not about money! It's about consistent memory" — the amnesia problem is his #1 frustration with AI assistants.

Qwen2.5-32B Assessment

  • Compared to Opus 4: B/B+ vs A+ (solid mid-level vs senior engineer)
  • Compared to Opus 3.5: closer but still noticeable gap
  • No local model today is good enough for full autonomous CoS role
  • 6 months from now: maybe (open-source improving fast)

Alex Finn Post (@AlexFinn/2021992770370764878)

  • Guide on running local models via LM Studio — 1,891 likes
  • Good for basics but focused on cost savings, not memory persistence

Cloudflare Agent Content Negotiation

  • Cloudflare adding Accept: text/markdown at the edge for AI agents
  • Added to inou TODO (/home/johan/dev/inou/docs/TODO.md)
  • Relevant: inou should be agent-first, serve structured markdown to AI assistants
  • Competitive differentiator vs anti-bot health platforms
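"Agent-first" here boils down to content negotiation on the `Accept` header: serve structured markdown to AI agents, HTML to browsers. A crude sketch of the idea (deliberately simplified — real negotiation parses q-values per RFC 9110; this is not inou's implementation):

```python
def negotiate(accept_header: str, markdown_body: str, html_body: str):
    """Serve markdown to agents that ask for it, HTML otherwise.

    Crude substring check for illustration; a real server would
    parse the Accept header's media ranges and quality weights.
    """
    if "text/markdown" in accept_header:
        return ("text/markdown", markdown_body)
    return ("text/html", html_body)
```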

Email Triage

  • 1 new email: Amazon shipping (EZVALO motion sensor night lights, $20.32, arriving today)
  • Updated delivery tracker, trashed email

RTX 5090 Scam

  • Johan found $299 "RTX 5090" on eBay — zero feedback seller, obvious scam. Warned him off.

Webchat Bug

  • Johan's message got swallowed (NO_REPLY triggered incorrectly), he had to resend

Cron Jobs → Kimi K2.5 on Fireworks

  • Switched 7 cron jobs from Opus to Kimi K2.5 (fireworks/accounts/fireworks/models/kimi-k2p5)
  • K2.5 Watchdog, claude-usage-hourly, git-audit-hourly, dashboard usage, git-audit-daily, update check, weekly memory synthesis
  • Qwen 2.5 32B deprecated/removed from Fireworks — only Qwen3 models remain
  • Qwen3 235B MoE had cold-start 503s (serverless scaling to zero) — unreliable
  • K2.5 stays warm (popular model), ~9s runs, proven in browser agent
  • Fireworks provider registered in OpenClaw config with two models: K2.5 (primary) + Qwen3 235B (backup)
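The primary/backup arrangement implies simple failover: try K2.5 first, fall back to Qwen3 235B only when the serverless backend cold-starts. A sketch under assumptions — the primary model id is from the notes, but the backup id, the `call` interface, and the error type are hypothetical, not the OpenClaw or Fireworks API:

```python
PRIMARY = "fireworks/accounts/fireworks/models/kimi-k2p5"
BACKUP = "fireworks/accounts/fireworks/models/qwen3-235b"  # assumed id

class ColdStartError(Exception):
    """Stands in for an HTTP 503 from a serverless model scaled to zero."""

def complete_with_failover(prompt, call):
    """Try the warm primary model; on a cold-start 503, retry on backup.

    `call(model, prompt)` is a placeholder for the real provider client.
    """
    try:
        return call(PRIMARY, prompt)
    except ColdStartError:
        return call(BACKUP, prompt)
```

Since K2.5 stays warm, the backup path should almost never fire — it exists for exactly the 503 case that made Qwen3 235B unreliable as a primary.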

Fireworks Blog Post

  • Fireworks published blog about OpenClaw + Fireworks integration
  • Pitch: use open models for routine tasks (10x cheaper), Opus for judgment
  • Validates our exact setup

Shannon VPS — New Credentials (from Hostkey/Maxim)

Fire Tablet Alert Dashboard (new project)

  • Johan doesn't see Signal alerts reliably — wants a spare Fire tablet (Fully Kiosk) as alert display
  • Requirements: clock, calendar, notification push with sound ("modest pling")
  • Two approaches discussed: standalone web page (preferred) vs Home Assistant integration
  • Johan OK with me coding it or using HA
  • Plan: simple HTML dashboard on forge, SSE for push alerts, Fully Kiosk loads URL
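The SSE leg of that plan is mostly string framing: the server holds the connection open and writes framed events that the tablet's browser receives via `EventSource`, playing the pling in its event handler. A sketch of the wire format (field names per the server-sent events spec; the alert payload shape is an assumption):

```python
import json

def sse_event(alert: dict, event: str = "alert") -> str:
    """Frame one alert as a server-sent events message.

    The browser side subscribes with
      new EventSource("/api/alerts/stream")
    and reacts in an addEventListener("alert", ...) handler.
    """
    # Each event: optional "event:" line, a "data:" line,
    # terminated by a blank line.
    return f"event: {event}\ndata: {json.dumps(alert)}\n\n"
```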

GPU Purchase Decision

  • No GPU purchase yet — persistent memory problem better solved with software (RAG + nightly distillation)
  • If buying: RTX 8000 48GB (~$2K) best option for fine-tuning/70B models
  • Cloud GPU (RunPod/Vast.ai) viable for periodic LoRA training
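The 24GB-vs-48GB line that drove the whole comparison follows from simple arithmetic: weights alone need roughly params × bytes-per-param, before KV cache and runtime overhead. A back-of-envelope sketch (rules of thumb, not benchmarks):

```python
def weight_vram_gib(params_b: float, bits_per_param: float) -> float:
    """Approximate GiB needed just for model weights.

    params_b is the parameter count in billions; KV cache and
    framework overhead come on top of this.
    """
    bytes_total = params_b * 1e9 * bits_per_param / 8
    return bytes_total / 2**30

# 70B at 4-bit quantization: ~32.6 GiB -> fits a 48GB card, not 24GB.
# 70B at 16-bit: ~130 GiB -> out of reach for a single card either way.
```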

MC Performance Issue

  • Message Center queries taking 15-16 seconds consistently — needs investigation

Alert Dashboard — Port Conflict Fixed

  • Subagent built alert-dashboard (Node.js/Express, SSE, analog clock, calendar, alert feed)
  • Initially deployed on port 9201 — WRONG, that's DocSys's port
  • Moved to port 9202, restored DocSys on 9201
  • Service: alert-dashboard.service (systemd user, enabled)
  • Source: /home/johan/dev/alert-dashboard/
  • API: GET /, GET /api/alerts, POST /api/alerts, GET /api/alerts/stream (SSE)
  • Fully Kiosk URL: http://192.168.1.16:9202
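Anything on the LAN can raise an alert by POSTing to the endpoint above; connected SSE clients then get it pushed. A stdlib-only sketch — the JSON body shape is assumed, not the dashboard's documented schema:

```python
import json
import urllib.request

def build_alert_request(text: str,
                        base: str = "http://192.168.1.16:9202"):
    """Build the POST that raises one alert on the dashboard.

    Send it with urllib.request.urlopen(); kept separate here so the
    request can be inspected without a live server.
    """
    body = json.dumps({"text": text}).encode()
    return urllib.request.Request(
        f"{base}/api/alerts", data=body,
        headers={"Content-Type": "application/json"},
    )
```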

Shannon VPS — Setup Progress

  • SSH key from forge works (root@82.24.174.112)
  • Password login: root / K_cX1aFThB — LEFT ENABLED per instructions
  • Repo cloned to /opt/shannon
  • Docker build started (still building when subagent finished)
  • TODO: Check build completion, run portal test against inou.com

Kaseya Device Policy Change (IMPORTANT)

  • CISO Jason Manar announced: only Kaseya-issued IT-managed devices on corporate network
  • Personal/BYO devices → BYO network only, no VPN access
  • Rolling out "starting tomorrow" (Feb 14) over coming weeks
  • Johan currently uses personal Mac Mini for EVERYTHING (Kaseya + inou)
  • Has a Kaseya XPS14 laptop he hates
  • Recommended: Request a MacBook Pro (CTO-level ask), keep Mac Mini for inou on BYO network
  • Johan is upset about this — impacts his entire workflow

Cron Job Fixes

  • git-audit-hourly timeout bumped 60s → 120s (K2.5 needs more time for git operations)
  • claude-usage-hourly had stale Qwen3 235B session — will self-correct on next run
  • K2.5 Watchdog hit session lock error — transient from concurrent subagent spawns