My Internet Drops Weren't the ISP: One Night of OPNsense Forensics

TV streams dropped whenever Claude Code got busy. The trail led through WiFi jitter, a pegged CPU core, hardware offloads poisoning FQ-CoDel — and two self-inflicted outages.

Jul 3, 2026 Homelab, Networking

rocket-cli: a Rocket.Chat MCP server with a local FTS5 brain

One day, ~25 subagents, 216 tests: building a cache-first Rocket.Chat bridge for LLM agents — FTS5 corruption forensics, typings that lie, and attention triage as the product.

Jun 10, 2026 AI, Tooling

Design Fluency Meets the Knowledge Graph

Polishing an Electron app's UI with an AI design skill on the front and a code knowledge graph on the back: design vocabulary plus a real source of truth.

May 26, 2026 AI, Tooling

How a Cryptominer Spent Two Days on My Server — and How I Found It

An AI agent investigating a slow server found a Monero miner consuming 50% CPU inside a Docker container. Here's the full incident: how it got in, what it was doing, and everything that went wrong to allow it.

May 22, 2026 DevOps, Security

When your VLM test flake is actually a VNC capture race

Spent half a session tuning VLM localize prompts. The real bug was deep in my RFB client: framebuffer captures were 1–N frames behind reality.

May 18, 2026 Engineering, Testing

Turbo3 + MTP: Merging Two llama.cpp Forks

Merging turbo3 KV cache and MTP speculative decoding into one llama.cpp binary: the build crashes, the CUDA dispatch bug, and 252K context at 85% draft acceptance.

May 15, 2026 AI, Local LLMs

Qwen 3.6 Dense vs MoE on a Local Stack: what MTP actually delivers

A practical comparison of Qwen 3.6 Dense and MoE on an RTX 3090, focused on real throughput per scenario and the impact of Multi-Token Prediction on local inference.

May 13, 2026 AI, Engineering

Qwen 3.6 27B with Native MTP on llama.cpp

Testing Unsloth's Qwen 3.6 27B MTP GGUF on an RTX 3090 with llama.cpp's MTP branch: native speculative decoding, no draft model, real speedup.

May 13, 2026 AI, Engineering

Running Qwen 3.6 35B MoE on an RTX 3060 12GB via -ncmoe

How a single flag in llama.cpp turns a 35B Mixture-of-Experts model from OOM to 23 tok/s on a 12GB GPU.

May 9, 2026 AI, Engineering

1.5× Faster Agentic Coding with MTP on Qwen 3.6 27B

Benchmarking Multi-Token Prediction (MTP) on Qwen 3.6 27B via llama.cpp on an RTX 3090 — 1.5× speedup in agentic tool-call chains.

May 8, 2026 AI, Engineering