rocket-cli: a Rocket.Chat MCP server with a local FTS5 brain
One day, ~25 subagents, 216 tests: building a cache-first Rocket.Chat bridge for LLM agents — FTS5 corruption forensics, typings that lie, and attention triage as the product.
Polishing an Electron app's UI with an AI design skill on the front and a code knowledge graph on the back: design vocabulary plus a real source of truth.
An AI agent investigating a slow server found a Monero miner consuming 50% CPU inside a Docker container. Here's the full incident — how it got in, what it was doing, and everything wrong that allowed it.
Spent half a session tuning VLM localize prompts. The real bug was deep in my RFB client: framebuffer captures were 1–N frames behind reality.
Merging turbo3 KV cache and MTP speculative decoding into one llama.cpp binary: the build crashes, the CUDA dispatch bug, and 252K context at 85% draft acceptance.
A practical comparison of Qwen 3.6 Dense and MoE on an RTX 3090, focused on real throughput by scenario and the impact of Multi-Token Prediction on local inference.
Testing Unsloth's Qwen 3.6 27B MTP GGUF on an RTX 3090 with llama.cpp's MTP branch: native speculative decoding, no draft model, real speedup.
How a single flag in llama.cpp turns a 35B Mixture-of-Experts model from OOM to 23 tok/s on a 12GB GPU.
Benchmarking Multi-Token Prediction (MTP) on Qwen 3.6 27B via llama.cpp on an RTX 3090 — 1.5× speedup in agentic tool-call chains.
A web dashboard for 96 PR reviews — built with Python stdlib, Chart.js, and zero external server dependencies.