Insights

Field notes from inside the perimeter.

Two posts a month, with one bar: every piece should be something a staff engineer would forward to their team. OSS landscape updates, architecture deep-dives from real engagements, and honest accounting of self-host versus API. No thought-leadership for its own sake.

Latest

Cost & break-even reality check

The lock-in ratchet

Calling a hyperscaler model is the fastest way to ship AI — which is exactly why the dependency compounds. Why every win pushes more of your core logic behind a model you don't run and can't price, and what open weights actually change.

Jon · June 2026 · 4 min

What we learned building it

Lessons from standing the architecture up.

Real problems we hit building the reference architecture and benchmark — what broke, why, and the fix — written plainly enough that the next team finds the answer instead of the wall.

COST & BREAK-EVEN

Your GPU kill switch can't depend on something that might not be there

Two ways we burned GPU money we didn't mean to — a laptop in the control loop, and a kill switch that depended on a container image that quietly vanished. Both had the same fix.

4 min

HOW WE EVAL

A vision model is a worse text extractor than its text-only twin

Two findings from benchmarking open-weight models on real document extraction: vision costs accuracy on inputs that don't need it, and scale pays only where vision is hard.

4 min

RELIABILITY

The benchmark that came back almost empty

When we tested our most expensive model, quality results came back for only four documents out of four hundred. The model was fine; the cause was one line of database code that fails only under load.

3 min

DEPLOYMENT

A Kubernetes Service name can crash your model server

We named a Service vllm. Kubernetes turned that name into a setting the server read as its own — and it failed on startup before serving anything.

2 min

Get each one when it ships.

One field, no sequence. We send the post and the quarterly benchmark index — nothing else.