Your GPU kill switch can't depend on something that might not be there
Two ways we burned GPU money we didn't mean to — a laptop in the control loop, and a kill switch that depended on a container image that quietly vanished. Both had the same fix.
A vision model is a worse text extractor than its text-only twin
Two findings from benchmarking open-weight models on real document extraction: vision costs accuracy on inputs that don't need it, and scale pays only where vision is hard.
The benchmark that came back almost empty
Testing our most expensive model, the quality results came back for four documents out of four hundred. The model was fine — one line of database code that only fails under load was the cause.
A Kubernetes Service name can crash your model server
We named a Service vllm. Kubernetes turned that name into a setting the server read as its own — and it failed on startup before serving anything.