Insights — Kelsus

HOW WE EVAL

The best open model we tested, and the one we'd think hardest about running

We added DeepSeek-V4-Pro, a 1.6-trillion-parameter open flagship, to the Index. It matched the lineup on retrieval, saturated extraction, and cost three times more to self-host than anything else. Best on the benchmark isn't the same as right for your workload.

4 min→

HOW WE EVAL

A 550B model on one box of last-generation GPUs

We benchmarked NVIDIA's Nemotron-3-Ultra-550B, a reasoning model in a 4-bit format built for Blackwell GPUs. It served on a single box of the previous generation and landed in the same retrieval band as the strongest open models.

4 min→

COST & BREAK-EVEN

Your GPU kill switch can't depend on something that might not be there

Two ways we burned GPU money we didn't mean to — a laptop in the control loop, and a kill switch that depended on a container image that quietly vanished. Both had the same fix.

4 min→

HOW WE EVAL

The vision penalty was a bug in our test data

We reported that a vision model read clean documents worse than its text-only twin. That finding was wrong. The cause was one bad field in our answer key, which a reasoning model exposed by refusing to go along with it. How we caught it, and what's actually true.

5 min→

RELIABILITY

The benchmark that came back almost empty

When we tested our most expensive model, quality results came back for only four documents out of four hundred. The model was fine. The cause was one line of database code that only fails under load.

3 min→

DEPLOYMENT

A Kubernetes Service name can crash your model server

We named a Service vllm. Kubernetes turned that name into a setting the server read as its own, and the server failed on startup before serving anything.

2 min→

Field notes from inside the perimeter.

The lock-in ratchet

Lessons from standing the architecture up.
