The lock-in ratchet — Kelsus Insights

Let's grant the obvious first. Calling a frontier model over a hyperscaler API is the fastest way to ship useful AI. The integration is trivial, the quality is high, and the first results are good enough that you'll want more of them. We build that way ourselves when it fits. The first deployment is easy to approve. The trouble shows up around the tenth.

Each win pushes a little more of your core logic behind that API. Extraction here, classification there, a routing step, a summarization step, until a meaningful share of how your business works is expressed as calls to a model you don't run, under a contract you didn't write, priced on a schedule you don't set. Every one of those calls was the right decision on its own. Strung together, they harden into a dependency, and because the returns are real, it keeps happening.

Ordinary vendor lock-in you can at least model. This kind you can't, on two fronts. You don't control the roadmap: a model you depend on can be deprecated, throttled, or changed under you on the vendor's timeline, not yours. And you don't control the price. Inference has been sold below cost to win the market, and nobody can tell you what a million tokens of your core workflow will cost eighteen months from now. You don't have to assume bad intent to find that untenable for planning. You only have to notice that a number driving a growing share of your unit economics is set by someone else and moving in a direction you can't forecast.

For many companies this is a fine trade. If AI sits at the edge of your business, rent it, and we'll say so. But in a regulated, document-heavy business (claims, underwriting, clinical records, contracts, filings), two things stack. The data has nowhere it's allowed to go, and the workloads are central enough that the pricing exposure scales with the whole business. The ratchet stops being a convenience and becomes a board-level concentration risk.

"But you deploy on AWS. Isn't that the same dependency?"

It's the fair objection, and the distinction is the whole point. A deployment on a hyperscaler's infrastructure is portable. It lives in your own account, whether that's AWS, Azure, or GCP, and because it's defined in Terraform it can move to another account, another region, another cloud, or on-prem. We deploy the same platform on all three clouds today, so the choice stays yours.

But the deeper difference isn't about portability. Renting infrastructure only ever changed where your systems run. The business decisions stayed inside your own four walls, made by your own developers and your own staff. A proprietary model is a different kind of dependency, because the model is where the thinking happens. When the agents that reason and act for your business run on a model that lives on another company's servers, you have handed the way your business thinks and decides to that company. And it doesn't come back: there's no export, no drop-in equivalent, no way to take the weights with you. Renting compute never reached that far. This does.

Open-weight models break that second lock without asking you to leave the cloud. The model runs in your account, so the reasoning stays where your business is. You keep it, you can move it, and you set your own cost curve, because the only inputs are hardware and your own engineering, both of which you can shop, schedule, and forecast.
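
To make "you set your own cost curve" concrete, here is a minimal sketch of the arithmetic, assuming a deliberately simple model: cost per million tokens is hourly hardware plus amortized engineering, divided by the tokens you actually serve in an hour. The function name, parameters, and every number below are hypothetical placeholders, not measurements. A real estimate would also account for storage, networking, and idle capacity.

```python
def self_hosted_cost_per_million_tokens(
    hardware_per_hour: float,     # hypothetical: GPU instance cost per hour
    engineering_per_hour: float,  # hypothetical: engineering cost amortized per hour
    tokens_per_second: float,     # hypothetical: sustained throughput at full load
    utilization: float,           # fraction of each hour spent serving, in (0, 1]
) -> float:
    """Estimate cost per million tokens under the simple model described above."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return (hardware_per_hour + engineering_per_hour) / tokens_per_hour * 1_000_000


# Illustrative inputs only; substitute your own quotes and benchmarks.
half_loaded = self_hosted_cost_per_million_tokens(4.0, 2.0, 1000.0, 0.5)
fully_loaded = self_hosted_cost_per_million_tokens(4.0, 2.0, 1000.0, 1.0)
print(f"at 50% utilization:  ${half_loaded:.2f} per million tokens")
print(f"at 100% utilization: ${fully_loaded:.2f} per million tokens")
```

The useful property is that you can quote or measure every input yourself. Utilization is the lever that moves the result most, which is why scheduling matters as much as the hardware price.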

Where do you stand?

The questions are concrete, and you can answer them this quarter:

What share of your core workflows already call a model you don't run?
What would it cost, in time and money, to re-platform them if the price doubled or the model were deprecated?
Can you forecast your per-unit inference cost eighteen months out to within, say, 20%?

If the answers are "a lot," "we don't know," and "no," what you have is a concentration risk, and it grows every time the current approach works.
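
One way to start on the first and third questions is a rough inventory. The sketch below is illustrative only: the `Workflow` fields, the example workflows, the token volumes, and the prices are all hypothetical, and it assumes spend scales linearly with the price per million tokens. The second question, re-platforming cost, needs an engineering estimate that a few lines of code can't supply.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    core: bool               # central to how the business operates?
    hosted_api: bool         # calls a model you don't run?
    monthly_tokens_m: float  # millions of tokens per month (hypothetical)


def dependency_share(workflows: list[Workflow]) -> float:
    """Question 1: fraction of core workflows that call a model you don't run."""
    core = [w for w in workflows if w.core]
    if not core:
        return 0.0
    return sum(1 for w in core if w.hosted_api) / len(core)


def monthly_spend(workflows: list[Workflow], price_per_m_tokens: float) -> float:
    """Monthly spend for core, hosted-API workflows at a given price."""
    tokens = sum(w.monthly_tokens_m for w in workflows if w.core and w.hosted_api)
    return tokens * price_per_m_tokens


def forecast_within(low_price: float, high_price: float, tolerance: float = 0.20) -> bool:
    """Question 3: is the plausible price range within +/- tolerance of its midpoint?"""
    midpoint = (low_price + high_price) / 2
    half_width = (high_price - low_price) / 2
    return half_width / midpoint <= tolerance


# Hypothetical inventory; replace with your own workflows and volumes.
workflows = [
    Workflow("claims extraction", core=True, hosted_api=True, monthly_tokens_m=400.0),
    Workflow("document routing", core=True, hosted_api=True, monthly_tokens_m=100.0),
    Workflow("underwriting summary", core=True, hosted_api=False, monthly_tokens_m=250.0),
    Workflow("marketing copy", core=False, hosted_api=True, monthly_tokens_m=50.0),
]

share = dependency_share(workflows)
spend_today = monthly_spend(workflows, price_per_m_tokens=2.0)
spend_doubled = monthly_spend(workflows, price_per_m_tokens=4.0)
forecastable = forecast_within(low_price=1.5, high_price=4.0)
```

If you can't narrow the low and high price bounds enough for `forecast_within` to return true, that is the "no" in the third answer.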

You can have the returns without the ratchet

This isn't an argument against AI, the cloud, or the hyperscalers. They may be the right call for parts of your stack, and we'll tell you where. The narrower point is to own the part of the AI stack that has become load-bearing for your business. Open-weight models are now competitive with frontier systems on these workloads: RAG, extraction, classification, and agentic document routing. That is why we publish a quarterly benchmark instead of asking you to take it on faith. The first step is knowing how far down the ratchet you already are.
