When we gave the inference server its own Kubernetes Service (the stable network name other parts of the cluster use to reach it), the server failed on startup and kept restarting with this error:

ValueError: VLLM_PORT 'tcp://172.20.34.12:8000' appears to be a URI.
This may be caused by a Kubernetes service discovery issue.

The error even names the likely cause, which is more than most errors do. But the chain of events is worth understanding, because it can hit anything you run on Kubernetes, not just a model server.

What happened

Kubernetes has a legacy feature called service links, on by default. When a pod starts, the kubelet injects a set of environment variables for every Service that already exists in the pod's namespace, in the old Docker-links format: <SERVICENAME>_SERVICE_HOST, <SERVICENAME>_PORT, and so on, where <SERVICENAME> is the Service name uppercased with dashes turned into underscores. Name a Service vllm and every pod started in the namespace afterward gets:

VLLM_PORT=tcp://172.20.34.12:8000
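
The naming scheme can be sketched in a few lines of Python. This illustrates the Docker-links-style names and is not Kubernetes code: the function name service_link_env is made up for this post, and the sketch assumes a Service with a single unnamed port.

```python
def service_link_env(service_name: str, cluster_ip: str, port: int,
                     protocol: str = "tcp") -> dict[str, str]:
    """Approximate the variables injected for one Service with one port."""
    # The Service name is uppercased and dashes become underscores.
    prefix = service_name.upper().replace("-", "_")
    url = f"{protocol}://{cluster_ip}:{port}"
    port_prefix = f"{prefix}_PORT_{port}_{protocol.upper()}"
    return {
        f"{prefix}_SERVICE_HOST": cluster_ip,
        f"{prefix}_SERVICE_PORT": str(port),
        f"{prefix}_PORT": url,  # the variable that collided with vLLM
        port_prefix: url,
        f"{port_prefix}_PROTO": protocol,
        f"{port_prefix}_PORT": str(port),
        f"{port_prefix}_ADDR": cluster_ip,
    }


env = service_link_env("vllm", "172.20.34.12", 8000)
print(env["VLLM_PORT"])  # tcp://172.20.34.12:8000
```

Note that <SERVICENAME>_PORT holds a full URL, while the bare port number lives in <SERVICENAME>_SERVICE_PORT. That asymmetry is what makes the _PORT name easy to collide with.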

vLLM, meanwhile, reads an environment variable named VLLM_PORT as its own listen port. It expects an integer and got a URI, so it crashed before the engine initialized.
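
The failure is easy to reproduce without vLLM. The sketch below is not vLLM's actual code: read_port is a hypothetical helper, and the default of 8000 is an assumption. It only shows what happens when a program parses the injected value as an integer.

```python
import os


def read_port(var_name: str, default: int = 8000) -> int:
    """Read a listen port from the environment, rejecting non-integers."""
    raw = os.environ.get(var_name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(
            f"{var_name} {raw!r} is not an integer port; "
            "if it looks like a URI, a Kubernetes Service may have injected it"
        ) from None


# Simulate what gets injected for a Service named "vllm".
os.environ["VLLM_PORT"] = "tcp://172.20.34.12:8000"
try:
    read_port("VLLM_PORT")
except ValueError as exc:
    print(exc)
```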

The collision is bad luck: the Service name, uppercased and suffixed with _PORT, matched a config variable the app already reads. It hits vLLM because its config prefix is VLLM_ and the obvious name for the Service is vllm, so the two line up.

The fix

Turn service links off in the pod spec (for a Deployment, that is spec.template.spec). One line:

spec:
  enableServiceLinks: false
  containers:
    - name: vllm
      # ...

That removes the whole class of collision: the kubelet stops injecting variables for the Services in your namespace. (The KUBERNETES_* variables that point at the API server are still set.) Renaming the Service would also work, but enableServiceLinks: false is the durable choice, because you set it once and never think about it again. Almost nothing in a modern cluster depends on service-link variables, since apps find each other through DNS.
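
If your manifests pass through a script before they are applied, the setting can be added there. The following is a minimal sketch, assuming the manifest has already been parsed into a Python dict; disable_service_links and TEMPLATED_KINDS are hypothetical names, and the list of kinds is an illustrative subset (a CronJob, for example, nests its pod spec one level deeper and is not handled).

```python
import copy

# Kinds whose pod spec lives at spec.template.spec (illustrative subset).
TEMPLATED_KINDS = {"Deployment", "StatefulSet", "DaemonSet", "ReplicaSet", "Job"}


def disable_service_links(manifest: dict) -> dict:
    """Return a copy of the manifest with enableServiceLinks set to False."""
    patched = copy.deepcopy(manifest)
    kind = patched.get("kind")
    if kind == "Pod":
        pod_spec = patched.setdefault("spec", {})
    elif kind in TEMPLATED_KINDS:
        pod_spec = (patched.setdefault("spec", {})
                    .setdefault("template", {})
                    .setdefault("spec", {}))
    else:
        return patched  # Not a workload kind this sketch knows about.
    pod_spec["enableServiceLinks"] = False
    return patched


deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [{"name": "vllm"}]}}},
}
patched = disable_service_links(deployment)
print(patched["spec"]["template"]["spec"]["enableServiceLinks"])  # False
```

The function copies the manifest instead of mutating it, so the original dict can still be compared against the patched one.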

We now set it on every workload by default, because of the failure mode rather than this single bug. Service links change a program's environment from the outside: an unrelated Service, added to the namespace months from now by someone else, can set a variable your server reads the next time its container starts, and nothing in your own configuration will hint at why. Kubernetes puts more into your container's environment than your files show. When a value appears from nowhere, check what the platform added before you re-read your own code.
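
One quick way to check what the platform added is to scan the environment from inside the container for values that look like service-link URLs. This is a rough heuristic, not an official tool: find_service_link_vars and the LINK_VALUE pattern are assumptions for illustration, and they only catch the URL-shaped variables (the *_PORT family), not the *_SERVICE_HOST ones.

```python
import os
import re

# Service-link URL values look like "tcp://10.0.0.1:8000".
LINK_VALUE = re.compile(r"^(tcp|udp|sctp)://[^/\s]+:\d+$")


def find_service_link_vars(environ=None) -> dict[str, str]:
    """Return variables whose values look like injected Service URLs."""
    if environ is None:
        environ = os.environ
    return {k: v for k, v in environ.items() if LINK_VALUE.match(v)}


for name, value in sorted(find_service_link_vars().items()):
    print(f"{name}={value}")
```

Any name in the output that shares a prefix with your application's own configuration variables is a candidate for the kind of collision described here.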