# LLM Gateway
LLM tokens included
Every `exe.dev` VM has access to the LLM Gateway, a built-in proxy to the
Anthropic, OpenAI, and Fireworks APIs. Your subscription includes a monthly
token allocation, and you can purchase additional tokens at
[https://exe.dev/user](https://exe.dev/user).

See the [full list of supported models](/llm-gateway-models) ([JSON](/llm-gateway-models.json)).

The gateway is available inside your VM at
`http://169.254.169.254/gateway/llm/provider`, where `provider` is one of
`anthropic`, `openai`, or `fireworks`. No API keys are necessary; the gateway
authenticates the VM itself.

[Shelley](/docs/shelley/intro) uses the LLM Gateway by default, but you can
also call it directly from any program running on your VM.
## Using the gateway with Codex
Add an OpenAI-compatible provider to `~/.codex/config.toml`:
```toml
model_provider = "exe-openai"
[model_providers.exe-openai]
name = "exe.dev LLM Gateway"
base_url = "http://169.254.169.254/gateway/llm/openai/v1"
requires_openai_auth = false
```
Then run Codex normally:
```sh
$ codex
```
The `base_url` ends at `/v1`. Codex adds the Responses API path when it
makes model requests.
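
The same URL composition can be reproduced from your own code. The following is a minimal Python sketch, not Codex's actual implementation. It assumes the gateway forwards the OpenAI Responses API at `/responses` under the `base_url` above, and it reuses the `gpt-5.5` model name from the curl example on this page. The helper names `build_responses_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Same value as base_url in ~/.codex/config.toml.
BASE_URL = "http://169.254.169.254/gateway/llm/openai/v1"


def build_responses_request(prompt, model="gpt-5.5"):
    """Build a POST to <base_url>/responses (hypothetical helper)."""
    body = json.dumps({"model": model, "input": prompt}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/responses",  # assumed Responses API path
        data=body,
        headers={"content-type": "application/json"},  # no API key header
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON reply (only reachable inside a VM)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Inside a VM, `send(build_responses_request("Hello!"))` should return the parsed JSON response, provided the model is on the supported list.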
## Using the gateway with Claude Code
Add the Anthropic gateway base URL to `~/.claude/settings.json`:
```json
{
  "apiKeyHelper": "printf exe-gateway",
  "env": {
    "ANTHROPIC_BASE_URL": "http://169.254.169.254/gateway/llm/anthropic"
  }
}
```
Claude Code expects an API key source, so `apiKeyHelper` returns a harmless
placeholder. The gateway authenticates the VM; you do not need an Anthropic
API key.
Then run Claude Code normally:
```sh
$ claude
```
The `ANTHROPIC_BASE_URL` ends at `/anthropic`. Claude Code adds the Anthropic
API paths when it makes model requests.
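
If `~/.claude/settings.json` already contains other settings, the two gateway entries can be merged in rather than overwriting the file. The following Python sketch shows one way to do that, assuming the file is plain JSON with an optional `env` object. The function names are hypothetical and are not part of Claude Code.

```python
import json
from pathlib import Path

GATEWAY_URL = "http://169.254.169.254/gateway/llm/anthropic"


def merge_gateway_settings(settings):
    """Return a copy of a settings dict with the gateway entries applied."""
    merged = dict(settings)
    merged["apiKeyHelper"] = "printf exe-gateway"  # harmless placeholder key
    env = dict(merged.get("env", {}))  # keep unrelated environment variables
    env["ANTHROPIC_BASE_URL"] = GATEWAY_URL
    merged["env"] = env
    return merged


def update_settings_file(path=Path.home() / ".claude" / "settings.json"):
    """Read, merge, and rewrite the settings file, creating it if missing."""
    path = Path(path)
    text = path.read_text() if path.exists() else ""
    current = json.loads(text) if text.strip() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_gateway_settings(current), indent=2) + "\n")
    return path
```

Calling `update_settings_file()` once should leave existing keys untouched and add or replace only `apiKeyHelper` and `env.ANTHROPIC_BASE_URL`.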
## Using the gateway with curl
Send requests to the gateway URL instead of the provider's own API host, and leave out the API key header:
```sh
$ curl -s http://169.254.169.254/gateway/llm/anthropic/v1/messages \
    -H "content-type: application/json" \
    -H "anthropic-version: 2023-06-01" \
    -d '{
      "model": "claude-sonnet-4-6",
      "max_tokens": 256,
      "messages": [{"role": "user", "content": "Hello!"}]
    }'
```
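
The same request can be made from a program. This is a minimal Python sketch using only the standard library. It assumes the gateway returns the Anthropic Messages API response unmodified, with text in `content[].text`. The helper names `build_messages_request` and `reply_text` are hypothetical.

```python
import json
import urllib.request

MESSAGES_URL = "http://169.254.169.254/gateway/llm/anthropic/v1/messages"


def build_messages_request(prompt, model="claude-sonnet-4-6", max_tokens=256):
    """Build the same POST as the curl command, without any API key header."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        MESSAGES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )


def reply_text(response_json):
    """Join the text blocks of a Messages API response (assumed shape)."""
    return "".join(
        block["text"]
        for block in response_json.get("content", [])
        if block.get("type") == "text"
    )
```

Inside a VM, `reply_text(json.load(urllib.request.urlopen(build_messages_request("Hello!"))))` should yield the model's reply as a string.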
OpenAI and Fireworks work the same way:
```sh
$ curl -s http://169.254.169.254/gateway/llm/openai/v1/chat/completions \
    -H "content-type: application/json" \
    -d '{
      "model": "gpt-5.5",
      "messages": [{"role": "user", "content": "Hello!"}]
    }'
```
```sh
$ curl -s http://169.254.169.254/gateway/llm/fireworks/inference/v1/chat/completions \
    -H "content-type: application/json" \
    -d '{
      "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",
      "messages": [{"role": "user", "content": "Hello!"}]
    }'
```
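
Note that the two providers use different path prefixes: `/openai/v1/...` versus `/fireworks/inference/v1/...`. The Python sketch below captures that difference in one place. It assumes both endpoints return the usual Chat Completions response shape (`choices[0].message.content`). The names `CHAT_PATHS`, `build_chat_request`, and `first_choice_text` are hypothetical.

```python
import json
import urllib.request

GATEWAY = "http://169.254.169.254/gateway/llm"

# Chat Completions path per provider, copied from the curl commands above.
CHAT_PATHS = {
    "openai": "/openai/v1/chat/completions",
    "fireworks": "/fireworks/inference/v1/chat/completions",
}


def build_chat_request(provider, model, prompt):
    """Build a Chat Completions POST for the given provider."""
    if provider not in CHAT_PATHS:
        raise ValueError(f"unsupported provider: {provider!r}")
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        GATEWAY + CHAT_PATHS[provider],
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def first_choice_text(response_json):
    """Extract the assistant reply from a Chat Completions response."""
    return response_json["choices"][0]["message"]["content"]
```

Keeping the paths in a single table means a caller only has to pick a provider name and a model from the supported list.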