Documentation
Current API
Current is a drop-in OpenAI-compatible inference marketplace. Sellers list competing offers; Current scores them on five axes (price, latency, uptime, liquidity, and health), routes each request to the highest-scoring offer, and fails over automatically. One API, one bill, and the competitive floor every time.
Introduction
If you already use the OpenAI SDK, you can switch to Current by changing two values: the base URL and the API key. Everything else (chat, completions, embeddings, streaming, tools) works unchanged. Current then routes each request across providers and returns a non-breaking x_current object so you can see exactly which provider served it and what it cost.
| Setting | Value |
|---|---|
| Base URL | https://api.currentinference.com |
| Auth header | Authorization: Bearer cur_live_... |
| Content type | application/json |
| Streaming | text/event-stream (SSE) |
Quickstart
1. Create an API key in the dashboard (new accounts start with free credit).
2. Point any OpenAI-compatible client at Current.
3. Make your first call:
curl https://api.currentinference.com/v1/chat/completions \
  -H "Authorization: Bearer cur_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Explain routing in one sentence."}]
  }'
With the OpenAI SDK (Python)
from openai import OpenAI
client = OpenAI(
    api_key="cur_live_...",
    base_url="https://api.currentinference.com/v1",  # point the SDK at Current
)
resp = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
With the OpenAI SDK (TypeScript)
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: "cur_live_...",
  baseURL: "https://api.currentinference.com/v1", // point the SDK at Current
});
const resp = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);
Authentication
Every request must include your secret key as a bearer token. Keys are created in the dashboard, shown once at creation, and prefixed cur_live_. Keep them server-side; never ship a key in client-side code.
Authorization: Bearer cur_live_...
A missing or invalid key returns 401 invalid_api_key. Revoke a leaked key from the dashboard at any time; revoked keys stop working immediately. Keys can carry an optional monthly spend cap (402 key_cap_reached once hit).
cur_test_ keys are a free sandbox: requests route through the real engine but are served by Current’s deterministic mock provider at $0, which is ideal for CI and integration tests. Sandbox responses carry x_current.mode: "test".
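As a minimal sketch of the key handling described above, the helper below tells live keys from sandbox keys by their documented prefixes and builds the two headers every request needs. The function names (`key_mode`, `auth_headers`, `load_key`) and the `CURRENT_API_KEY` environment variable are illustrative assumptions, not part of the API.

```python
import os

LIVE_PREFIX = "cur_live_"
TEST_PREFIX = "cur_test_"


def key_mode(api_key: str) -> str:
    """Return "live" or "test" based on the documented key prefixes."""
    if api_key.startswith(LIVE_PREFIX):
        return "live"
    if api_key.startswith(TEST_PREFIX):
        return "test"
    raise ValueError("expected a key starting with cur_live_ or cur_test_")


def auth_headers(api_key: str) -> dict:
    """Build the bearer-token and content-type headers for a JSON request."""
    key_mode(api_key)  # fail fast on an obviously malformed key
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def load_key(var: str = "CURRENT_API_KEY") -> str:
    """Read the key from the environment (hypothetical variable name)."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set")
    return key
```

Reading the key from the environment keeps it out of source control; a CI job would typically export a cur_test_ key and production a cur_live_ one.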
Models
Request a model by its Current id and the router resolves it to the best-scoring healthy provider that serves it. The catalog spans 15 providers (OpenAI, Anthropic, Groq, Together, Fireworks, DeepSeek, Mistral, xAI, Google, Cerebras, SambaNova, DeepInfra, Novita, Nebius, and Venice) and the high-spread open models (Llama 3.3/4, DeepSeek V3/R1, Qwen3, Kimi K2, and more). The live, authoritative list is always GET /v1/models, and the cross-provider price board is public at GET /v1/network.
| Model id | Type | Example spread |
|---|---|---|
| llama-3.3-70b | chat | 9 providers (Groq, DeepInfra, Cerebras, Venice, …) |
| deepseek-v3 | chat | 6 providers |
| kimi-k2 | chat | 5 providers |
| qwen3-235b | chat | 5 providers |
| gpt-4o-mini · gpt-4.1-mini · gpt-5-mini | chat | OpenAI |
| claude-haiku · claude-haiku-4.5 | chat | Anthropic |
| venice-uncensored | chat | Venice (zero data retention) |
| text-embedding-3-small | embeddings | OpenAI |
Aliases & shortcuts: auto resolves to the flagship default. Suffixes set the routing objective in the model id itself: llama-3.3-70b:cheapest (cost-only) and llama-3.3-70b:fastest (latency-only); :floor and :nitro are accepted as synonyms. A fully-qualified provider/model id (e.g. groq/llama-3.3-70b) pins that provider.
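The id grammar above (optional provider prefix, bare model id, optional objective suffix) can be sketched as a small client-side parser. This is only an illustration of how the pieces compose; `ModelRef` and `parse_model_id` are hypothetical names, and the server's own resolution rules are not documented here.

```python
from dataclasses import dataclass
from typing import Optional

# Suffix -> routing objective, per the text above (:floor and :nitro are synonyms).
OBJECTIVES = {
    "cheapest": "cheapest",
    "floor": "cheapest",
    "fastest": "fastest",
    "nitro": "fastest",
}


@dataclass(frozen=True)
class ModelRef:
    provider: Optional[str]   # set when the id pins a provider, e.g. "groq"
    model: str                # bare model id, e.g. "llama-3.3-70b"
    objective: Optional[str]  # "cheapest", "fastest", or None for weighted routing


def parse_model_id(model_id: str) -> ModelRef:
    """Split "provider/model:suffix" into its parts (illustration only)."""
    base, sep, suffix = model_id.rpartition(":")
    if sep and suffix in OBJECTIVES:
        objective = OBJECTIVES[suffix]
    else:
        base, objective = model_id, None  # no recognized suffix; keep the id whole
    provider, slash, model = base.partition("/")
    if not slash:
        provider, model = None, base
    return ModelRef(provider, model, objective)
```

For example, `parse_model_id("llama-3.3-70b:floor")` yields the same objective as `:cheapest`, and `parse_model_id("groq/llama-3.3-70b")` reports `groq` as the pinned provider. Whether the API accepts a provider prefix and a suffix together is not stated in this document.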
Honest caveat: the “same” open model can differ across providers in quantization, context window, and tokenizer, so outputs and token counts are not byte-identical. x_current.selected_provider always tells you who served a request. Embedding ids are never aliased across different underlying models (vectors are only comparable within one model).
Chat completions
POST /v1/chat/completions is OpenAI-compatible. It accepts the standard fields (messages, temperature, max_tokens, tools, response_format, stream, …) plus the optional Current routing extension. Unknown fields are ignored where OpenAI tolerates them.
{
  "model": "llama-3.3-70b",
  "messages": [{"role": "user", "content": "Hello"}],
  "routing": {
    "cost": 0.6, "latency": 0.2, "uptime": 0.1, "liquidity": 0.05, "health": 0.05,
    "providers": ["groq", "together"],
    "provider": null,
    "max_cost_per_mtok": 1.00
  }
}
The response is the standard OpenAI object with an added x_current:
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1733500000,
  "model": "llama-3.3-70b",
  "choices": [
    { "index": 0, "message": { "role": "assistant", "content": "Hi!" }, "finish_reason": "stop" }
  ],
  "usage": { "prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21 },
  "x_current": {
    "selected_provider": "groq",
    "offer_id": 3,
    "score": 0.60,
    "score_breakdown": { "cost": 0.40, "latency": 0.20, "uptime": 0.0, "liquidity": 0.0, "health": 0.0 },
    "provider_cost_per_mtok": 0.60,
    "billed_per_mtok": 0.60,
    "failover_count": 0,
    "cache_hit": false,
    "failover_order": ["groq", "fireworks", "together"],
    "usage": { "estimated": false, "provider_cost_usd": 0.000014, "routing_fee_usd": 0.000001, "total_usd": 0.000015 },
    "savings": { "vs_most_expensive_usd": 0.000006, "pct": 31.2 }
  }
}
Streaming
Set stream: true for standard OpenAI SSE: data: lines of chat.completion.chunk objects ending with data: [DONE]. The x_current routing decision and final usage arrive on the last chunk. Failover happens before the first byte. Once streaming starts there is no silent provider switch, and a mid-stream upstream failure is surfaced as an SSE error event.
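On the client side, the stream described above can be folded into the final text plus the routing metadata. The sketch below works on already-received SSE lines so it stays independent of any HTTP library. `collect_stream` is a hypothetical helper, and treating a chunk with a top-level `error` key as the mid-stream failure is an assumption, since the exact shape of the SSE error event is not specified here.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Fold SSE lines into (full_text, x_current)."""
    parts = []
    x_current = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if "error" in chunk:
            # Assumption: a mid-stream failure carries the usual error envelope.
            raise RuntimeError(chunk["error"].get("code", "stream_error"))
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
        # Routing metadata is documented to arrive on the last chunk.
        x_current = chunk.get("x_current", x_current)
    return "".join(parts), x_current
```

Because failover only happens before the first byte, a caller that sees this function raise has received a partial answer and has to decide for itself whether to retry the whole request.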
curl https://api.currentinference.com/v1/chat/completions \
  -H "Authorization: Bearer cur_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "model": "llama-3.3-70b",
        "messages": [{"role":"user","content":"Stream a haiku."}],
        "stream": true }'
# Server-sent events:
# data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hi"}}]}
# data: {"id":"...","choices":[{"delta":{}}],"usage":{...},"x_current":{...}}
# data: [DONE]
Embeddings
POST /v1/embeddings takes a single string or an array of strings as input and returns the OpenAI { object: "list", data: [{ embedding, index }], model, usage } shape, routed to a provider that serves the embedding model.
curl https://api.currentinference.com/v1/embeddings \
  -H "Authorization: Bearer cur_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "model": "text-embedding-3-small", "input": ["hello", "world"] }'
Legacy completions
POST /v1/completions is the legacy text completion endpoint. It uses the same routing and billing as chat. Pass prompt instead of messages.
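A minimal sketch of such a request using only the Python standard library. It builds the request without sending it; `completion_request` is a hypothetical helper, and which models accept legacy completions is not stated here, so the model id is whatever the caller passes.

```python
import json
import urllib.request

BASE_URL = "https://api.currentinference.com"


def completion_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST /v1/completions request with `prompt` in place of `messages`."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; the response should carry x_current like any other inference call, since routing and billing are shared with chat.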
List models
GET /v1/models returns the routable models as the OpenAI list shape:
{ "object": "list", "data": [ { "id": "llama-3.3-70b", "object": "model", "owned_by": "current" } ] }
The routing extension
Every inference call accepts an optional routing object to steer or override the decision per request. All fields are optional; omitted weights fall back to your account defaults (cost 0.40 · latency 0.20 · uptime 0.20 · liquidity 0.10 · health 0.10). Higher weight = that axis matters more; the highest total score wins.
| Field | Type | Meaning |
|---|---|---|
| cost | number | Weight on price (prefer cheaper offers). |
| latency | number | Weight on speed. |
| uptime | number | Weight on the offer’s rolling success rate. |
| liquidity | number | Weight on available capacity. |
| health | number | Weight on the offer’s reputation score (scored 0 to 100). |
| providers | string[] | Allowed set. Route only among these. |
| provider | string | null | Pin one provider (skips routing). |
| max_cost_per_mtok | number | Hard ceiling on billed price ($/Mtok). |
| cache | boolean | Reserved. Response caching is on the roadmap, so it’s ignored today (cache_hit is always false). |
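To make the weight fallback concrete, here is a rough sketch of how per-request weights might merge with the account defaults and turn into a score. The merge follows the text above; the scoring formula (contribution = weight × an axis value normalized to 0..1) is an assumption for illustration, and `resolve_weights` and `score_offer` are hypothetical names. Current's actual normalization is not documented here.

```python
from typing import Any, Dict, Mapping

AXES = ("cost", "latency", "uptime", "liquidity", "health")

# Account defaults quoted in the text above.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "cost": 0.40, "latency": 0.20, "uptime": 0.20, "liquidity": 0.10, "health": 0.10,
}


def resolve_weights(routing: Mapping[str, Any]) -> Dict[str, float]:
    """Per-request weights win; omitted axes fall back to the defaults."""
    return {axis: float(routing.get(axis, DEFAULT_WEIGHTS[axis])) for axis in AXES}


def score_offer(weights: Mapping[str, float], normalized: Mapping[str, float]) -> Dict[str, float]:
    """Assumed scoring: each axis contributes weight * normalized value in [0, 1]."""
    breakdown = {axis: round(weights[axis] * normalized.get(axis, 0.0), 4) for axis in AXES}
    breakdown["total"] = round(sum(breakdown[axis] for axis in AXES), 4)
    return breakdown
```

Under this assumed formula, an offer that is best on cost and latency and worst on the other three axes scores 0.40 + 0.20 = 0.60 with the default weights. Note that a partial override such as cost 0.7 with the remaining defaults no longer sums to 1; whether Current renormalizes in that case is not specified.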
Via the OpenAI SDK, send it through extra_body:
client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"routing": {"cost": 0.7, "latency": 0.2, "uptime": 0.1}},
)
The x_current object
Every response carries x_current, the routing decision laid out in the open:
| Field | Meaning |
|---|---|
| selected_provider | Provider whose offer served the request. |
| offer_id | The winning offer’s id. |
| score | Winning (highest) routing score. |
| score_breakdown | Per-axis contribution (cost / latency / uptime / liquidity / health). |
| provider_cost_per_mtok | The offer’s blended price. |
| billed_per_mtok | What you pay: the offer’s price, with no markup. |
| failover_count | How many providers failed before one succeeded (0 means the first choice served the request). |
| failover_order | The ranked candidate list considered. |
| usage | Exact cost breakdown: provider_cost_usd + routing_fee_usd = total_usd, plus estimated, which is true when a provider omitted token counts and Current estimated them (tiktoken). |
| savings | What this request saved vs the most expensive eligible candidate (vs_most_expensive_usd, pct). |
| requested_model | Present when an alias (auto) or suffix (:cheapest) was resolved. |
| cache_hit | Whether a cached response was served (always false until caching ships). |
Every response also carries an X-Request-Id header (or echoes yours). Include it when contacting support so we can trace the exact request.
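As an illustration of reading these fields, the sketch below turns a response (as a plain dict) into a one-line log entry and checks the documented usage identity. `summarize_routing` and `usage_adds_up` are hypothetical helpers; with the stock OpenAI Python SDK, `resp.model_dump()` should produce a dict of this shape, assuming the SDK preserves unknown fields.

```python
from typing import Any, Mapping


def summarize_routing(response: Mapping[str, Any]) -> str:
    """Format the routing decision for a log line."""
    xc = response.get("x_current")
    if xc is None:
        return "no x_current on this response"
    usage = xc.get("usage", {})
    line = (
        f"{xc['selected_provider']} (offer {xc['offer_id']}) "
        f"score={xc['score']:.2f} failovers={xc['failover_count']} "
        f"total=${usage.get('total_usd', 0.0):.6f}"
    )
    if usage.get("estimated"):
        line += " (token counts estimated)"
    return line


def usage_adds_up(usage: Mapping[str, float], tol: float = 1e-9) -> bool:
    """Check provider_cost_usd + routing_fee_usd == total_usd within a tolerance."""
    parts = usage["provider_cost_usd"] + usage["routing_fee_usd"]
    return abs(parts - usage["total_usd"]) <= tol
```

Logging this line next to the X-Request-Id header gives support everything needed to trace a request.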
Routing preview
GET /v1/routing/preview?model=<id> returns the full ranked candidate list and score breakdown for a model without running inference. It’s the data behind the dashboard’s “why this provider?” view.
{
  "model": "llama-3.3-70b",
  "weights": { "cost": 0.4, "latency": 0.2, "uptime": 0.2, "liquidity": 0.1, "health": 0.1 },
  "candidates": [
    { "provider": "groq", "offer_id": 3, "score": 0.60,
      "breakdown": { "cost": 0.40, "latency": 0.20, "uptime": 0.0, "liquidity": 0.0, "health": 0.0 },
      "cost_per_mtok": 0.60, "billed_per_mtok": 0.60, "latency_ms": 120, "uptime": 0.985, "health_score": 100 }
  ],
  "selected": "groq"
}
Errors
Errors use the OpenAI envelope, so the OpenAI SDK surfaces them natively. Branch on the stable code rather than the message. Current never returns an untyped 500.
{
  "error": {
    "message": "The model 'gpt-9' does not exist or is not routable.",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found"
  }
}
| HTTP | code | When |
|---|---|---|
| 401 | invalid_api_key | Missing or bad key. |
| 402 | credit_exhausted | Prepaid balance is ≤ 0. Top up to continue. |
| 400 | model_not_found | Unknown / non-routable model. |
| 400 | unsupported_parameter | No provider supports a requested parameter. |
| 400 | context_length_exceeded | Prompt exceeds every provider’s context window. |
| 400 | cost_ceiling_exceeded | No provider within max_cost_per_mtok. |
| 404 | no_provider_available | No provider serves the model. |
| 404 | pinned_provider_unavailable | Pinned provider can’t serve the model. |
| 413 | payload_too_large | Request body over the size cap. |
| 402 | key_cap_reached | This key’s monthly spend cap is reached. |
| 404 | not_found | Unknown route/method (still the OpenAI envelope). |
| 429 | rate_limit_exceeded | Over your rate limit. See Retry-After. |
| 502 | upstream_error · upstream_unreachable | The selected provider failed after streaming began (pre-stream failures fail over automatically). |
| 503 | all_providers_down | Every candidate is currently failing. |
| 503 | no_provider_configured | No provider for this model has credentials on this deployment. |
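Branching on the stable code might look like the sketch below, which maps each documented code to a coarse client action. The grouping and the action names are one possible policy, not something the API prescribes, and `classify_error` is a hypothetical helper.

```python
from typing import Any, Mapping

# Hypothetical client-side grouping of the documented error codes.
ACTIONS = {
    "retry": {"rate_limit_exceeded", "all_providers_down",
              "upstream_error", "upstream_unreachable"},
    "top_up": {"credit_exhausted", "key_cap_reached"},
    "fix_auth": {"invalid_api_key"},
    "fix_request": {"model_not_found", "unsupported_parameter",
                    "context_length_exceeded", "payload_too_large", "not_found"},
    "relax_routing": {"cost_ceiling_exceeded", "no_provider_available",
                      "pinned_provider_unavailable"},
}


def classify_error(body: Mapping[str, Any]) -> str:
    """Map an OpenAI-style error envelope to a coarse client action."""
    code = (body.get("error") or {}).get("code")
    for action, codes in ACTIONS.items():
        if code in codes:
            return action
    return "escalate"  # e.g. no_provider_configured, or a code this client does not know
```

Keying on `code` rather than `message` means the branch keeps working if the wording of a message changes.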
Rate limits
Limits are enforced per API key on the inference path (plus per-IP throttles on auth endpoints). When you exceed a limit you receive 429 rate_limit_exceeded with a Retry-After header (seconds to wait); responses also carry X-RateLimit-Limit and X-RateLimit-Remaining. Back off and retry after the indicated delay.
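A minimal sketch of the back-off logic: honor Retry-After when the server sends it, otherwise fall back to capped exponential backoff with jitter. The fallback policy and the 60-second cap are assumptions, and `retry_delay` is a hypothetical helper. It expects a plain dict of headers with the exact `Retry-After` casing, so normalize header names first if your HTTP client does not.

```python
import random
from typing import Mapping


def retry_delay(headers: Mapping[str, str], attempt: int, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            # Documented as seconds to wait; clamp to a sane range.
            return min(max(float(retry_after), 0.0), cap)
        except ValueError:
            pass  # not a number of seconds; fall through to backoff
    base = min(cap, 2.0 ** attempt)
    return random.uniform(0.0, base)  # full jitter to avoid synchronized retries
```

A caller would sleep for `retry_delay(response_headers, attempt)` after each 429 and give up after a fixed number of attempts.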
Pricing & billing
You pay the winning offer’s price, the competitive floor, with no markup. Input and output tokens are billed at the offer’s separate in/out prices.
The marketplace keeps a flat 5% fee out of the seller’s settlement (never added to your bill), and the seller is credited the remaining 95%. x_current.usage carries the exact micro-dollar breakdown per request, where total_usd (what you pay) = provider_cost_usd (seller settlement) + routing_fee_usd (marketplace fee). x_current.savings shows what you saved vs the priciest offer.
Billing is prepaid: top up by card in the dashboard (USDC coming), spend is metered in micro-dollars, and the balance hard-stops at zero (402 credit_exhausted). New accounts get free credit to try it.
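The arithmetic can be worked through with a short sketch. It assumes the fee is exactly 5% of what you pay and ignores micro-dollar rounding, which this document does not specify, so results for very small requests may differ from what x_current.usage reports. The prices in the example are made up, and `Bill` and `bill` are hypothetical names.

```python
from dataclasses import dataclass

FEE_RATE = 0.05  # flat marketplace fee, taken out of the seller's settlement


@dataclass(frozen=True)
class Bill:
    total_usd: float          # what you pay
    routing_fee_usd: float    # marketplace fee (5% of total)
    provider_cost_usd: float  # seller settlement (remaining 95%)


def bill(prompt_tokens: int, completion_tokens: int,
         in_per_mtok: float, out_per_mtok: float) -> Bill:
    """Price a request from separate input/output $/Mtok rates."""
    total = (prompt_tokens * in_per_mtok + completion_tokens * out_per_mtok) / 1_000_000
    fee = total * FEE_RATE
    return Bill(total_usd=total, routing_fee_usd=fee, provider_cost_usd=total - fee)
```

For instance, 1,000,000 prompt tokens at a hypothetical $0.50/Mtok plus 500,000 completion tokens at $0.80/Mtok come to $0.90 billed, of which $0.045 is the marketplace fee and $0.855 goes to the seller.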
SDKs
You don’t need a Current SDK. The stock OpenAI SDKs work (see Quickstart). The official clients live in the repo (publication to PyPI/npm is in progress) and add first-class, typed access to the routing extension and x_current, retries with Retry-After handling, and streaming helpers.
Python
# not on PyPI yet, so install from the repo:
pip install "git+https://github.com/ekempinski/infera.git#subdirectory=sdk/python"

from current import Current, CurrentError
client = Current(api_key="cur_live_...")  # base_url defaults to https://api.currentinference.com
resp = client.chat_completions(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp["choices"][0]["message"]["content"])
print(resp["x_current"]["selected_provider"], resp["x_current"]["billed_per_mtok"])
TypeScript
// not on npm yet, so vendor sdk/typescript from the repo for now;
// or simply use the stock OpenAI SDK (above), which works unchanged.
import { Current } from "@current/sdk";
const current = new Current({ apiKey: process.env.CURRENT_API_KEY! });
const out = await current.chatCompletions({
  model: "llama-3.3-70b",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(out.choices[0].message.content, out.x_current?.selected_provider);