Pricing
Every request GPROXY handles is priced at response time and the result is
stored in the usages.cost column. This page documents the pricing data
model, where the values live, and how admin edits propagate into the
billing engine.
The ModelPrice shape
Section titled “The ModelPrice shape”A single JSON blob — models.pricing_json — is the authoritative source of
pricing for a given (provider_id, model_id) row. It mirrors the
gproxy_sdk::provider::billing::ModelPrice struct defined in
sdk/gproxy-channel/src/billing.rs:
{ "price_each_call": 0.005, "price_tiers": [ { "input_tokens_up_to": 200000, "price_input_tokens": 3.0, "price_output_tokens": 15.0, "price_cache_read_input_tokens": 0.3, "price_cache_creation_input_tokens": 3.75, "price_cache_creation_input_tokens_5min": 3.75, "price_cache_creation_input_tokens_1h": 6.0 } ], "flex_price_tiers": [], "scale_price_tiers": [], "priority_price_tiers": []}Fields:
price_each_call— flat USD fee per request, regardless of tokens.price_tiers[]— per-token prices bucketed byinput_tokens_up_to. All token prices are per 1,000,000 tokens. The first tier whoseinput_tokens_up_tois >= the summed input-side token count (input + cache read + cache creation) is selected.flex_price_each_call/flex_price_tiers— override for OpenAIservice_tier: "flex".scale_price_each_call/scale_price_tiers— override forservice_tier: "scale".priority_price_each_call/priority_price_tiers— override for OpenAIservice_tier: "priority"and Anthropicspeed: "fast".
model_id and display_name live in their own columns on the models
table and are not stored inside the JSON blob; they are stamped back
onto the parsed ModelPrice at load time.
Where pricing lives
Section titled “Where pricing lives”- Built-in JSON — each channel ships a default price table at
sdk/gproxy-channel/src/channels/pricing/*.json. These are compiled into the binary viainclude_str!and seeded into the DB on first run of each provider. - DB (
models.pricing_json) — the authoritative source at runtime. Admin edits write here, bootstrap seeds from the built-in JSON only when a row is missing. - In-memory
MemoryModel.pricing— a parsedModelPricecloned from the DB into the routing service on boot and on every admin mutation. - Billing engine —
ProviderInstance.model_pricingis anArcSwap<Vec<ModelPrice>>owned by the SDK’sProviderStore. It is updated viaengine.set_model_pricing(provider_name, prices).
How admin edits reach billing
Section titled “How admin edits reach billing”When an admin upserts a model via the console or
POST /admin/models/upsert:
- Handler validates
pricing_jsonby parsing it intoModelPrice. Malformed JSON is rejected with400 Bad Requestbefore the DB write. storage.upsert_model(...)persists the row.state.upsert_model_in_memory(...)swaps the newMemoryModel.pricinginto the routing service.state.push_pricing_to_engine(provider_name)rebuilds the per-providerVec<ModelPrice>from the memory snapshot and callsengine.set_model_pricing(...).- The next billing call (whether from the same request or an unrelated
concurrent one) reads the new prices from the
ArcSwap.
There is no caching layer between admin write and billing read — the push is synchronous and the ArcSwap swap is lock-free. Last writer wins.
Billing mode selection
Section titled “Billing mode selection”BillingContext.mode is set from the request body:
| Channel | Signal in request body | Mode |
|---|---|---|
openai | service_tier: "flex" | Flex |
openai | service_tier: "scale" | Scale |
openai | service_tier: "priority" | Priority |
anthropic | speed: "fast" | Priority |
claudecode | speed: "fast" | Priority |
| anything else | — | Default |
When mode is not Default, the engine looks for a mode-specific tier
array (flex_price_tiers, etc.) on the exact model first, then on the
default model row. If none exists, it falls back to price_tiers.
Token pricing formula
Section titled “Token pricing formula”For the selected tier, each non-null price field contributes:
amount = tokens × unit_price ÷ 1_000_000Summed across input_tokens, output_tokens, cache_read_input_tokens,
cache_creation_input_tokens, cache_creation_input_tokens_5min, and
cache_creation_input_tokens_1h.
The tier is selected by effective_input_tokens(usage) which is
input + cache_read + cache_creation + cache_creation_5min + cache_creation_1h.
Price matching: exact → default fallback
Section titled “Price matching: exact → default fallback”Price lookup is strict string matching on model_id:
- Find an exact-match
ModelPricerow wheremodel_id == request_model_id. - If missing, fall back to a row with
model_id == "default". - If neither exists, billing returns
Noneandusages.costis0.0.
There is no regex, prefix, or glob matching. If you want a shared
tier across many models, define a default row in the pricing JSON and
let unspecified models fall back to it.
Legacy columns
Section titled “Legacy columns”The models table still has price_each_call and price_tiers_json
columns — they are remnants of the pre-v1 pricing shape. The runtime
does not read or write them; they are kept only so the one-shot
backfill_legacy_pricing_json helper can migrate rows from older
deployments on first boot. A later release will drop them via an
explicit ALTER TABLE.
Where to look when pricing is wrong
Section titled “Where to look when pricing is wrong”- Expected price not applied — check that
models.pricing_jsonis populated for the row. Rows withNULL pricing_jsonbill0.0. - Admin edit had no effect — check server logs for
push_pricing_to_engine: provider not registered in engine store— that warn means the admin mutation went to the DB but the engine’s provider store has no matching entry, usually because the provider was renamed after the model was created. - Wrong tier selected — the tier selector uses the sum of
input_tokens + cache_* tokens, notinput_tokensalone. A request with mostly cached prompt tokens can cross a tier boundary even though the billable input is small.