Pricing
GPROXY v2 pricing is per provider model. The authoritative configuration is
provider_models.pricing_json; there is no separate price table.
Pricing and quotas are related but separate:
- pricing describes how much one provider model costs;
- quotas describe how much an org, team, or user is allowed to spend.
An unpriced model still runs. Missing, null, or malformed pricing fields parse
as zero, so usage is recorded but cost is 0.
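The zero fallback can be sketched as below. This is a minimal illustration, not GPROXY's actual parser: the helper name `parse_price` is hypothetical, and treating booleans and non-finite values (`NaN`, `Infinity`) as malformed is an assumption.

```python
from decimal import Decimal, InvalidOperation

ZERO = Decimal("0")

def parse_price(pricing, key):
    """Return pricing[key] as a Decimal, or zero if missing, null, or malformed."""
    if not isinstance(pricing, dict):
        return ZERO                      # pricing_json itself is null or not an object
    raw = pricing.get(key)
    if raw is None or isinstance(raw, bool):
        return ZERO                      # missing key, null, or a boolean (assumed malformed)
    if isinstance(raw, (int, float)):
        raw = str(raw)                   # JSON number: convert via its decimal string
    if not isinstance(raw, str):
        return ZERO                      # arrays, objects, and so on
    try:
        value = Decimal(raw.strip())
    except InvalidOperation:
        return ZERO                      # unparseable string
    return value if value.is_finite() else ZERO
```

With this shape, `parse_price({"input": "abc"}, "input")` and `parse_price({}, "input")` both yield zero, so the request is still metered but contributes no cost.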
pricing_json shape
Token prices are per 1,000,000 tokens. String decimals are preferred because money is handled with decimal arithmetic, but JSON numbers are accepted.

    {
      "input": "3.00",
      "output": "15.00",
      "cache_read": "0.30",
      "cache_creation": "3.75"
    }

Supported keys:
| Key | Meaning |
|---|---|
| input | Per-million input token price. |
| output | Per-million output token price. |
| cache_read | Per-million cache-read token price. |
| cache_creation | Per-million cache-creation token price. |
| image | Either a flat per-image price or a tier object for image operations. |
The token cost formula is:

    cost = input_tokens * input / 1_000_000
         + output_tokens * output / 1_000_000
         + cache_read_tokens * cache_read / 1_000_000
         + cache_creation_tokens * cache_creation / 1_000_000

Image pricing
For image operations, image can be a scalar per-image price:

    { "image": "0.04" }

It can also be a tier object. Lookup order is:
- "{size}/{quality}";
- "{size}";
- "default";
- zero if no tier matches.

For example:

    {
      "image": {
        "1024x1024": "0.04",
        "1792x1024/hd": "0.12",
        "default": "0.02"
      }
    }

Image pricing is per generated image, not per million tokens.
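The token formula and the image tier lookup can be worked through with a short sketch. The function names and the `*_tokens` usage keys are hypothetical, values are assumed to be well-formed (malformed handling is left out for brevity), and skipping the `"{size}/{quality}"` step when no quality is given is an assumption.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)
TOKEN_KEYS = ("input", "output", "cache_read", "cache_creation")

def token_cost(pricing, usage):
    """Sum tokens * per-million price over the four token keys."""
    total = Decimal("0")
    for key in TOKEN_KEYS:
        price = Decimal(str(pricing.get(key) or "0"))   # unpriced key costs zero
        tokens = usage.get(key + "_tokens", 0)
        total += Decimal(tokens) * price / MILLION
    return total

def image_price(image, size, quality=None):
    """Resolve a per-image price from a scalar or a tier object."""
    if image is None:
        return Decimal("0")
    if not isinstance(image, dict):
        return Decimal(str(image))                      # flat per-image price
    candidates = []
    if quality is not None:
        candidates.append(f"{size}/{quality}")          # most specific tier first
    candidates += [size, "default"]
    for key in candidates:
        if key in image:
            return Decimal(str(image[key]))
    return Decimal("0")                                 # no tier matched

pricing = {"input": "3.00", "output": "15.00", "cache_read": "0.30", "cache_creation": "3.75"}
usage = {"input_tokens": 1200, "output_tokens": 350, "cache_read_tokens": 8000}
cost = token_cost(pricing, usage)   # 0.0036 + 0.00525 + 0.0024 = 0.01125 dollars

tiers = {"1024x1024": "0.04", "1792x1024/hd": "0.12", "default": "0.02"}
hd_wide = image_price(tiers, "1792x1024", "hd")    # matches "1792x1024/hd"
hd_square = image_price(tiers, "1024x1024", "hd")  # falls through to "1024x1024"
small = image_price(tiers, "512x512")              # falls through to "default"
```

Note that an image cost is the resolved price multiplied by the number of generated images, with no division by one million.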
Runtime lookup
The control-plane snapshot caches provider models by provider id. During
admission and settlement, GPROXY resolves pricing by exact
(provider_id, upstream_model_id) lookup in that snapshot and parses the
model’s pricing_json.
There is no glob, prefix, or "default" model fallback in the current v2
pricing lookup. Configure pricing on each provider model row that should bill
non-zero cost.
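A rough sketch of the exact-match rule follows. The nested-dict snapshot layout and the name `resolve_pricing` are assumptions made for illustration, as is treating a missing row as unpriced.

```python
# Hypothetical snapshot layout: provider_id -> upstream_model_id -> pricing_json.
snapshot = {
    1: {"gpt-4.1-mini": {"input": "0.40", "output": "1.60"}},
}

def resolve_pricing(snapshot, provider_id, upstream_model_id):
    """Exact (provider_id, upstream_model_id) lookup; no glob, prefix, or default."""
    models = snapshot.get(provider_id, {})
    return models.get(upstream_model_id)   # None means "no pricing configured"

exact = resolve_pricing(snapshot, 1, "gpt-4.1-mini")
prefix_only = resolve_pricing(snapshot, 1, "gpt-4.1")         # None: prefixes do not match
other_provider = resolve_pricing(snapshot, 2, "gpt-4.1-mini")  # None: wrong provider
```

The practical consequence is that a dated or suffixed upstream model id needs its own row with its own pricing_json, otherwise it bills at zero.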
Admission estimates
Before an upstream request is sent, quota admission uses a best-effort estimate:
- input tokens are estimated as the request body length, which is what the current pending-cost estimator uses;
- output, cache, and image components are not estimated;
- the estimate is priced with the selected provider model’s token pricing;
- if the estimate is zero, pending quota pre-deduct is skipped.
For quota-bearing scopes, GPROXY adds the estimated micro-dollar cost to cache
keys named like qp:{scope}:{id}. These pending counters have a 15-minute TTL
so a crash between charge and refund self-heals.
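The admission steps can be sketched as below. This is an illustration under stated assumptions, not the real estimator: body length is measured in bytes, fractional micro-dollars are rounded up, the TTL is refreshed on every add, and `PendingCounters` is a toy in-memory stand-in for the cache.

```python
from decimal import Decimal, ROUND_CEILING

PENDING_TTL_SECONDS = 15 * 60
MILLION = Decimal(1_000_000)

def estimate_micro_dollars(body: bytes, pricing: dict) -> int:
    """Price the body length as input tokens; output, cache, and image are ignored."""
    est_input_tokens = len(body)                          # assumption: length in bytes
    price = Decimal(str(pricing.get("input") or "0"))     # USD per 1,000,000 tokens
    usd = Decimal(est_input_tokens) * price / MILLION
    micro = usd * MILLION                                 # dollars -> micro-dollars
    return int(micro.to_integral_value(rounding=ROUND_CEILING))  # rounding is an assumption

class PendingCounters:
    """Toy in-memory stand-in for the cache holding qp:{scope}:{id} counters."""

    def __init__(self):
        self._entries = {}                                # key -> (amount, expires_at)

    def add(self, scope, scope_id, micro, now):
        if micro == 0:
            return None                                   # zero estimate: pre-deduct skipped
        key = f"qp:{scope}:{scope_id}"
        amount, _ = self._live(key, now)
        self._entries[key] = (amount + micro, now + PENDING_TTL_SECONDS)
        return key

    def get(self, key, now):
        return self._live(key, now)[0]

    def _live(self, key, now):
        amount, expires_at = self._entries.get(key, (0, 0))
        return (amount, expires_at) if now < expires_at else (0, 0)

estimate = estimate_micro_dollars(b"x" * 2000, {"input": "3.00"})  # 2000 * 3.00 = 6000
counters = PendingCounters()
key = counters.add("org", 7, estimate, now=0)
```

Because the price is per million tokens and a dollar is a million micro-dollars, the two factors cancel: the estimate in micro-dollars is simply the estimated token count times the input price. An entry that is never refunded stops counting once its TTL elapses, which is the self-healing behaviour described above.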
Settlement
Successful content-generation responses settle exactly once:
- non-streaming and fully buffered responses settle inline;
- native streaming responses attach a guard so normal end, upstream interruption, or client drop all settle once;
- if upstream usage is present in the response, it is used;
- otherwise GPROXY falls back to local counting where the compiled feature set supports it.
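One way to picture the settle-once guarantee is a small guard object that every end path calls. This is a sketch only: `SettleGuard`, the end-state strings, and the `"upstream"`/`"local"`/`"none"` source labels are hypothetical names, not GPROXY identifiers.

```python
import threading

class SettleGuard:
    """Runs the settle callback at most once, whichever end path fires first."""

    def __init__(self, settle):
        self._settle = settle
        self._lock = threading.Lock()
        self._settled = False

    def finish(self, end_state, upstream_usage=None, count_locally=None):
        with self._lock:
            if self._settled:
                return False                  # another end path already settled
            self._settled = True
        if upstream_usage is not None:
            usage, source = upstream_usage, "upstream"   # prefer upstream-reported usage
        elif count_locally is not None:
            usage, source = count_locally(), "local"     # fall back to local counting
        else:
            usage, source = None, "none"                 # no usage available
        self._settle(end_state, usage, source)
        return True

settled = []
guard = SettleGuard(lambda *args: settled.append(args))
first = guard.finish("client_drop", count_locally=lambda: {"input_tokens": 10})
second = guard.finish("completed", upstream_usage={"input_tokens": 12})
```

Here the client drop wins the race, so the later normal-end call is a no-op and only one settlement is recorded.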
The settled request writes a usages row with token counts, source, end state,
latency, route/provider/user dimensions, and cost. Quota reconciliation then:
- refunds the exact pending micro-dollar estimate;
- atomically increments quotas.cost_used for each quota-bearing scope by the actual settled cost.
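The two reconciliation steps can be sketched as follows. The quotas schema, the micro-dollar unit for cost_used, and the plain dict standing in for the pending cache are all assumptions for illustration; SQLite is used here only because it makes a single-statement increment easy to show.

```python
import sqlite3

def reconcile(conn, pending, scopes, estimated_micro, actual_micro):
    """Refund the pending estimate, then charge the actual cost, per scope."""
    for scope, scope_id in scopes:
        key = f"qp:{scope}:{scope_id}"
        # Refund exactly what admission added, never the actual cost.
        pending[key] = pending.get(key, 0) - estimated_micro
        # Single-statement increment: no read-modify-write in application code.
        conn.execute(
            "UPDATE quotas SET cost_used = cost_used + ? "
            "WHERE scope = ? AND scope_id = ?",
            (actual_micro, scope, scope_id),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quotas (scope TEXT, scope_id INTEGER, "
    "cost_used INTEGER NOT NULL DEFAULT 0, PRIMARY KEY (scope, scope_id))"
)
conn.executemany(
    "INSERT INTO quotas (scope, scope_id) VALUES (?, ?)",
    [("org", 1), ("user", 42)],
)
scopes = [("org", 1), ("user", 42)]
pending = {"qp:org:1": 6000, "qp:user:42": 6000}
reconcile(conn, pending, scopes, estimated_micro=6000, actual_micro=11250)
```

Refunding the estimate rather than the actual cost keeps the pending counter balanced even when the two differ, which they usually will because the estimate ignores output, cache, and image components.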
Embedding and image operations have their own provider-shaped settlement path. Model list/get, token-count, compact, and conversation operations are not currently billed by the content-generation settlement path.
Where operators edit prices
Use the console or the provider-model admin endpoint:

    GET /admin/providers/{provider_id}/models
    POST /admin/providers/{provider_id}/models

JSON import/export uses the same provider_models input shape:

    {
      "id": 1,
      "provider_id": 1,
      "model_id": "gpt-4.1-mini",
      "display_name": "GPT-4.1 mini",
      "pricing_json": { "input": "0.40", "output": "1.60" },
      "variants_json": null,
      "enabled": true
    }

After admin mutations, GPROXY invalidates the control-plane snapshot so new requests see the updated model and pricing rows.