Credential Selection and Cache Affinity
Why this page exists
When a provider has multiple credentials, cache hit rate and credential selection are coupled:
- If cache-sensitive requests keep switching credentials, upstream cache hit rate usually drops.
- If everything is pinned to one credential, throughput and failover degrade.
GPROXY balances this with pick modes plus an internal in-memory cache-affinity pool.
Pick mode configuration
Configure these fields in channels.settings:
- `credential_round_robin_enabled` (default `true`)
- `credential_cache_affinity_enabled` (default `true`)
Effective modes:
| `credential_round_robin_enabled` | `credential_cache_affinity_enabled` | Effective mode | Behavior |
|---|---|---|---|
| `false` | `false`/`true` | StickyNoCache | no round-robin, no affinity pool; always pick the smallest available credential id |
| `true` | `true` | RoundRobinWithCache | round-robin among eligible credentials, with affinity matching |
| `true` | `false` | RoundRobinNoCache | round-robin among eligible credentials, no affinity matching |
Notes:
- `StickyWithCache` is intentionally not supported.
- If round-robin is disabled, affinity is forced off.
- Legacy fields `credential_pick_mode` and `cache_affinity_enabled` are still parsed for compatibility.
Internal affinity pool design (v1)
GPROXY keeps a process-local map:
- key: `"{channel}::{affinity_key}"`
- value: `{ credential_id, expires_at }`
- store: `DashMap<String, CacheAffinityRecord>`
This is still the v1 pool format (no v2 namespace, no storage schema change).
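A minimal sketch of this v1 pool, assuming a single-threaded dict in place of the concurrent `DashMap`; the class and method names are illustrative, not GPROXY's actual Rust API.

```python
import time

class CacheAffinityPool:
    """Process-local map: "{channel}::{affinity_key}" -> (credential_id, expires_at)."""

    def __init__(self):
        self._records = {}  # key -> (credential_id, expires_at)

    def bind(self, channel, affinity_key, credential_id, ttl_s):
        key = f"{channel}::{affinity_key}"
        self._records[key] = (credential_id, time.monotonic() + ttl_s)

    def lookup(self, channel, affinity_key):
        key = f"{channel}::{affinity_key}"
        record = self._records.get(key)
        if record is None:
            return None
        credential_id, expires_at = record
        if time.monotonic() >= expires_at:  # expired mapping: drop lazily
            del self._records[key]
            return None
        return credential_id

    def clear(self, channel, affinity_key):
        self._records.pop(f"{channel}::{affinity_key}", None)
```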
Hit judgment and retry behavior
RoundRobinWithCache uses a multi-candidate hint:
- `CacheAffinityHint { candidates, bind }`
- each candidate has `{ key, ttl_ms }`
Selection flow:
- Build candidate keys from the request body with protocol-specific block/prefix rules.
- Match candidates in priority order (longest prefix first).
- If a non-expired mapping exists and its credential is currently eligible, pick that credential.
- Otherwise, use normal round-robin among eligible credentials.
- On success, always bind the `bind` key.
- If a candidate key was matched, refresh that matched key's TTL too.
- If an affinity-picked attempt fails and is retried, clear only the matched key for that attempt.
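The selection flow above can be sketched as follows, assuming a plain-dict pool of `key -> (credential_id, expires_at)` records; the function name and argument shapes are illustrative, not GPROXY's real API.

```python
import time

def pick_credential(eligible_ids, hint, pool, rr_index):
    """pool maps affinity key -> (credential_id, expires_at).

    hint = {"candidates": [{"key": ..., "ttl_ms": ...}, ...], "bind": {...}},
    with candidates already ordered longest-prefix-first.
    Returns (credential_id, matched_key or None).
    """
    now = time.monotonic()
    for cand in hint["candidates"]:  # priority order: longest prefix first
        rec = pool.get(cand["key"])
        # Affinity hit only if the mapping is unexpired and still eligible.
        if rec is not None and rec[1] > now and rec[0] in eligible_ids:
            return rec[0], cand["key"]
    # Fallback: plain round-robin among eligible credentials.
    ordered = sorted(eligible_ids)
    return ordered[rr_index % len(ordered)], None
```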
Key derivation and TTL rules by protocol
GPROXY no longer uses a whole-body hash for these content-generation requests; it uses canonicalized block prefixes.
Shared rules:
- Canonical JSON per block: sorted object keys, `null` removed, arrays keep order.
- Rolling prefix hash: `prefix_i = sha256(seed + block_1 + ... + block_i)`.
- Non-Claude candidate sampling: `<=64` boundaries: all; `>64` boundaries: first 8 and last 56.
- Match priority: longest prefix first.
- `stream` does not participate in key derivation.
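The shared rules can be illustrated as follows; the seed value and exact JSON byte layout are assumptions, so real GPROXY hashes will differ.

```python
import hashlib
import json

def canonical(block):
    """Canonical JSON per block: sorted object keys, null removed, array order kept."""
    if isinstance(block, dict):
        return {k: canonical(v) for k, v in sorted(block.items()) if v is not None}
    if isinstance(block, list):
        return [canonical(v) for v in block]  # arrays keep order
    return block

def prefix_hashes(blocks, seed=b"seed"):
    """prefix_i = sha256(seed + block_1 + ... + block_i), one hash per boundary."""
    h, out = hashlib.sha256(seed), []
    for block in blocks:
        h.update(json.dumps(canonical(block), separators=(",", ":")).encode())
        out.append(h.copy().hexdigest())  # rolling: each prefix extends the last
    return out
```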
OpenAI Chat Completions
Block order:
- `tools[]`
- `response_format.json_schema`
- `messages[]` (split by content blocks)
Key format:
`openai.chat:ret={ret}:k={prompt_cache_key_hash|none}:h={prefix_hash}`
TTL:
- `prompt_cache_retention == "24h"` -> 24h
- otherwise -> 5m
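A hedged sketch of key construction and the TTL rule for this protocol; the function name and the way `prompt_cache_key` is hashed are assumptions.

```python
import hashlib

def openai_chat_affinity_key(prefix_hash, prompt_cache_key=None, retention="default"):
    """Builds openai.chat:ret={ret}:k={prompt_cache_key_hash|none}:h={prefix_hash}."""
    k = hashlib.sha256(prompt_cache_key.encode()).hexdigest() if prompt_cache_key else "none"
    key = f"openai.chat:ret={retention}:k={k}:h={prefix_hash}"
    # prompt_cache_retention == "24h" -> 24h TTL, otherwise -> 5m.
    ttl_ms = 24 * 60 * 60 * 1000 if retention == "24h" else 5 * 60 * 1000
    return key, ttl_ms
```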
OpenAI Responses
Block order:
- `tools[]`
- `prompt` (id/version/variables)
- `instructions`
- `input` (split by item/content blocks)
Key format:
`openai.responses:ret={ret}:k={prompt_cache_key_hash|none}:h={prefix_hash}`
TTL:
- `prompt_cache_retention == "24h"` -> 24h
- otherwise -> 5m
Not included in the prefix key:
- `reasoning`
- `max_output_tokens`
- `stream`
Claude Messages
Block hierarchy:
`tools[] -> system -> messages.content[]`
Breakpoints:
- explicit: the block has `cache_control`
- automatic: a top-level `cache_control` exists; use the last cacheable block (falling back backward if needed)
Candidates:
- for each breakpoint, include up to 20 lookback boundaries
- merge and dedupe candidates
- priority: later breakpoint first, then longer prefix first
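A sketch of the candidate rules above, using integer boundary indices in place of prefix keys; the function name and the lookback shape are assumptions.

```python
def claude_candidates(breakpoints, lookback=20):
    """For each breakpoint, include up to `lookback` boundaries; merge, dedupe,
    and order by later breakpoint first, then longer prefix first."""
    ordered, seen = [], set()
    for bp in sorted(breakpoints, reverse=True):  # later breakpoint first
        for boundary in range(bp, max(bp - lookback, 0), -1):  # longer prefix first
            if boundary not in seen:
                seen.add(boundary)
                ordered.append(boundary)
    return ordered
```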
Key format:
`claude.messages:ttl={5m|1h}:bp={explicit|auto}:h={prefix_hash}`
TTL:
- breakpoint `ttl == "1h"` -> 1h
- auto top-level `cache_control: {"type":"ephemeral"}` (no ttl) -> 1h
- otherwise -> 5m
If the request has no explicit breakpoint and no top-level `cache_control`, no affinity hint is generated.
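The Claude TTL rules above, restated as a small sketch; the argument shapes are assumptions.

```python
def claude_affinity_ttl(has_breakpoint, breakpoint_ttl=None, top_level_cc=None):
    """Returns "1h", "5m", or None (no affinity hint generated)."""
    if not has_breakpoint and top_level_cc is None:
        return None  # no explicit breakpoint, no top-level cache_control
    if breakpoint_ttl == "1h":
        return "1h"  # explicit breakpoint with 1h ttl
    if not has_breakpoint and top_level_cc == {"type": "ephemeral"}:
        return "1h"  # automatic top-level mode, no ttl field
    return "5m"
```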
Gemini GenerateContent / StreamGenerateContent
If `cachedContent` exists:
- key: `gemini.cachedContent:{sha256(cachedContent)}`
- TTL: 60m
Otherwise, prefix mode:
- block order: `systemInstruction -> tools[] -> toolConfig -> contents[].parts[]`
- key: `gemini.generateContent:prefix:{prefix_hash}`
- TTL: 5m
Not included by default:
- `generationConfig`
- `safetySettings`
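The two Gemini key modes above can be sketched as one branch; the helper name and body shape are assumptions.

```python
import hashlib

def gemini_affinity(body, prefix_hash):
    """cachedContent handle wins over prefix mode; returns (key, ttl_ms)."""
    if body.get("cachedContent"):
        digest = hashlib.sha256(body["cachedContent"].encode()).hexdigest()
        return f"gemini.cachedContent:{digest}", 60 * 60 * 1000  # 60m
    return f"gemini.generateContent:prefix:{prefix_hash}", 5 * 60 * 1000  # 5m
```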
Claude and ClaudeCode top-level cache control
When `enable_top_level_cache_control` is enabled and the request has no top-level `cache_control`, GPROXY injects:
`{"type":"ephemeral"}`
This applies to Claude and ClaudeCode message generation requests; the Anthropic side decides the effective TTL for this automatic mode.
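A minimal sketch of the injection, assuming a plain-dict request body; the setting name mirrors `enable_top_level_cache_control`.

```python
def inject_top_level_cache_control(body, enabled):
    """Inject {"type": "ephemeral"} only when enabled and not already present."""
    if enabled and "cache_control" not in body:
        body = {**body, "cache_control": {"type": "ephemeral"}}
    return body
```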
Upstream cache mechanisms (provider-side)
These are provider behaviors, independent of GPROXY affinity internals.
OpenAI
- Prompt caching is prefix-oriented.
- A stable system/tools/prompt prefix and a stable model improve the hit rate.
- `prompt_cache_key` and the retention policy affect both cache affinity and upstream behavior.
Claude
- Supports explicit block-level breakpoints and automatic top-level caching.
- Cache hit is prefix/breakpoint driven and sensitive to block ordering and cacheable boundaries.
Gemini
- Context caching is centered around `cachedContent` reuse.
- Reusing the same cached-content handle typically improves the hit rate.
- GPROXY currently supports generation routes and does not expose cached-content management routes.
Practical tips
- Keep prefix content byte-stable (model, tools, system, long-context ordering).
- Use RoundRobinWithCache for cache-sensitive traffic.
- Avoid unnecessary credential churn inside short cache windows.
- Split very different prompt workloads into different channels/providers.
- Enable top-level cache control only when you want automatic Claude/ClaudeCode cache behavior.
- Prefer explicit `cachedContent` reuse in Gemini workflows when available.
Usage examples
Round-robin + cache affinity:

```toml
[channels.settings]
credential_round_robin_enabled = true
credential_cache_affinity_enabled = true
```

Round-robin without affinity:

```toml
[channels.settings]
credential_round_robin_enabled = true
credential_cache_affinity_enabled = false
```

No round-robin (sticky smallest-id available credential):

```toml
[channels.settings]
credential_round_robin_enabled = false
```