Claude Prompt Caching

In v2, Claude prompt caching is modeled as request processing on provider-native Claude Messages bodies. The current runtime supports cache_breakpoint rules and channel request shaping for Claude-compatible traffic.

Cache rules run after protocol transform. A Gemini or OpenAI client routed to a Claude target can still receive Claude cache markers because the body has already been transformed to claude_messages.

A cache_breakpoint rule config looks like:

{
  "target": "system",
  "index": 1,
  "ttl": "5m"
}

Supported fields:

| Field | Meaning |
| --- | --- |
| target | top_level, system, tools, or last_message. |
| index | Console-facing signed index: >0 means Nth from the start, <0 means Nth from the end, and 0 is invalid. If omitted, the runtime uses the last block. Ignored for top_level. |
| ttl | Optional TTL string such as 5m or 1h. |
| position | Reserved for compatibility; currently unused. |
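
The signed index semantics in the table can be sketched as a small helper. This is an illustration only, not the runtime's code: the function name `resolve_index` is hypothetical, and returning None for an out-of-range index is an assumption, since the behavior in that case is not specified here.

```python
from typing import Optional


def resolve_index(index: Optional[int], length: int) -> Optional[int]:
    """Map a console-facing signed index to a zero-based list position.

    Returns None when no block can be selected: an empty list, index 0,
    or (by assumption) an index that points outside the list.
    """
    if length == 0:
        return None
    if index is None:
        return length - 1  # omitted: use the last block
    if index == 0:
        return None  # 0 is invalid
    # Positive counts from the start (1 = first), negative from the end (-1 = last).
    pos = index - 1 if index > 0 else length + index
    return pos if 0 <= pos < length else None
```

With three blocks, an index of 1 selects the first block, -1 selects the last, and an omitted index also selects the last.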

The rule applies only when the target operation kind is claude_messages. For non-Claude targets, the runtime emits a warning and skips the rule.

| Target | Where the marker is inserted |
| --- | --- |
| top_level | On the request root (global), enabling Anthropic’s automatic prompt caching. index/position are ignored. |
| system | An item in a Claude system array. A string-form system cannot carry block metadata and is skipped. |
| tools | An item in the top-level tools array. |
| last_message | A content block in the last message’s content array. |

The runtime writes:

{ "cache_control": { "type": "ephemeral" } }

If ttl is set, it is added to that object.
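
To make the target and marker rules concrete, here is a minimal sketch of applying one rule to a Claude Messages body. It is not the runtime's implementation: `apply_cache_breakpoint` is a hypothetical name, and silently skipping unknown targets or out-of-range indexes is an assumption made for the example.

```python
import copy
from typing import Any, Dict, Optional


def apply_cache_breakpoint(body: Dict[str, Any], rule: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a Claude Messages body with one cache marker added."""
    out = copy.deepcopy(body)
    marker: Dict[str, Any] = {"type": "ephemeral"}
    if rule.get("ttl"):
        marker["ttl"] = rule["ttl"]  # ttl, when set, joins the same object

    target = rule["target"]
    if target == "top_level":
        out["cache_control"] = marker  # index/position are ignored here
        return out

    if target == "system":
        blocks = out.get("system")
    elif target == "tools":
        blocks = out.get("tools")
    elif target == "last_message":
        messages = out.get("messages") or []
        blocks = messages[-1].get("content") if messages else None
    else:
        return out  # assumption: unknown targets leave the body unchanged

    # A string-form system (or string content) cannot carry block metadata.
    if not isinstance(blocks, list) or not blocks:
        return out

    index: Optional[int] = rule.get("index")
    if index is None:
        pos = len(blocks) - 1  # omitted: last block
    elif index > 0:
        pos = index - 1  # Nth from the start
    elif index < 0:
        pos = len(blocks) + index  # Nth from the end
    else:
        return out  # 0 is invalid
    if 0 <= pos < len(blocks) and isinstance(blocks[pos], dict):
        blocks[pos]["cache_control"] = marker
    return out
```

Applied to a body whose system is an array of text blocks with the rule {"target": "system", "index": 1, "ttl": "5m"}, the first system block gains a cache_control object of type ephemeral with ttl 5m. A body whose system is a plain string comes back unchanged.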

Anthropic’s one-hour cache TTL requires a beta header. In v2, add that with a header rule attached to the same provider:

{
  "name": "anthropic-beta",
  "value": "extended-cache-ttl-2025-04-11",
  "mode": "merge"
}

Use merge so existing client beta tokens are preserved.
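
One way to picture merge mode is as a token-preserving append on a comma-separated header value. The sketch below rests on assumptions: the helper name `merge_header_value` is hypothetical, `client-beta-token` is a placeholder, and the runtime's actual merge semantics (ordering, de-duplication, separator) are not documented here.

```python
from typing import Optional


def merge_header_value(existing: Optional[str], value: str) -> str:
    """Append comma-separated tokens from `value` that are not already present."""
    # Keep the client's tokens first, in their original order.
    tokens = [t.strip() for t in (existing or "").split(",") if t.strip()]
    for token in (t.strip() for t in value.split(",")):
        if token and token not in tokens:
            tokens.append(token)
    return ",".join(tokens)


# "client-beta-token" stands in for whatever beta token a client already sent.
merged = merge_header_value("client-beta-token", "extended-cache-ttl-2025-04-11")
```

Under these assumptions the client's token stays in the header and the extended TTL token is added after it. Overwriting the header with the rule's value alone would drop the client's token.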

The process layer applies:

system_text -> cache_breakpoint -> rewrite -> transform -> header

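That means server-managed system text is inserted before cache breakpoints are placed, so a breakpoint can cover it. Later rewrite or transform rules can still change content at or before a breakpoint. Anthropic matches cached prefixes exactly, so any such change that varies between requests causes cache misses. When combining prompt rewriting with prompt caching, keep rewrites deterministic or limit them to content after the last breakpoint.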
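
The ordering effect can be shown with a toy pipeline. Every stage function here is a hypothetical stand-in rather than the runtime's code, and the transform and header stages are left out for brevity. The sketch only shows that a marker placed by cache_breakpoint lands on the server-managed block, and that a later rewrite still edits the text under that marker.

```python
from typing import Any, Callable, Dict, List, Tuple

Body = Dict[str, Any]
Stage = Callable[[Body], Body]


def add_system_text(body: Body) -> Body:
    # Hypothetical system_text stage: prepend a server-managed block.
    system = [{"type": "text", "text": "Server policy."}] + list(body.get("system", []))
    return {**body, "system": system}


def mark_first_system_block(body: Body) -> Body:
    # Hypothetical cache_breakpoint stage: target "system", index 1.
    system = [dict(block) for block in body["system"]]
    system[0]["cache_control"] = {"type": "ephemeral"}
    return {**body, "system": system}


def rewrite_policy_text(body: Body) -> Body:
    # Hypothetical rewrite stage that edits text inside the marked block.
    system = [dict(block) for block in body["system"]]
    system[0]["text"] = system[0]["text"].replace("policy", "policy v2")
    return {**body, "system": system}


# Same relative order as the process layer; transform and header are omitted.
PIPELINE: List[Tuple[str, Stage]] = [
    ("system_text", add_system_text),
    ("cache_breakpoint", mark_first_system_block),
    ("rewrite", rewrite_policy_text),
]


def run(body: Body) -> Body:
    for _name, stage in PIPELINE:
        body = stage(body)
    return body


result = run({"system": [{"type": "text", "text": "Client prompt."}]})
```

After the run, the first system block still carries the cache marker but its text has been changed by the rewrite stage, which is the situation the paragraph above warns about.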

The current cache_breakpoint kind is specialized. The planned generic transform model would express it as a structural locator plus a merge action, with provider-specific path selection generated by console presets. The backend should remain permissive: it should apply the configured mutation and let the upstream enforce provider policy such as breakpoint limits.