PR #1914 unconditionally applied the full mimicry pipeline to all OAuth
accounts, including real Claude Code CLI clients. This replaced the
client's long system prompt (~10K+ tokens with stable cache_control
breakpoints) with a short ~45-token [billing, CC prompt] pair, which
falls below Anthropic's 1024-token minimum cacheable prefix length.
As a result, every request created a new cache entry but never hit an
existing one.
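The arithmetic behind the regression can be sketched as below. This is a simplification, assuming the cached prefix is just the system blocks up to the breakpoint and that their token counts are already known. The split of the ~45 tokens into 15 + 30 is hypothetical, and prefixCacheable is an illustrative name, not gateway code.

```go
package main

import "fmt"

// minCacheablePrefixTokens is the 1024-token minimum cited above.
// Assumption: it applies to the model in use; some models require more.
const minCacheablePrefixTokens = 1024

// prefixCacheable (hypothetical helper) reports whether a prefix whose blocks
// have the given, already-counted token sizes reaches the minimum length.
func prefixCacheable(blockTokens []int) bool {
	total := 0
	for _, n := range blockTokens {
		total += n
	}
	return total >= minCacheablePrefixTokens
}

func main() {
	// Hypothetical token counts matching the rough sizes described above.
	clientSystem := []int{10000}   // real CC system prompt, ~10K tokens
	mimicrySystem := []int{15, 30} // [billing, CC prompt] pair, ~45 tokens total

	fmt.Println("client system prompt cacheable:", prefixCacheable(clientSystem))
	fmt.Println("mimicry system pair cacheable:", prefixCacheable(mimicrySystem))
}
```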
Fix: restore the Claude Code client detection gate so that real CC
clients bypass body-level mimicry (system rewrite, message cache
management, tool name obfuscation). Non-CC third-party clients
(opencode, etc.) continue to receive full mimicry.
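A minimal sketch of the gate's effect, assuming detection has already produced a boolean. The Request type, plannedBodySteps, and the step labels are hypothetical placeholders for the gateway's real functions. Only body-level rewrites are gated, so a real CC client keeps its own system prompt and cache_control breakpoints.

```go
package main

import "fmt"

// Request is a hypothetical, simplified view of an incoming OAuth request;
// IsClaudeCode holds the outcome of the Claude Code client detection gate.
type Request struct {
	UserAgent    string
	IsClaudeCode bool
}

// bodyMimicrySteps names the body-level rewrites listed in the fix.
// These are labels only; the real gateway's function names are not shown here.
var bodyMimicrySteps = []string{
	"rewrite system prompt",
	"manage message cache_control",
	"obfuscate tool names",
}

// plannedBodySteps returns the body-level mimicry steps that would run.
// Real Claude Code clients get none, so their cache breakpoints survive.
func plannedBodySteps(r Request) []string {
	if r.IsClaudeCode {
		return nil
	}
	return bodyMimicrySteps
}

func main() {
	cc := Request{UserAgent: "claude-cli/1.0.0", IsClaudeCode: true} // hypothetical UA
	other := Request{UserAgent: "opencode/0.1", IsClaudeCode: false} // hypothetical UA
	fmt.Println(len(plannedBodySteps(cc)), len(plannedBodySteps(other)))
}
```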
Also harden the detection logic:
- Make UA regex case-insensitive (align with claude_code_validator.go)
- Validate the metadata.user_id format via ParseMetadataUserID() instead
of only checking that it is non-empty. This prevents third-party tools
from bypassing mimicry by spoofing a claude-cli/* UA and sending an
arbitrary user_id string.
Three field-level alignments in normalizeClaudeOAuthRequestBody to
match real Claude Code CLI traffic byte-for-byte:
1. temperature: previously deleted unconditionally; now passes
through the client's value and defaults to 1 when absent (the real
CLI always sends temperature, defaulting to 1).
2. max_tokens: defaults to 128000 when absent (the real CLI default).
3. context_management: when thinking.type is enabled/adaptive
and the client did not provide context_management, inject
{"edits":[{"type":"clear_thinking_20251015","keep":"all"}]}
to mirror real CLI behavior.
tool_choice removal is unchanged (Claude Code OAuth credentials
do not allow client-supplied tool_choice).
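The alignments above, plus the unchanged tool_choice removal, can be sketched on a decoded body. This assumes a map[string]any representation, which does not preserve key order; the real normalizeClaudeOAuthRequestBody does care about order (see gateway_body_order_test.go). normalizeSketch is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// normalizeSketch (hypothetical) mirrors the field alignments described above.
// Assumption: map[string]any loses key order, unlike the real implementation.
func normalizeSketch(body map[string]any) {
	// 1. temperature: keep the client's value, otherwise default to 1.
	if _, ok := body["temperature"]; !ok {
		body["temperature"] = 1
	}
	// 2. max_tokens: default to 128000 when absent.
	if _, ok := body["max_tokens"]; !ok {
		body["max_tokens"] = 128000
	}
	// 3. context_management: inject only when thinking is enabled/adaptive
	//    and the client did not send its own context_management.
	if thinking, ok := body["thinking"].(map[string]any); ok {
		t, _ := thinking["type"].(string)
		if _, has := body["context_management"]; !has && (t == "enabled" || t == "adaptive") {
			body["context_management"] = map[string]any{
				"edits": []any{
					map[string]any{"type": "clear_thinking_20251015", "keep": "all"},
				},
			}
		}
	}
	// tool_choice is still removed unconditionally.
	delete(body, "tool_choice")
}

func main() {
	raw := `{"model":"m","thinking":{"type":"enabled","budget_tokens":1024},"tool_choice":{"type":"auto"}}`
	var body map[string]any
	if err := json.Unmarshal([]byte(raw), &body); err != nil {
		panic(err)
	}
	normalizeSketch(body)
	out, err := json.Marshal(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping the client's temperature while filling only missing defaults means a real-CLI-shaped request passes through unchanged, while sparse third-party requests are padded to the same shape.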
Tests updated:
- gateway_body_order_test.go: temperature/max_tokens are now
expected in output; tool_choice still removed.
- gateway_prompt_test.go: system array is now 2 blocks
(billing + cc prompt), assertions adjusted.
- gateway_anthropic_apikey_passthrough_test.go: same 2-block
assertion.