Three field-level alignments in normalizeClaudeOAuthRequestBody to
match real Claude Code CLI traffic byte-for-byte:
1. temperature: previously deleted unconditionally; now passes
through client value, defaults to 1 when absent (real CLI
always sends temperature, default 1).
2. max_tokens: defaults to 128000 when absent (real CLI default).
3. context_management: when thinking.type is enabled/adaptive
and the client did not provide context_management, inject
{"edits":[{"type":"clear_thinking_20251015","keep":"all"}]}
to mirror real CLI behavior.
tool_choice removal is unchanged (Claude Code OAuth credentials
do not allow client-supplied tool_choice).
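The three alignments plus the unchanged tool_choice removal can be sketched as follows. This is a minimal illustration, not the real implementation: the actual normalizeClaudeOAuthRequestBody operates on the gateway's own request representation, so the generic map-based body here is an assumption.

```go
package main

// normalizeClaudeOAuthRequestBody sketches the field-level alignments
// described above on a generic JSON body (illustrative only).
func normalizeClaudeOAuthRequestBody(body map[string]any) {
	// 1. temperature: pass through the client value; default to 1
	//    when absent (real CLI always sends temperature, default 1).
	if _, ok := body["temperature"]; !ok {
		body["temperature"] = 1
	}
	// 2. max_tokens: default to 128000 when absent (real CLI default).
	if _, ok := body["max_tokens"]; !ok {
		body["max_tokens"] = 128000
	}
	// 3. context_management: when thinking.type is enabled/adaptive and
	//    the client did not provide context_management, inject the
	//    clear_thinking edit to mirror real CLI behavior.
	if th, ok := body["thinking"].(map[string]any); ok {
		t, _ := th["type"].(string)
		if (t == "enabled" || t == "adaptive") && body["context_management"] == nil {
			body["context_management"] = map[string]any{
				"edits": []map[string]any{
					{"type": "clear_thinking_20251015", "keep": "all"},
				},
			}
		}
	}
	// tool_choice removal is unchanged: OAuth credentials do not allow
	// a client-supplied tool_choice.
	delete(body, "tool_choice")
}
```

Serializing the normalized map with encoding/json then yields the byte-for-byte shape the real CLI sends for these fields.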
Tests updated:
- gateway_body_order_test.go: temperature/max_tokens are now
expected in output; tool_choice still removed.
- gateway_prompt_test.go: system array is now 2 blocks
(billing + cc prompt), assertions adjusted.
- gateway_anthropic_apikey_passthrough_test.go: same 2-block
assertion.

Additional coverage:
- IsValidModelSource / NormalizeModelSource
- resolveModelDimensionExpression SQL expressions
- invalid model_source 400 responses on both GetModelStats and
  GetUserBreakdown
- upstream_model in scan/insert SQL mock expectations
- updated passthrough/billing test signatures

PR #635 returned HTTP 200 with {"input_tokens": 0} when the upstream
does not support count_tokens (it responds 404). Claude Code CLI then
trusted the zero value, believed the context used 0 tokens, and never
triggered auto-compression.
Fix: pass the 404 through with a proper error body so the CLI falls
back to its local tokenizer for an accurate estimate. Return nil (not
an error) so expected 404s do not pollute ops error metrics.
Affected paths:
- Passthrough APIKey accounts: upstream 404 now passed through as 404
- Antigravity accounts: same fix (was also returning fake 200)