Consolidates 9 call sites of resolveUpstreamResponseReadLimit + readUpstreamResponseBodyLimited + ErrUpstreamResponseBodyTooLarge error handling into a single ReadUpstreamResponseBody function with TooLargeWriter callback for API-format-specific error responses (Anthropic, OpenAI, countTokens).
- Parse candidatesTokensDetails from Gemini API to separate image/text output tokens
- Add image_output_tokens and image_output_cost to usage_log (migration 089)
- Support per-image-token pricing via output_cost_per_image_token from model pricing data
- Channel pricing ImageOutputPrice override works in token billing mode
- Auto-fill image_output_price in channel pricing form from model defaults
- Add "channel_mapped" billing model source as new default (migration 088)
- Bills by model name after channel mapping, before account mapping
- Fix channel cache error TTL sign error (115s → 5s)
- Fix Update channel only invalidating new groups, not removed groups
- Fix frontend model_mapping clearing sending undefined instead of {}
- Credits balance precheck via shared AccountUsageService cache before injection
- Skip credits injection for accounts with insufficient balance
- Don't mark credits exhausted for "exhausted your capacity on this model" 429s
Gemini 2.5 Pro/Flash thinking models return thoughtsTokenCount separately
from candidatesTokenCount in usageMetadata, but this field was not parsed
or included in billing calculations, causing thinking tokens to be
unbilled.
- Add ThoughtsTokenCount field to GeminiUsageMetadata struct
- Include thoughtsTokenCount in OutputTokens across all 3 Gemini usage
parsing paths (non-streaming, streaming, compat layer)
- Add tests covering thinking token scenarios
Closes#554
For retryable transient errors (Google 400 "invalid project resource name"
and empty stream responses), retry on the same account up to 2 times
(with 500ms delay) before switching to another account.
- Add RetryableOnSameAccount field to UpstreamFailoverError
- Add same-account retry loop in both Gemini and Claude/OpenAI handler paths
- Move temp-unschedule from service layer to handler layer (only after
all same-account retries exhausted)
- Reduce temp-unschedule cooldown from 30 minutes to 1 minute
Add ShouldHandleErrorCode guard at the entry of handleGeminiUpstreamError
and AntigravityGatewayService.handleUpstreamError so that accounts with
custom error codes (e.g. [599]) are not rate-limited when the upstream
returns a non-matching status (e.g. 429).
When custom error codes are enabled and the upstream error code is NOT
in the configured list, return HTTP 500 to the client instead of
transparently forwarding the original status code.
Also adds integration test TestCustomErrorCode599 verifying that 429,
500, 503, 401, 403 all return 500 without triggering SetRateLimited
or SetError.
Upstream accounts now use the standard APIKey type instead of a dedicated
upstream type. GetBaseURL() and new GetGeminiBaseURL() automatically append
/antigravity for Antigravity platform APIKey accounts, eliminating the need
for separate upstream forwarding methods.
- Remove ForwardUpstream, ForwardUpstreamGemini, testUpstreamConnection
- Remove upstream branch guards in Forward/ForwardGemini/TestConnection
- Add migration 052 to convert existing upstream accounts to apikey
- Update frontend CreateAccountModal to create apikey type
- Add unit tests for GetBaseURL and GetGeminiBaseURL
Previously, collectGeminiSSE() only returned the last chunk received
from the upstream streaming response when converting to non-streaming.
This caused incomplete responses where only the final text fragment
was returned to clients.
For example, a request asking to "count from 1 to 10" would only
return "\n" (the last chunk) instead of "1\n2\n3\n...\n10\n".
This was especially problematic for JSON structured output where
the opening brace "{" from the first chunk was lost, resulting
in invalid JSON like: colors": ["red", "blue"]}
The fix:
- Collect all text parts from each SSE chunk into a slice
- Merge all collected text parts into the final response
- Reuse the same pattern as handleGeminiStreamToNonStreaming
in antigravity_gateway_service.go
Fixes: non-streaming responses returning incomplete text
Fixes: structured output (JSON schema) returning invalid JSON
- Add claude_code_only field to restrict groups to Claude Code clients only
- Add fallback_group_id for non-Claude Code requests to use alternate group
- Implement ClaudeCodeValidator for User-Agent detection
- Add group-level session binding isolation (groupID in Redis key)
- Prevent cross-group sticky session pollution
- Update frontend with Claude Code restriction controls