Add post-sort shuffle for accounts with identical (priority, loadRate,
lastUsedAt) to break deterministic ordering when concurrent requests
read the same scheduler snapshot. Applies to both Antigravity and
OpenAI scheduling paths, plus the sortAccountsByPriorityAndLastUsed
helper.
Keeps upstream CallCount/ModelLoadInfo scheduling intact; shuffle is
additive and only randomises within equivalent-rank groups.
- In handleSmartRetry, use the actual upstream retryDelay to set model
rate limit duration instead of always using the 30s default
- Return info.RetryDelay from shouldTriggerAntigravitySmartRetry when
shouldRateLimitModel=true, so callers know the actual delay
- Extract getDefaultRateLimitDuration() and resolveResetTime() helpers
to reduce duplication in handleUpstreamError 429 handling
- Improve debug logging with upstream_retry_delay and response body
ParseGatewayRequest only parsed Anthropic format (system/messages),
ignoring Gemini native format (systemInstruction/contents). This caused
GenerateSessionHash to produce identical hashes for all Gemini sessions.
Add protocol parameter to ParseGatewayRequest to branch between
Anthropic and Gemini parsing. Update GenerateSessionHash message
traversal to extract text from both formats.
Mix SessionContext (ClientIP, UserAgent, APIKeyID) into
GenerateSessionHash 3rd-level fallback to differentiate requests
from different users sending identical content.
Also switch hashContent from SHA256-truncated to XXHash64 for
better performance, and optimize Trie Lua script to match from
longest prefix first.
Previously, thoughtSignature cleanup only applied to Gemini CLI
requests (detected via x-gemini-api-privileged-user-id header or
tmp dir pattern). This caused 400 errors for non-CLI clients when
session cache expired and they sent stale signatures.
Remove the isGeminiCLIRequest guard so all clients benefit from
proactive thoughtSignature cleanup on session binding miss.
- Add `sort_order` field to groups table with migration
- Add `PUT /api/v1/admin/groups/sort-order` API for batch update
- Implement drag-and-drop UI using vue-draggable-plus
- All queries now order groups by sort_order
- Add i18n support (en/zh) for sort-related UI text
- Update test stubs to satisfy new interface methods
Upstream accounts now use the standard APIKey type instead of a dedicated
upstream type. GetBaseURL() and new GetGeminiBaseURL() automatically append
/antigravity for Antigravity platform APIKey accounts, eliminating the need
for separate upstream forwarding methods.
- Remove ForwardUpstream, ForwardUpstreamGemini, testUpstreamConnection
- Remove upstream branch guards in Forward/ForwardGemini/TestConnection
- Add migration 052 to convert existing upstream accounts to apikey
- Update frontend CreateAccountModal to create apikey type
- Add unit tests for GetBaseURL and GetGeminiBaseURL
Replace manual header setting (Content-Type, anthropic-version, anthropic-beta)
with full client header passthrough in ForwardUpstream/ForwardUpstreamGemini.
Only authentication headers (Authorization, x-api-key) are overridden with
upstream account credentials. Hop-by-hop headers are excluded.
Add unit tests covering header passthrough, auth override, and hop-by-hop filtering.
v0.1.74 merged upstream accounts into the OAuth path, causing requests
to hit the wrong protocol and endpoint. Add three upstream-specific
methods (testUpstreamConnection, ForwardUpstream, ForwardUpstreamGemini)
that use base_url + apiKey auth and passthrough the original body, while
reusing the existing response handling and error/retry logic.
- Change antigravitySmartRetryMaxAttempts from 3 to 1 to prevent
repeated rate limiting and long waits
- Clear sticky session binding (DeleteSessionAccountID) after smart
retry exhaustion, so subsequent requests don't hit the same
rate-limited account
- Add flow diagrams to Forward/ForwardGemini doc comments
- Add comprehensive unit tests covering:
- Sticky session cleared on retry failure (429, 503, network error)
- Sticky session NOT cleared on retry success
- Sticky session NOT cleared for non-sticky requests (empty hash)
- Sticky session NOT cleared on long delay path (handled by handler)
- Nil cache safety (no panic)
- MaxAttempts constant verification
- End-to-end retryLoop → switchError propagation with session clear
The digest chain fallback is only needed for Gemini endpoints, not
for the Anthropic Messages API path. Remove the handler integration
while keeping the reusable service/repository layer for future use.
The previous fallback (step 3) in GenerateSessionHash hashed system +
all messages together, producing a different hash each round as the
conversation grew ([a] -> [a,b] -> [a,b,c]). This made fallback sticky
sessions ineffective for multi-turn conversations.
Implement per-message Trie digest chain matching (reusing Gemini's Trie
infrastructure) so that the previous round's chain is always a prefix
of the current round's chain, enabling reliable session affinity.
Remove threshold-based waiting in both sticky session and antigravity
pre-check paths. When a model is rate-limited, immediately clear the
sticky session and switch accounts instead of waiting for short durations.
1. Frontend: replace hardcoded antigravityDefaultMappings with async
fetch from GET /admin/accounts/antigravity/default-model-mapping,
eliminating the duplicate data source that caused frontend/backend
mapping inconsistency.
2. Backend: convert handleSmartRetry and antigravityRetryLoop from
standalone functions to AntigravityGatewayService methods, enabling
Redis cache sync (updateAccountModelRateLimitInCache) after both
rate-limit write paths — long-delay branch and retry-exhausted branch.
- GetAccessToken: add upstream branch to read api_key from credentials
- shouldTriggerAntigravitySmartRetry: relax check from IsOAuth to Platform-based
- isModelSupportedByAccount/WithContext: replace IsAntigravityModelSupported
whitelist with mapAntigravityModel for unified scheduling/forwarding logic
- mapAntigravityModel: fix edge case where wildcard target equals request model
- Update tests for new behavior and add custom model_mapping test cases
The default fallback cooldown when rate limit reset time cannot be
parsed was 5 minutes, which is too aggressive and causes accounts
to be unnecessarily locked out. Reduce to 30 seconds for faster
recovery. Config override still works (unit remains minutes).
When extended thinking is enabled, Claude API requires max_tokens >
thinking.budget_tokens. If misconfigured, this auto-adjusts max_tokens
to budget_tokens + 1000 instead of returning a 400 error.
- Add ensureMaxTokensGreaterThanBudget helper function
- Extract Gemini25FlashThinkingBudgetLimit constant (24576)
- Log adjustment for debugging