Mix SessionContext (ClientIP, UserAgent, APIKeyID) into
GenerateSessionHash 3rd-level fallback to differentiate requests
from different users sending identical content.
Also switch hashContent from truncated SHA-256 to XXHash64 for
better performance, and optimize the Trie Lua script to try the
longest prefix first.
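
A minimal sketch of the idea, not the actual implementation: the
SessionContext fields are taken from this message, while
fallbackSessionHash and the NUL-separated key layout are hypothetical.
FNV-1a 64 from the standard library stands in for XXHash64 so the
example has no external dependency.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// SessionContext carries the three fields named in the commit message.
type SessionContext struct {
	ClientIP  string
	UserAgent string
	APIKeyID  int64
}

// hashContent is a stand-in: the commit uses XXHash64, this sketch uses
// FNV-1a 64 so it only needs the standard library.
func hashContent(s string) string {
	h := fnv.New64a()
	h.Write([]byte(s))
	return fmt.Sprintf("%016x", h.Sum64())
}

// fallbackSessionHash (hypothetical name) mixes the caller's context into
// the content hash so identical content from different callers diverges.
func fallbackSessionHash(ctx SessionContext, content string) string {
	// NUL separators keep the field boundaries unambiguous.
	key := fmt.Sprintf("%s\x00%s\x00%d\x00%s",
		ctx.ClientIP, ctx.UserAgent, ctx.APIKeyID, content)
	return hashContent(key)
}

func main() {
	a := SessionContext{ClientIP: "203.0.113.7", UserAgent: "cli/1.0", APIKeyID: 1}
	b := SessionContext{ClientIP: "203.0.113.8", UserAgent: "cli/1.0", APIKeyID: 2}
	// Same content, different callers: the two hashes are compared here.
	fmt.Println(fallbackSessionHash(a, "hello") == fallbackSessionHash(b, "hello"))
}
```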
The previous fallback (step 3) in GenerateSessionHash hashed system +
all messages together, producing a different hash each round as the
conversation grew ([a] -> [a,b] -> [a,b,c]). This made fallback sticky
sessions ineffective for multi-turn conversations.
Implement per-message Trie digest chain matching (reusing Gemini's Trie
infrastructure) so that the previous round's chain is always a prefix
of the current round's chain, enabling reliable session affinity.
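
The prefix property can be illustrated with a small in-memory sketch.
This is an assumption-laden stand-in for the Redis/Lua Trie: the map
store, digestChain, chainKey and longestStoredPrefix are hypothetical
names, and FNV-1a replaces the real digest function.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// digestChain returns one short digest per message, in order.
func digestChain(messages []string) []string {
	chain := make([]string, 0, len(messages))
	for _, m := range messages {
		h := fnv.New64a()
		h.Write([]byte(m))
		chain = append(chain, fmt.Sprintf("%016x", h.Sum64()))
	}
	return chain
}

// chainKey joins a chain into a single lookup key.
func chainKey(chain []string) string {
	return strings.Join(chain, "-")
}

// longestStoredPrefix tries the full chain first, then ever shorter
// prefixes, and returns the account bound to the longest stored one.
func longestStoredPrefix(store map[string]string, chain []string) (string, bool) {
	for n := len(chain); n > 0; n-- {
		if acct, ok := store[chainKey(chain[:n])]; ok {
			return acct, true
		}
	}
	return "", false
}

func main() {
	store := map[string]string{}

	// Round 1: conversation is [a]; bind its chain to an account.
	store[chainKey(digestChain([]string{"a"}))] = "account-1"

	// Round 2: conversation grew to [a, b]; round 1's chain is a prefix.
	acct, ok := longestStoredPrefix(store, digestChain([]string{"a", "b"}))
	fmt.Println(acct, ok) // account-1 true
}
```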
Remove threshold-based waiting in both sticky session and antigravity
pre-check paths. When a model is rate-limited, immediately clear the
sticky session and switch accounts instead of waiting for short durations.
- GetAccessToken: add upstream branch to read api_key from credentials
- shouldTriggerAntigravitySmartRetry: relax check from IsOAuth to Platform-based
- isModelSupportedByAccount/WithContext: replace IsAntigravityModelSupported
whitelist with mapAntigravityModel for unified scheduling/forwarding logic
- mapAntigravityModel: fix edge case where wildcard target equals request model
- Update tests for new behavior and add custom model_mapping test cases
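Claude-compatible APIs such as Kimi report cache information in the
OpenAI-style cached_tokens field rather than Claude's standard
cache_read_input_tokens, so clients receive no cache-hit information
and the internal billing cache discount is 0.
Add a reconcileCachedTokens helper that fills cache_read_input_tokens
from cached_tokens when cache_read_input_tokens == 0 and
cached_tokens > 0, covering both the streaming
(message_start/message_delta) and non-streaming response paths. Native
Claude upstreams are unaffected.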
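
A minimal sketch of the reconciliation rule. The Usage struct and the
function signature are assumptions; only the two field names and the
condition come from this message, and the real helper may operate on
raw JSON instead.

```go
package main

import "fmt"

// Usage is a hypothetical view of the usage object; the JSON keys follow
// the field names mentioned in the commit message.
type Usage struct {
	InputTokens          int `json:"input_tokens"`
	CacheReadInputTokens int `json:"cache_read_input_tokens"`
	CachedTokens         int `json:"cached_tokens"`
}

// reconcileCachedTokens copies the OpenAI-style cached_tokens value into
// cache_read_input_tokens when the latter is empty. It reports whether
// the usage was changed.
func reconcileCachedTokens(u *Usage) bool {
	if u == nil {
		return false
	}
	if u.CacheReadInputTokens == 0 && u.CachedTokens > 0 {
		u.CacheReadInputTokens = u.CachedTokens
		return true
	}
	return false
}

func main() {
	compat := &Usage{InputTokens: 1200, CachedTokens: 1000}
	native := &Usage{InputTokens: 1200, CacheReadInputTokens: 800}

	fmt.Println(reconcileCachedTokens(compat), compat.CacheReadInputTokens) // true 1000
	fmt.Println(reconcileCachedTokens(native), native.CacheReadInputTokens) // false 800
}
```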
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This feature allows API Keys to have their own quota limits and expiration
times, independent of the user's balance.
Backend:
- Add quota, quota_used, expires_at fields to api_key schema
- Implement IsExpired() and IsQuotaExhausted() checks in middleware
- Add ResetQuota and ClearExpiration API endpoints
- Integrate quota billing in gateway handlers (OpenAI, Anthropic, Gemini)
- Include quota/expiration fields in auth cache for performance
- Expiration check returns 403, quota exhausted returns 429
Frontend:
- Add quota and expiration inputs to key create/edit dialog
- Add quick-select buttons for expiration (+7, +30, +90 days)
- Add reset quota confirmation dialog
- Add expires_at column to keys list
- Add i18n translations for new features (en/zh)
Migration:
- Add 045_add_api_key_quota.sql for new columns