Commit Graph

231 Commits

Author SHA1 Message Date
QTom
c1c31ed9b2 feat: wire RPMCache into GatewayService and AccountHandler 2026-02-28 20:35:38 +08:00
yangjianbo
bb664d9bbf feat(sync): full code sync from release 2026-02-28 15:01:20 +08:00
erio
a6f9f9f968 feat: replace gemini-3-pro-image with gemini-3.1-flash-image
- Add migration 060 to update model_mapping for all antigravity accounts
- Remove gemini-3-pro-image and gemini-3-pro-image-preview mappings
- Add gemini-3.1-flash-image and gemini-3.1-flash-image-preview mappings
- Update frontend usage window to show GImage for new model
- Update isImageGenerationModel to support new model
2026-02-27 09:52:50 +08:00
alfadb
e6969acb50 fix: address review - fix log wording and add response body assertion in test 2026-02-26 23:49:30 +08:00
alfadb
9489531431 fix(gateway): return 404 instead of fake 200 for unsupported count_tokens endpoint
PR #635 returned HTTP 200 with {"input_tokens": 0} when upstream doesn't
support count_tokens (404). This caused Claude Code CLI to trust the zero
value, believing context uses 0 tokens, so auto-compression never triggers.

Fix: return 404 with proper error body so CLI falls back to its local
tokenizer for accurate estimation. Return nil (not error) to avoid
polluting ops error metrics with expected 404s.

Affected paths:
- Passthrough APIKey accounts: upstream 404 now passed through as 404
- Antigravity accounts: same fix (was also returning fake 200)
2026-02-26 23:34:53 +08:00
shaw
4ac57b4edf fix: 临时移除fast-mode-2026-02-01避免429问题 2026-02-26 15:44:28 +08:00
alfadb
03bcd94ae5 fix: count_tokens 端点不支持时降级返回空值 (404 only)
第三方 Anthropic 中转站通常不支持 /v1/messages/count_tokens 端点,
上游返回 404 时降级返回 {input_tokens: 0},客户端 fallback 到本地估算。

- 仅匹配 404 状态码,语义明确:端点不存在
- 其他错误 (400/429/500) 保留原始处理链和 ops 遥测
- 无需解析错误消息内容,不依赖字符串匹配
- 新增 table-driven 测试覆盖 fallback 和 non-fallback 路径
2026-02-26 09:28:45 +08:00
erio
644058174e fix(gemini): enable model_mapping filtering for Gemini API Key accounts
Remove the special case that bypassed model-supported checks for Gemini
API Key accounts, allowing model_mapping to filter requests properly.
Add tests for multiplatform model filtering behavior.
2026-02-24 18:54:59 +08:00
yangjianbo
2ee6c26676 fix(gateway): 修复粘性会话预取分组错配并优化并发等待热路径 2026-02-22 16:43:33 +08:00
yangjianbo
a89477ddf5 perf(gateway): 优化热点路径并补齐高覆盖测试 2026-02-22 13:31:30 +08:00
yangjianbo
1985be26b2 fix(gateway): 恢复 Anthropic 透传流数据间隔超时保护并补充回归测试 2026-02-21 16:54:44 +08:00
yangjianbo
bde9dbc57a feat(anthropic): 支持 API Key 自动透传并优化透传链路性能
- 新增 Anthropic API Key 自动透传开关与后端透传分支(仅替换认证)

- 账号编辑页新增自动透传开关,默认关闭

- 优化透传性能:SSE usage 解析 gjson 快路径、减少请求体重复拷贝、优化流式写回与非流式 usage 解析

- 补充单元测试与 benchmark,确保 Claude OAuth 路径不受影响
2026-02-21 14:16:18 +08:00
yangjianbo
46d9aee6dd feat(proxy,sora): 增强代理质量检测与Sora稳定性并修复审查问题 2026-02-19 21:18:35 +08:00
yangjianbo
440b87094a fix(sora): 增强 Cloudflare 挑战识别并收敛 Sora 请求链路
- 在 failover 场景透传上游响应头并识别 Cloudflare challenge/cf-ray

- 统一 Sora 任务请求的 UA 与代理使用,sentinel 与业务请求保持一致

- 修复流式错误事件 JSON 转义问题并补充相关单元测试
2026-02-19 15:09:58 +08:00
yangjianbo
5d9667d27a Merge branch 'main' into test
# Conflicts:
#	backend/cmd/server/VERSION
#	backend/ent/migrate/schema.go
#	backend/ent/mutation.go
#	backend/ent/runtime/runtime.go
#	backend/ent/usagelog.go
#	backend/ent/usagelog/usagelog.go
#	backend/ent/usagelog/where.go
#	backend/ent/usagelog_create.go
#	backend/ent/usagelog_update.go
#	backend/internal/repository/usage_log_repo.go
#	backend/internal/server/api_contract_test.go
#	backend/internal/server/middleware/cors.go
#	backend/internal/service/gateway_service.go
2026-02-18 20:16:31 +08:00
yangjianbo
fad04ca995 Merge branch 'main' of https://github.com/mt21625457/aicodex2api 2026-02-18 20:10:32 +08:00
shaw
074bd0dfda fix: 临时移除context-1m-2025-08-07以确保避免sonnet1m触发429 2026-02-18 18:41:30 +08:00
John Doe
3d1f03c286 feat: add Cache TTL Override per account + bump VERSION to 0.1.83
- Account-level cache TTL override: rewrite Anthropic cache_creation
  token classification (5m↔1h) in streaming/non-streaming responses
- New DB field cache_ttl_overridden in usage_log for billing tracking
- Migration 055_add_cache_ttl_overridden
- Frontend: CacheTTL override toggle in account create/edit modals
- Ent schema regenerated for new usage_log fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 14:19:24 +03:00
yangjianbo
6577f2ef03 fix(gateway): 避免SSE delta将缓存创建明细重置为0
- 仅在 delta 中 5m/1h 值大于0时覆盖 usage 明细
- 新增回归测试覆盖 delta 默认 0 不应覆盖 message_start 非零值
- 迁移 054 在删除 legacy 字段前追加一次回填,避免升级实例丢失历史写入

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 13:23:12 +08:00
yangjianbo
41d0383fb7 merge(test): 合并 main 并解决前端筛选器冲突 2026-02-15 22:04:06 +08:00
程序猿MT
1cf51b14f7 Merge branch 'Wei-Shaw:main' into main 2026-02-15 20:49:14 +08:00
shaw
a817cafe3d feat: 区分 Anthropic 5m/1h 缓存创建 token 的差异化计费
Anthropic API 的 cache_creation 对象区分了 ephemeral_5m 和 ephemeral_1h
两种缓存创建 token,1h 单价远高于 5m(如 claude-3-5-haiku: 5m=$1/MTok,
1h=$6/MTok)。此前系统统一按 5m 单价计费,导致计费偏低。

后端:
- pricing_service: 加载 LiteLLM 的 cache_creation_input_token_cost_above_1hr
- billing_service: GetModelPricing 启用分类计费(安全守卫 1h>5m),
  CalculateCost 按 5m/1h 分别计费,无明细时回退到 5m 单价
- gateway_service: parseSSEUsage/handleNonStreamingResponse 用 gjson
  提取嵌套 cache_creation 对象的 ephemeral_5m/1h_input_tokens
- antigravity_gateway_service: extractSSEUsage/extractClaudeUsage 同步提取
- usage_log: 修复 GORM column tag 确保写入正确的数据库列
- 新增迁移 054: 删除 GORM 自动生成的重复列

前端:
- 使用记录 tooltip 展示 5m/1h 缓存创建明细(带彩色 badge 区分)
- 表格单元格缓存写入数值旁显示 1h 标识
2026-02-14 18:15:35 +08:00
yangjianbo
d04b47b3ca feat(backend): 提交后端审计修复与配套测试改动 2026-02-14 11:23:10 +08:00
yangjianbo
abf5de69fb Merge branch 'main' into test 2026-02-12 23:43:47 +08:00
程序猿MT
174d7c774d Merge branch 'Wei-Shaw:main' into main 2026-02-12 23:12:41 +08:00
yangjianbo
584cfc3db2 chore(logging): 完成后端日志审计与结构化迁移
- 将高密度服务与处理器日志迁移到新日志系统(LegacyPrintf/结构化日志)
- 增加 stdlog bridge 与兼容测试,保留旧日志捕获能力
- 将 OpenAI 断流告警改为结构化 Warn 并改造对应测试为 sink 捕获
- 补齐后端相关文件 logger 引用并通过全量 go test
2026-02-12 19:01:09 +08:00
程序猿MT
8da5fac69e Merge branch 'Wei-Shaw:main' into main 2026-02-11 18:39:52 +08:00
SilentFlower
19cca11e00 [UPDATE] 增强 Claude Thinking 模式支持与 Opus 4.6 动态预算适配
 feat(antigravity): 支持 thinking adaptive 类型并适配 Opus 4.6 动态预算
🧪 test(gateway): 增加 thinking 模式解析与签名块过滤的边界用例测试
2026-02-11 10:31:16 +08:00
Edric Li
378e476e48 fix: 修复 CI 检查失败
- gofmt: 修复 error_passthrough_service.go 格式问题
- errcheck: 修复 error_passthrough_runtime_test.go 类型断言未检查
- staticcheck: if-else 改为 switch (gateway_service.go)
- test: 修复两个测试用例错误使用 MODEL_CAPACITY_EXHAUSTED 导致走错路径
2026-02-10 22:08:49 +08:00
yangjianbo
3b0910f664 Merge branch 'main' into test-sora 2026-02-10 18:01:17 +08:00
程序猿MT
1dd3158c7e Merge branch 'Wei-Shaw:main' into main 2026-02-10 13:55:51 +08:00
Edric Li
7d0a30fa8f merge: sync upstream main (antigravity single-account 503 retry)
合并上游新增的 Antigravity 单账号 503 退避重试机制,
解决与本地 MODEL_CAPACITY_EXHAUSTED 逻辑的冲突,两者共存。
2026-02-10 12:00:21 +08:00
shaw
5dd83d3cf2 fix: 移除特定system以适配新版cc客户端缓存失效的bug 2026-02-10 10:28:34 +08:00
Edric Li
d6c2921f2b feat: same-account retry before failover for transient errors
For retryable transient errors (Google 400 "invalid project resource name"
and empty stream responses), retry on the same account up to 2 times
(with 500ms delay) before switching to another account.

- Add RetryableOnSameAccount field to UpstreamFailoverError
- Add same-account retry loop in both Gemini and Claude/OpenAI handler paths
- Move temp-unschedule from service layer to handler layer (only after
  all same-account retries exhausted)
- Reduce temp-unschedule cooldown from 30 minutes to 1 minute
2026-02-10 00:53:54 +08:00
yangjianbo
2bfb16291f fix(unit): 修复 unit tag 测试编译与账号选择用例 2026-02-09 21:35:41 +08:00
yangjianbo
d367d1cde6 Merge branch 'main' into test-sora 2026-02-09 20:40:09 +08:00
yangjianbo
16131c3d3f Merge branch 'main' of https://github.com/mt21625457/aicodex2api 2026-02-09 20:26:03 +08:00
Rose Ding
021abfca18 fix: 单账号分组首次 503 不设模型限流标记,避免后续请求雪崩
单账号 antigravity 分组收到 503 (MODEL_CAPACITY_EXHAUSTED) 时,
原逻辑会设置 ~29s 模型限流标记。由于只有一个账号无法切换,
后续所有新请求在预检查时命中限流 → 几毫秒内直接返回 503,
导致约 30 秒的雪崩窗口。

修复:在 Handler 入口处检查分组是否只有单个 antigravity 账号,
如果是则提前设置 SingleAccountRetry context 标记,让 Service 层
首次 503 就走原地重试逻辑(不设限流标记),避免污染后续请求。
2026-02-09 17:25:36 +08:00
erio
fc095bf054 refactor: replace scope-level rate limiting with model-level rate limiting
Merge functional changes from develop branch:
- Remove AntigravityQuotaScope system (claude/gemini_text/gemini_image)
- Replace with per-model rate limiting using resolveAntigravityModelKey
- Remove model load statistics (IncrModelCallCount/GetModelLoadBatch)
- Simplify account selection to unified priority→load→LRU algorithm
- Remove SetAntigravityQuotaScopeLimit from AccountRepository
- Clean up scope-related UI indicators and API fields
2026-02-09 08:19:01 +08:00
erio
1af06aed96 feat: shuffle accounts within same sort group to prevent thundering herd
Add post-sort shuffle for accounts with identical (priority, loadRate,
lastUsedAt) to break deterministic ordering when concurrent requests
read the same scheduler snapshot. Applies to both Antigravity and
OpenAI scheduling paths, plus the sortAccountsByPriorityAndLastUsed
helper.

Keeps upstream CallCount/ModelLoadInfo scheduling intact; shuffle is
additive and only randomises within equivalent-rank groups.
2026-02-09 07:33:17 +08:00
erio
b889d5017b refactor: replace Trie-based digest session store with flat cache 2026-02-09 07:02:12 +08:00
erio
35598d5648 fix: parse Gemini native request format in ParseGatewayRequest for correct session hash generation
ParseGatewayRequest only parsed Anthropic format (system/messages),
ignoring Gemini native format (systemInstruction/contents). This caused
GenerateSessionHash to produce identical hashes for all Gemini sessions.

Add protocol parameter to ParseGatewayRequest to branch between
Anthropic and Gemini parsing. Update GenerateSessionHash message
traversal to extract text from both formats.
2026-02-09 06:47:22 +08:00
erio
5c76b9e45a fix: prevent sessionHash collision for different users with same messages
Mix SessionContext (ClientIP, UserAgent, APIKeyID) into
GenerateSessionHash 3rd-level fallback to differentiate requests
from different users sending identical content.

Also switch hashContent from SHA256-truncated to XXHash64 for
better performance, and optimize Trie Lua script to match from
longest prefix first.
2026-02-09 06:46:32 +08:00
yangjianbo
fd43be8d0b merge: 合并 main 分支到 test,解决 config 和 modelWhitelist 冲突
- config.go: 保留 Sora 配置,合入 SubscriptionCache 配置
- useModelWhitelist.ts: 同时保留 soraModels 和 antigravityModels

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 20:18:07 +08:00
yangjianbo
a14dfb769a Merge branch 'dev-release' 2026-02-07 19:58:00 +08:00
erio
e3748da860 fix(lint): handle errcheck for strings.Builder.WriteString 2026-02-07 18:18:15 +08:00
erio
50a783ff01 feat: add Anthropic sticky session digest chain matching via Trie
The previous fallback (step 3) in GenerateSessionHash hashed system +
all messages together, producing a different hash each round as the
conversation grew ([a] -> [a,b] -> [a,b,c]). This made fallback sticky
sessions ineffective for multi-turn conversations.

Implement per-message Trie digest chain matching (reusing Gemini's Trie
infrastructure) so that the previous round's chain is always a prefix
of the current round's chain, enabling reliable session affinity.
2026-02-07 18:00:56 +08:00
yangjianbo
8226a4ce4d perf(service): 优化 model 替换函数,用 gjson/sjson 替代全量 JSON 序列化
SSE 热路径中 replaceModelInSSELine 和 replaceModelInResponseBody 原来
使用 json.Unmarshal/Marshal 对每个事件做全量反序列化再序列化,现改为
gjson.Get/sjson.Set 精确字段操作,消除 O(n) 中间 map 分配,保持 JSON
字段顺序不变。涉及 OpenAIGatewayService 和 GatewayService 两个服务。

新增 23 个单元测试覆盖:顶层/嵌套 model 替换、不匹配跳过、空行/[DONE]/
非法 JSON 等边界情况。

Fixes: P1-08

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 17:09:55 +08:00
erio
e1a68497d6 refactor: simplify sticky session rate limit handling — switch immediately on any rate limit
Remove threshold-based waiting in both sticky session and antigravity
pre-check paths. When a model is rate-limited, immediately clear the
sticky session and switch accounts instead of waiting for short durations.
2026-02-07 17:06:49 +08:00
erio
b4f6c4f9d5 style: fix gofmt formatting in gateway_service.go
Remove extra blank line that caused golangci-lint gofmt check to fail.
2026-02-07 14:51:20 +08:00