Commit Graph

358 Commits

Author SHA1 Message Date
shaw
496469ac4e fix(gateway): skip body mimicry for real Claude Code clients to restore prompt caching
PR #1914 unconditionally applied the full mimicry pipeline to all OAuth
accounts, including real Claude Code CLI clients. This replaced the
client's long system prompt (~10K+ tokens with stable cache_control
breakpoints) with a short ~45 token [billing, CC prompt] pair, which
falls below Anthropic's 1024-token minimum cacheable prefix threshold.
The result: every request created a new cache but never hit an existing
one.

Fix: restore the Claude Code client detection gate so that real CC
clients bypass body-level mimicry (system rewrite, message cache
management, tool name obfuscation). Non-CC third-party clients
(opencode, etc.) continue to receive full mimicry.

Also harden the detection logic:
- Make UA regex case-insensitive (align with claude_code_validator.go)
- Validate metadata.user_id format via ParseMetadataUserID() instead of
  just checking non-empty, preventing third-party tools from spoofing
  a claude-cli/* UA with an arbitrary user_id string to bypass mimicry
2026-04-25 22:50:35 +08:00
hungryboy1025
8987e0ba67 fix(openai): tighten responses stream account tests 2026-04-25 16:56:50 +08:00
shaw
732d6495ea chore(gateway): fix lint issues from cc-mimicry-parity merge
- staticcheck QF1001: apply De Morgan's law to the OAuth-mimic header
  passthrough guard (`!(a && b)` → `a != ... || !b`).
- unused: drop `isClaudeCodeRequest`, which became dead after PR #1914
  switched both `/v1/messages` and `/count_tokens` paths to unconditional
  `account.IsOAuth()` mimicry. The lowercase helper `isClaudeCodeClient`
  is kept (still referenced by `TestIsClaudeCodeClient`).
2026-04-25 08:58:57 +08:00
keh4l
bdbd2916f5 fix(gateway): skip client header passthrough on OAuth mimicry path
Root cause of persistent third-party detection: sub2api's
buildUpstreamRequest transparently forwards client headers via
allowedHeaders whitelist (addHeaderRaw) before applying mimicry
overrides. When third-party clients (opencode, etc.) send their own
anthropic-beta / user-agent / x-stainless-* / x-claude-code-session-id
values, these get appended to the request alongside our injected
headers, creating an inconsistent header set that Anthropic detects.

Parrot's build_upstream_headers constructs exactly 9 headers from
scratch and never forwards anything from the client. This is why
'same opencode version, some users work some don't' — different
opencode configs/versions send different header combinations.

Fix: when tokenType=oauth and mimicClaudeCode=true, skip the
client header passthrough loop entirely. The subsequent
applyClaudeCodeMimicHeaders + ApplyFingerprint + beta merge
pipeline constructs all necessary headers from our controlled values.

Also: remove systemIncludesClaudeCodePrompt gate — OAuth accounts
now unconditionally rewrite system (even if client already sent a
Claude Code-style prompt), ensuring billing attribution block is
always present.
2026-04-25 00:43:38 +08:00
keh4l
6dc89765fd fix(gateway): always apply full mimicry for OAuth accounts regardless of client identity
Before: isClaudeCodeRequest() checked whether the client looks like a
real Claude Code CLI (UA, system prompt, X-App header, metadata format).
If it looked like Claude Code, all mimicry was skipped — the assumption
being that a real CLI needs no help.

Problem: third-party tools like opencode partially impersonate Claude
Code (sending claude-cli UA + claude-code beta + CC system prompt) but
miss critical details (billing attribution block, tool-name obfuscation,
cache breakpoints, full beta set). Some users' opencode instances pass
the isClaudeCodeRequest check, causing sub2api to skip mimicry entirely,
while Anthropic still detects the request as third-party.

This explains why 'same opencode version, some users work, some don't'
— it depends on which opencode features/config trigger the validator.

Fix: OAuth accounts now unconditionally run the full mimicry pipeline,
matching Parrot's behavior (Parrot never checks client identity).
This is safe because our mimicry is strictly more complete than any
third-party client's partial impersonation.

Changed:
  - /v1/messages path: remove isClaudeCode gate
  - /v1/messages/count_tokens path: same
2026-04-25 00:26:37 +08:00
keh4l
f3233db01f fix(gateway): apply D/E/F mimicry to native /v1/messages and count_tokens paths
The previous commit only wired stripMessageCacheControl,
addMessageCacheBreakpoints, and tool-name obfuscation into
applyClaudeCodeOAuthMimicryToBody (used by /chat/completions and
/responses). The native /v1/messages path and count_tokens path
have their own independent mimicry code blocks and were missed.

Now all three entry points share the same D/E/F pipeline:
  - /v1/messages (gateway_service.go forwardAnthropic)
  - /v1/messages/count_tokens (gateway_service.go countTokens)
  - OpenAI compat (applyClaudeCodeOAuthMimicryToBody)
2026-04-24 23:16:32 +08:00
keh4l
6e12578bc5 feat(gateway): port Parrot tool-name obfuscation + message cache breakpoints
Implements the remaining three parity items with Parrot cc_mimicry:

  D) Tool-name obfuscation
     - Dynamic mapping when tools.length > 5 (matches Parrot threshold).
       Fake names follow {prefix}{name[:3]}{i:02d} (e.g. 'manage_bas00').
       Go port of random.Random(hash(tuple(names))) uses fnv64a seed +
       math/rand; byte-exact reproduction is impossible (Python hash vs
       Go hash), but the two invariants that matter are preserved:
         * same input tool_names yield identical mapping (cache hit)
         * prefix pool is shuffled (names look distributed)
     - Static prefix map (sessions_ -> cc_sess_, session_ -> cc_ses_)
       applied as fallback, matching Parrot TOOL_NAME_REWRITES verbatim.
     - Server tools (web_search_20250305, computer_*, etc.) are NOT
       renamed; only type=='function' and type=='custom' tools are.
     - tool_choice.name is rewritten in sync (only when type=='tool').
     - Response side: bytes-level replace on every SSE chunk / JSON
       body at 6 injection points (standard stream/non-stream,
       passthrough stream/non-stream, chat_completions stream +
       non-stream, responses stream + non-stream). Reverse mapping
       applied longest-fake-name-first to prevent substring conflicts
       (parity with Parrot _restore_tool_names_in_chunk).
     - tool_choice is no longer unconditionally deleted in
       normalizeClaudeOAuthRequestBody — Parrot passes it through.

  E) tools[-1] cache_control breakpoint
     - Injected as {type:ephemeral, ttl:<DefaultCacheControlTTL>} when
       the last tool has no cache_control. Client-provided ttl is
       passed through unchanged (repo-wide policy).

  F) messages cache_control strategy
     - stripMessageCacheControl removes every client-provided
       messages[*].content[*].cache_control (multi-turn stability).
     - addMessageCacheBreakpoints then injects two stable breakpoints:
       (1) last message, and (2) second-to-last user turn when
       messages.length >= 4.
     - Combined with the system block breakpoint and tools[-1]
       breakpoint, this gives exactly the 4 breakpoints Anthropic
       allows per request.

Non-trivial implementation details to be aware of when rebasing:

  * Two new files, no upstream collision:
      gateway_tool_rewrite.go       (D + E algorithms)
      gateway_messages_cache.go     (F strip + breakpoints)
  * Two new feature calls bolted onto the tail of
    applyClaudeCodeOAuthMimicryToBody in gateway_service.go — rebase
    conflicts will be ~10 lines maximum.
  * Response-side injection points all wrap their existing write with
    reverseToolNamesIfPresent(c, ...), preserving original behavior
    when no mapping is stored (static prefix rollback still runs).
  * Non-stream chat/responses switched from c.JSON to
    json.Marshal + c.Data so bytes-level replace is possible.
  * Retry bodies (FilterThinkingBlocksForRetry,
    FilterSignatureSensitiveBlocksForRetry, RectifyThinkingBudget)
    only prune blocks — they preserve the already-obfuscated tool
    names, so no extra mapping re-application is needed.

Manual QA: end-to-end scenario verified with 6 tools (above threshold)
and tool_choice.type=='tool'. Obfuscation + restore roundtrip shown
in test logs; then removed the temp test file.

Tests (16 new):
  - buildDynamicToolMap stability + below-threshold guard
  - sanitizeToolName precedence (dynamic > static)
  - restoreToolNamesInBytes longest-first + static rollback
  - applyToolNameRewriteToBody skips server tools + syncs tool_choice
  - applyToolsLastCacheBreakpoint defaults to 5m + passes client ttl
  - stripMessageCacheControl + addMessageCacheBreakpoints in the
    1/4/string-content cases + second-to-last user turn selection
  - buildToolNameRewriteFromBody ReverseOrdered is desc-by-fake-length
  - fake name shape follows Parrot {prefix}{head3}{i:02d}
2026-04-24 23:16:32 +08:00
keh4l
a25faecadd feat(gateway): align body shape with real Claude Code CLI defaults
Three field-level alignments in normalizeClaudeOAuthRequestBody to
match real Claude Code CLI traffic byte-for-byte:

  1. temperature: previously deleted unconditionally; now passes
     through client value, defaults to 1 when absent (real CLI
     always sends temperature, default 1).

  2. max_tokens: defaults to 128000 when absent (real CLI default).

  3. context_management: when thinking.type is enabled/adaptive
     and the client did not provide context_management, inject
     {"edits":[{"type":"clear_thinking_20251015","keep":"all"}]}
     to mirror real CLI behavior.

tool_choice removal is unchanged (Claude Code OAuth credentials
do not allow client-supplied tool_choice).

Tests updated:
  - gateway_body_order_test.go: temperature/max_tokens are now
    expected in output; tool_choice still removed.
  - gateway_prompt_test.go: system array is now 2 blocks
    (billing + cc prompt), assertions adjusted.
  - gateway_anthropic_apikey_passthrough_test.go: same 2-block
    assertion.
2026-04-24 23:16:32 +08:00
keh4l
5862e2d8d9 feat(gateway): add billing attribution block with cc_version fingerprint
Real Claude Code CLI always sends a 2-block system array:

  [0] {"type":"text", "text":"x-anthropic-billing-header: cc_version=X.Y.Z.{fp}; cc_entrypoint=cli; cch=00000;"}
  [1] {"type":"text", "text":"You are Claude Code...", "cache_control":{...}}

Before this commit, sub2api's mimicry path only produced block [1].
The missing billing block is one of the primary third-party detection
signals Anthropic uses for Claude-Code-scoped OAuth tokens.

New file gateway_billing_block.go ports the fingerprint algorithm
(byte-for-byte from Parrot cc_mimicry.py:compute_fingerprint):
pick chars at positions [4,7,20] of the first user text, then
`sha256(SALT + chars + cc_version)[:3]`.

  - claude/constants.go: CLICurrentVersion = "2.1.92" (must match UA)
  - gateway_billing_block.go: computeClaudeCodeFingerprint +
    buildBillingAttributionBlockJSON + extractFirstUserText
  - gateway_service.go: rewriteSystemForNonClaudeCode now emits both
    blocks in order; cch=00000 is filled in later by
    signBillingHeaderCCH in buildUpstreamRequest.

Downstream compat note: syncBillingHeaderVersion's regex
`cc_version=\d+\.\d+\.\d+` only matches the semver triple,
leaving the `.{fp}` suffix intact when rewriting in buildUpstreamRequest.
2026-04-24 23:16:32 +08:00
keh4l
66d6454535 feat(claude): add ttl to cache_control with default 5m
Real Claude CLI traffic sends cache_control as
`{"type":"ephemeral","ttl":"1h"}`. Our previous payload only
sent `{"type":"ephemeral"}`, which is a bytewise mismatch with
the official CLI and one more third-party detection signal.

Policy: client-provided ttl is always passed through unchanged.
Proxy-generated cache_control blocks default to 5m (vs Parrot's 1h)
to avoid burning the 1h cache budget on automatic breakpoints while
still aligning with the `ttl` field being present.

  - claude/constants.go: DefaultCacheControlTTL = "5m"
  - apicompat/types.go: new AnthropicCacheControl type with TTL field;
    AnthropicTool gains optional CacheControl pointer so the mimicry
    path can attach a cache breakpoint to tools[-1] later.
  - service/gateway_service.go: anthropicCacheControlPayload gains TTL;
    marshalAnthropicSystemTextBlock and rewriteSystemForNonClaudeCode
    emit ttl=5m by default.
2026-04-24 23:16:32 +08:00
keh4l
165553cfb0 fix(gateway): use full beta list in buildUpstreamRequest mimicry path
The previous commit added FullClaudeCodeMimicryBetas() but the two
call sites in buildUpstreamRequest still hardcoded the old 3-token
subset. Anthropic now checks the complete set of beta tokens to
decide if a request qualifies as Claude Code. Wire them up:

  - /v1/messages mimic path: requiredBetas = FullClaudeCodeMimicryBetas()
  - /v1/messages/count_tokens mimic path: same + BetaTokenCounting

Haiku models keep the 2-token exemption (BetaOAuth + InterleaveThinking).
2026-04-24 23:16:32 +08:00
keh4l
b5467d610a fix(gateway): apply full Claude Code mimicry on /chat/completions and /responses
Before: the OpenAI-compat forwarders only called injectClaudeCodePrompt,
which prepends the Claude Code banner but leaves the rest of the body
in its original non-Claude-Code shape. The codebase already admits this
is insufficient (see the comment on rewriteSystemForNonClaudeCode in
gateway_service.go: "仅前置追加 Claude Code 提示词无法通过检测").

Effect: OAuth accounts served through /v1/chat/completions or /v1/responses
were detected as third-party apps and bled plan quota with:

    Third-party apps now draw from your extra usage, not your plan limits.

Fix:
  - apicompat.AnthropicRequest: add Metadata json.RawMessage so metadata
    survives the OpenAI->Anthropic->Marshal round trip; without it the
    downstream rewrite has no user_id to work with.
  - service: extract applyClaudeCodeOAuthMimicryToBody, a ParsedRequest-free
    variant of the /v1/messages mimicry pipeline
    (rewriteSystemForNonClaudeCode + normalizeClaudeOAuthRequestBody +
    metadata.user_id injection) so the OpenAI-compat forwarders can reuse it.
  - service: add buildOAuthMetadataUserIDFromBody + hashBodyForSessionSeed
    for the same reason (no ParsedRequest at the call site).
  - ForwardAsChatCompletions / ForwardAsResponses: replace the 3-line
    prompt-prepend with the full mimicry pipeline.
  - applyClaudeCodeMimicHeaders: set x-client-request-id per-request
    (real Claude CLI always does); missing/duplicated values are one more
    third-party fingerprint signal.

No change to the native /v1/messages path: it already called the full
pipeline, we only lift those helpers into a reusable function.

Tests:
  - go build ./... passes
  - go test ./internal/service/... ./internal/pkg/apicompat/... passes
  - lsp_diagnostics clean on all touched files
  - pre-existing failures in internal/config are unrelated (env-sensitive
    tests that also fail on upstream main)
2026-04-24 23:16:32 +08:00
erio
258fd145ff fix(account): prevent quota-exceeded API key/Bedrock accounts from being scheduled
Add quota exceeded check to IsSchedulable() and refactor
shouldClearStickySession to delegate to IsSchedulable(), eliminating
duplicated logic and fixing missed overload/rate-limit/expired checks.
Frontend displays quota exceeded status independently via quota fields.
2026-04-19 18:45:04 +08:00
erio
44cdef7934 fix(usage): subscription billing honours group rate multiplier
Subscription-mode billing was consuming quota at TotalCost (raw) instead of
ActualCost (TotalCost * RateMultiplier), so per-group rate multipliers —
including free subscriptions (multiplier = 0) — were silently ignored.
Switch the three subscription cost writes in buildUsageBillingCommand,
finalizePostUsageBilling, and the legacy postUsageBilling fallback to
ActualCost, and add a table-driven test covering 2x / 0.5x / free multipliers
plus a balance-mode regression check.
2026-04-17 22:06:32 +08:00
erio
10699eeb34 refactor: extract ReadUpstreamResponseBody to deduplicate upstream response read + too-large error handling
Consolidates 9 call sites of resolveUpstreamResponseReadLimit + readUpstreamResponseBodyLimited + ErrUpstreamResponseBodyTooLarge error handling into a single ReadUpstreamResponseBody function with TooLargeWriter callback for API-format-specific error responses (Anthropic, OpenAI, countTokens).
2026-04-16 01:53:22 +08:00
erio
a9880ee7b9 fix: round-2 audit fixes — security, code quality, and UI improvements
Security (HIGH):
- Normalize all Redis cache keys to lowercase (verifyCode, passwordReset)
- Fix verify code TTL renewal on failed attempts: use remaining TTL via
  ExpiresAt field instead of resetting to full 15-minute window
- Add 3 missing fields to diffSettings audit log (promo_code, invitation_code,
  custom_endpoints)

Code quality (MEDIUM):
- Extract filterVerifiedEmails shared helper (balance_notify_service.go)
- Add Pricing array non-empty validation for channel pricing rules
- Add platform token semantics comment in gateway_service.go
- Complete validatePlanPatch test coverage (+10 test cases)
- Replace string types with QuotaThresholdType/QuotaResetMode across frontend
- Remove duplicate getPlatformTextColor/getRateBadgeClass in ChannelsView
- Return EMAIL_NOT_FOUND error on RemoveNotifyEmail miss

UI improvements:
- Reorder cost tooltip: user billing above separator, account billing below
- Add NaN guard to accountBilled function
- Move timezone selector inline into reset-mode row (no longer standalone)
2026-04-14 09:35:05 +08:00
erio
98c9d51791 fix: correct account stats pricing priority order
Priority was wrong:
- Before: custom rules → LiteLLM (when ApplyPricingToAccountStats) → nil
- After:  custom rules → totalCost (when ApplyPricingToAccountStats) → LiteLLM → nil

When ApplyPricingToAccountStats is enabled, use the request's actual
client billing cost (before multiplier) as account_stats_cost, instead
of recalculating from LiteLLM per-token prices which produced incorrect
values for per-request billing mode.

LiteLLM model pricing is now the final fallback (priority 3), used only
when neither custom rules nor ApplyPricingToAccountStats apply.
2026-04-14 09:28:11 +08:00
erio
1262654d97 feat: WebSearch tri-state, account stats pricing fix, quota cache fix, usage tooltip
WebSearch tri-state switch:
- Account-level web_search_emulation changed from bool to tri-state
  string: "default" (follow channel) / "enabled" / "disabled"
- shouldEmulateWebSearch checks channel config when account is "default"
- SQL migration converts old bool values
- Frontend select replaces toggle in Edit/CreateAccountModal

Account stats pricing:
- resolveAccountStatsCost uses upstream model (post-mapping) for matching
- Priority: custom rules → model pricing file (when toggle on) → default
- Custom rules always configurable, independent of toggle
- Account ID field changed to searchable selector filtered by platform
- Description updated to reflect new behavior

Quota notification cache fix:
- CheckAccountQuotaAfterIncrement fetches real-time account from DB
- Reconstructs pre-increment usage for accurate threshold crossing detection
- New AccountQuotaReader interface (minimal: GetByID only)

Usage tooltip:
- Per-request/image billing shows per-request price instead of $0 token price
- Token billing continues to show input/output price per million tokens
2026-04-14 09:26:08 +08:00
erio
11c4606874 fix(channel): use upstream model for account stats pricing and remove channel pricing fallback
- resolveAccountStatsCost now uses the final upstream model (after
  account-level mapping) to match custom pricing rules, fixing the
  issue where requested model (e.g. claude-sonnet-4-5) didn't match
  rules configured for upstream model (e.g. claude-opus-4-6)
- Remove tryChannelPricing fallback — only custom rules are applied,
  unmatched requests use default formula (total_cost × rate)
- Remove unused billingService and serviceTier parameters
- Update description: "启用后将支持自定义账号统计的模型价格"
2026-04-14 09:26:08 +08:00
erio
31550a2c6a fix(notify): use real-time balance for crossing detection and simplify email logic
- Fix cached balance causing threshold crossing to never trigger:
  read real-time balance from billingCacheService instead of stale
  API key auth snapshot
- Remove email="" placeholder concept; all emails are user-managed
- Only send notifications to verified && non-disabled emails
- Frontend: pre-fill user's email in add input when list is empty
- Remove FilterEnabledEmails/IsPrimaryDisabled helpers (no longer needed)
2026-04-14 09:26:07 +08:00
erio
c3812ce1e3 fix(notify): address review findings - accountCost formula, dedup, refactor
- Fix accountCost calculation in finalizePostUsageBilling to match
  postUsageBilling (always multiply by AccountRateMultiplier)
- Use strings.EqualFold for email dedup in collectBalanceNotifyRecipients
- Extract CheckAccountQuotaAfterIncrement into smaller functions:
  buildQuotaDims + asyncSendQuotaAlert (< 30 lines each)
- Add "not splittable" comments for HTML template functions
- Extract QuotaNotifyToggle.vue sub-component to reduce
  QuotaLimitCard.vue from 404 to 339 lines
2026-04-14 09:23:16 +08:00
erio
b32d1a2c9f feat(notify): add balance low & account quota notification system
- User balance low notification: email alert when balance drops below
  configurable threshold (user email + verified extra emails)
- Account quota notification: broadcast email to admin-configured
  recipients when daily/weekly/total quota usage exceeds alert threshold
- Admin settings: global enable/disable, default threshold, quota
  notification email list (Email Settings tab)
- User profile: enable/disable, custom threshold, add/remove extra
  notification emails with verification code flow
- Account quota: per-dimension alert toggle and threshold in quota
  control card
- Trigger logic: first-crossing only (old >= threshold && new < threshold
  for balance; old < threshold && new >= threshold for quota), naturally
  prevents duplicate notifications without Redis dedup
2026-04-14 09:23:02 +08:00
erio
7535e312e0 feat(channels): add custom account stats pricing rules
Allow channels to configure independent model pricing for account
statistics cost calculation, decoupled from user billing.

Backend:
- Migration 101: channels.apply_pricing_to_account_stats toggle,
  channel_account_stats_pricing_rules/model_pricing tables,
  usage_logs.account_stats_cost column
- resolveAccountStatsCost: match rules by group/account, then channel
  pricing, fallback to original formula when unconfigured
- Integrate into both GatewayService.recordUsageCore and
  OpenAIGatewayService.RecordUsage
- Update 8 account stats SQL queries to use
  COALESCE(account_stats_cost, total_cost) * account_rate_multiplier
- 23 unit tests for matching, pricing lookup, and cost calculation

Frontend:
- Channel edit dialog: toggle + custom rules UI with group/account
  multi-select and pricing entry cards
- API types and i18n (zh/en)
2026-04-14 09:22:12 +08:00
erio
1b53ffcac7 feat(gateway): add web search emulation for Anthropic API Key accounts
Inject web search capability for Claude Console (API Key) accounts that
don't natively support Anthropic's web_search tool. When a pure
web_search request is detected, the gateway calls Brave Search or Tavily
API directly and constructs an Anthropic-protocol-compliant SSE/JSON
response without forwarding to upstream.

Backend:
- New `pkg/websearch/` SDK: Brave and Tavily provider implementations
  with io.LimitReader, proxy support, and Redis-based quota tracking
  (Lua atomic INCR + TTL, DECR rollback on failure)
- Global config via `settings.web_search_emulation_config` (JSON) with
  in-process cache + singleflight, input validation, API key merge on
  save, and sanitized API responses
- Channel-level toggle via `channels.features_config` JSONB column
  (DB migration 101)
- Account-level toggle via `accounts.extra.web_search_emulation`
- Request interception in `Forward()` with SSE streaming response
  construction using json.Marshal (no manual string concatenation)
- Manager hot-reload: `RebuildWebSearchManager()` called on config save
  and startup via `SetWebSearchRedisClient()`
- 70 unit tests covering providers, manager, config validation,
  sanitization, tool detection, query extraction, and response building

Frontend:
- Settings → Gateway tab: Web Search Emulation config card with global
  toggle, provider list (add/remove, API key, priority, quota, proxy)
- Channels → Anthropic tab: web search emulation toggle with global
  state linkage (disabled when global off)
- Account Create/Edit modals: web search emulation toggle for API Key
  type with Toggle component
- Full i18n coverage (zh + en)
2026-04-14 09:20:39 +08:00
erio
37c23eccfe fix: gofmt formatting 2026-04-14 09:14:29 +08:00
erio
e374874125 feat(channel): improve cache strategy and add restriction logging
- Change channel cache TTL from 60s to 10min (reduce unnecessary DB queries)
- Actively rebuild cache after CRUD instead of lazy invalidation
- Add slog.Warn logging for channel pricing restriction blocks (4 places)
2026-04-14 09:13:53 +08:00
erio
160903fce7 fix: address review findings for channel restriction refactoring
- Fix 7 stale comments still mentioning "限制检查" in handlers/services
- Make billingModelForRestriction explicitly list channel_mapped case
- Add slog.Warn for error swallowing in ResolveChannelMapping and
  needsUpstreamChannelRestrictionCheck
- Document sticky session upstream check exemption
2026-04-14 09:12:42 +08:00
erio
2dce4306b4 refactor: move channel model restriction from handler to scheduling phase
Move the model pricing restriction check from 8 handler entry points
to the account scheduling phase (SelectAccountForModelWithExclusions /
SelectAccountWithLoadAwareness), aligning restriction with billing:

- requested: check original request model against pricing list
- channel_mapped: check channel-mapped model against pricing list
- upstream: per-account check using account-mapped model

Handler layer now only resolves channel mapping (no restriction).
Scheduling layer performs pre-check for requested/channel_mapped,
and per-account filtering for upstream billing source.
2026-04-14 09:12:29 +08:00
ius
265687b56d fix: 优化调度快照缓存以避免 Redis 大 MGET 2026-04-08 10:39:15 -07:00
shaw
e51c9e50b5 feat: sync billing header cc_version with User-Agent and add opt-in CCH signing
- Sync cc_version in x-anthropic-billing-header with the fingerprint
  User-Agent version, preserving the message-derived suffix
- Implement xxHash64-based CCH signing to replace the cch=00000
  placeholder with a computed hash
- Add admin toggle (enable_cch_signing) under gateway forwarding settings,
  disabled by default
2026-04-08 16:11:19 +08:00
shaw
1c9a2128cf fix: 修复非CC客户端OAuth伪装被Anthropic检测为第三方应用的问题
commit f3aa54b 的 rewriteSystemForNonClaudeCode 未能通过 Anthropic 第三方检测,
根因是两个关键信号与真实 Claude Code 不一致:

1. anthropic-beta 头缺少 claude-code-20250219:伪装路径主动将该 beta
   加入 drop set 并移除,但 Anthropic 依赖此 beta 识别 Claude Code 请求。
   修复:非 haiku 模型的伪装请求强制包含 claude-code beta。

2. system 字段使用 string 格式而非 array+cache_control:真实 Claude Code
   始终以 [{type,text,cache_control:{type:"ephemeral"}}] 发送 system,
   string 格式成为第三方检测信号。
   修复:rewriteSystemForNonClaudeCode 改为注入 array 格式。

附带调整:stripSystemCacheControl 按 system 是否被重写动态决定,
重写时保留 CC prompt 的 cache_control,未重写时(haiku/已含CC前缀)
保持原有剥离行为。
2026-04-08 14:06:06 +08:00
shaw
7c60ee3c85 feat: Beta策略支持按模型区分处理(模型白名单) 2026-04-07 20:33:09 +08:00
shaw
f3aa54b770 fix: 非Claude Code客户端system prompt迁移至messages以绕过第三方应用检测
Anthropic近期引入基于system参数内容的第三方应用检测机制,原有的前置追加
Claude Code提示词策略无法通过检测(后续内容仍为非Claude Code格式触发429)。

新策略:对非Claude Code客户端的OAuth/SetupToken账号请求,将system字段
完整替换为Claude Code标识提示词,原始system内容作为user/assistant消息对
注入messages开头,模型仍接收完整指令。

仅影响/v1/messages路径,chat_completions和responses路径保持原有逻辑不变。
真正的Claude Code客户端请求完全不受影响(原样透传)。
2026-04-07 17:06:47 +08:00
erio
19655a15f1 fix: gofmt formatting 2026-04-05 18:56:40 +08:00
erio
f345b0f595 fix: use upstream versions of shared files and remove only Sora code
Restore gateway_service.go, setting_handler.go, routes/admin.go,
dto/settings.go, group_repo.go, api_key_repo.go, wire_gen.go to
upstream/main versions and surgically remove only Sora references.

This preserves upstream-only features (RequireOauthOnly, RequirePrivacySet,
GroupResolution, etc.) that were missing when using release branch versions.
2026-04-05 18:48:41 +08:00
erio
62e80c602d revert: completely remove all Sora functionality 2026-04-05 17:11:01 +08:00
erio
e88b2890d1 refactor: unify interval filtering and eliminate redundant Resolve calls
- applyRequestTierOverrides now uses filterValidIntervals consistently
  with applyTokenOverrides (per_request/image modes were not filtering)
- CostInput accepts optional pre-resolved pricing via Resolved field,
  eliminating duplicate Resolver.Resolve() calls in gateway billing paths
2026-04-04 15:15:33 +08:00
erio
1b5ae71d1f fix: resolve golangci-lint issues — remove unused constants and functions, fix gofmt
- Remove unused claudeMax*Tokens constants (Claude Max feature not included)
- Remove unused UsageMapHook type, SetUsageMapHook method, and usageToMap function
- Fix gofmt formatting in channel_service.go, openai_model_mapping_test.go,
  chatcompletions_to_responses.go
2026-04-04 14:58:20 +08:00
erio
e59fa8637a fix: resolve cherry-pick compilation and test issues
- Add int64(0) param to SelectAccountWithLoadAwareness callers (signature change from channel scheduling refactor)
- Add UsageMapHook type and struct field to StreamingProcessor
- Revert Claude Max cache billing code to upstream/main (not part of channel feature)
- Revert credits overages logic to upstream/main (non-channel change)
- Remove Instructions field reference (non-channel OpenAI feature)
- Restore sora_client_handler_test.go from upstream + add channel service nil params
2026-04-04 12:38:50 +08:00
erio
58f758c816 feat(channel): improve cache strategy and add restriction logging
- Change channel cache TTL from 60s to 10min (reduce unnecessary DB queries)
- Actively rebuild cache after CRUD instead of lazy invalidation
- Add slog.Warn logging for channel pricing restriction blocks (4 places)
2026-04-04 11:25:01 +08:00
erio
71f61bbc47 fix: resolve 5 audit findings in channel/credits/scheduling
P0-1: Credits degraded response retry + fail-open
- Add isAntigravityDegradedResponse() to detect transient API failures
- Retry up to 3 times with exponential backoff (500ms/1s/2s)
- Invalidate singleflight cache between retries
- Fail-open after exhausting retries instead of 5h circuit break

P1-1: Fix channel restriction pre-check timing conflict
- Swap checkClaudeCodeRestriction before checkChannelPricingRestriction
- Ensures channel restriction is checked against final fallback groupID

P1-2: Add interval pricing validation (frontend + backend)
- Backend: ValidateIntervals() with boundary, price, overlap checks
- Frontend: validateIntervals() with Chinese error messages
- Rules: MinTokens>=0, MaxTokens>MinTokens, prices>=0, no overlap

P2: Fix cross-platform same-model pricing/mapping override
- Store cache keys using original platform instead of group platform
- Lookup across matching platforms (antigravity→anthropic→gemini)
- Prevents anthropic/gemini same-name models from overwriting each other
2026-04-04 11:25:01 +08:00
erio
1fca2bfab1 fix: address review findings for channel restriction refactoring
- Fix 7 stale comments still mentioning "限制检查" in handlers/services
- Make billingModelForRestriction explicitly list channel_mapped case
- Add slog.Warn for error swallowing in ResolveChannelMapping and
  needsUpstreamChannelRestrictionCheck
- Document sticky session upstream check exemption
2026-04-04 11:25:01 +08:00
erio
ce41afb756 refactor: move channel model restriction from handler to scheduling phase
Move the model pricing restriction check from 8 handler entry points
to the account scheduling phase (SelectAccountForModelWithExclusions /
SelectAccountWithLoadAwareness), aligning restriction with billing:

- requested: check original request model against pricing list
- channel_mapped: check channel-mapped model against pricing list
- upstream: per-account check using account-mapped model

Handler layer now only resolves channel mapping (no restriction).
Scheduling layer performs pre-check for requested/channel_mapped,
and per-account filtering for upstream billing source.
2026-04-04 11:24:48 +08:00
erio
b4a42a640d refactor: extract helpers to reduce duplication and function length in gateway billing
- Extract resolveChannelPricing to DRY the resolver pattern shared by calculateImageCost/calculateTokenCost
- Remove unnecessary IIFE wrapper and pass accountRateMultiplier as parameter
- Extract resolveBillingMode, resolveMediaType, optionalSubscriptionID to simplify buildRecordUsageLog (104→65 lines)
- Extract shouldDeductAPIKeyQuota/shouldUpdateRateLimits/shouldUpdateAccountQuota methods on postUsageBillingParams to unify duplicated billing conditions
2026-04-04 11:23:01 +08:00
erio
58b26cb4c8 refactor: merge RecordUsage and RecordUsageWithLongContext into shared core
- Extract recordUsageCore with recordUsageOpts for parameterized differences
- RecordUsage (276 lines) → thin wrapper (~40 lines)
- RecordUsageWithLongContext (251 lines) → thin wrapper (~20 lines)
- Split billing logic into calculateSoraMediaCost, calculateImageCost,
  calculateTokenCost sub-functions
- Extract buildRecordUsageLog for usage log construction
- Net reduction: -79 lines, eliminated ~170 lines of duplication
2026-04-04 11:22:46 +08:00
erio
0d241d52eb refactor: replace magic strings with named constants
- PricingSourceChannel/LiteLLM/Fallback for resolver source
- MediaTypeImage/Video/Prompt for result.MediaType
- Reuse BillingModeToken/BillingModeImage for billing mode
- Reuse BillingModelSourceChannelMapped/PlatformAnthropic in handler
2026-04-04 11:20:43 +08:00
erio
f3ab3fe5e2 fix: billing mode display follows cost calculation result
Instead of hardcoding BillingMode="image" when ImageCount>0,
let cost.BillingMode (set by CalculateCostUnified/CalculateImageCost)
take priority. This ensures channel token pricing shows "token" mode.
2026-04-04 11:19:36 +08:00
erio
38da737e6c feat: channel token pricing takes priority over per-image billing
When ImageCount > 0, check if channel has token pricing configured:
- YES (source=channel, mode=token) → use token billing with image_output_tokens
- NO → fall back to CalculateImageCost (original per-image billing)

This allows channels to configure $/MTok pricing for image generation
models while maintaining backward compatibility for setups without
channel pricing.
2026-04-04 11:17:49 +08:00
erio
35a9290528 fix: add cost nil guard to Anthropic/Antigravity RecordUsage paths
- Apply same nil-pointer protection as OpenAI path
- Remove unused accessToken/proxyURL params from checkAccountCredits
2026-04-04 11:17:48 +08:00
erio
d72ac92694 feat: image output token billing, channel-mapped billing source, credits balance precheck
- Parse candidatesTokensDetails from Gemini API to separate image/text output tokens
- Add image_output_tokens and image_output_cost to usage_log (migration 089)
- Support per-image-token pricing via output_cost_per_image_token from model pricing data
- Channel pricing ImageOutputPrice override works in token billing mode
- Auto-fill image_output_price in channel pricing form from model defaults
- Add "channel_mapped" billing model source as new default (migration 088)
- Bills by model name after channel mapping, before account mapping
- Fix channel cache error TTL sign error (115s → 5s)
- Fix Update channel only invalidating new groups, not removed groups
- Fix frontend model_mapping clearing sending undefined instead of {}
- Credits balance precheck via shared AccountUsageService cache before injection
- Skip credits injection for accounts with insufficient balance
- Don't mark credits exhausted for "exhausted your capacity on this model" 429s
2026-04-04 11:15:59 +08:00