Files
sub2api/backend/internal/service/gateway_tool_rewrite.go
keh4l 6e12578bc5 feat(gateway): port Parrot tool-name obfuscation + message cache breakpoints
Implements the remaining three parity items with Parrot cc_mimicry:

  D) Tool-name obfuscation
     - Dynamic mapping when tools.length > 5 (matches Parrot threshold).
       Fake names follow {prefix}{name[:3]}{i:02d} (e.g. 'manage_bas00').
       Go port of random.Random(hash(tuple(names))) uses fnv64a seed +
       math/rand; byte-exact reproduction is impossible (Python hash vs
       Go hash), but the two invariants that matter are preserved:
         * same input tool_names yield identical mapping (cache hit)
         * prefix pool is shuffled (names look distributed)
     - Static prefix map (sessions_ -> cc_sess_, session_ -> cc_ses_)
       applied as fallback, matching Parrot TOOL_NAME_REWRITES verbatim.
     - Server tools (web_search_20250305, computer_*, etc.) are NOT
       renamed; only type=='function' and type=='custom' tools are.
     - tool_choice.name is rewritten in sync (only when type=='tool').
     - Response side: bytes-level replace on every SSE chunk / JSON
       body at 6 injection points (standard stream/non-stream,
       passthrough stream/non-stream, chat_completions stream +
       non-stream, responses stream + non-stream). Reverse mapping
       applied longest-fake-name-first to prevent substring conflicts
       (parity with Parrot _restore_tool_names_in_chunk).
     - tool_choice is no longer unconditionally deleted in
       normalizeClaudeOAuthRequestBody — Parrot passes it through.

  E) tools[-1] cache_control breakpoint
     - Injected as {type:ephemeral, ttl:<DefaultCacheControlTTL>} when
       the last tool has no cache_control. Client-provided ttl is
       passed through unchanged (repo-wide policy).

  F) messages cache_control strategy
     - stripMessageCacheControl removes every client-provided
       messages[*].content[*].cache_control (multi-turn stability).
     - addMessageCacheBreakpoints then injects two stable breakpoints:
       (1) last message, and (2) second-to-last user turn when
       messages.length >= 4.
     - Combined with the system block breakpoint and tools[-1]
       breakpoint, this gives exactly the 4 breakpoints Anthropic
       allows per request.

Non-trivial implementation details to be aware of when rebasing:

  * Two new files, no upstream collision:
      gateway_tool_rewrite.go       (D + E algorithms)
      gateway_messages_cache.go     (F strip + breakpoints)
  * Two new feature calls bolted onto the tail of
    applyClaudeCodeOAuthMimicryToBody in gateway_service.go — rebase
    conflicts will be ~10 lines maximum.
  * Response-side injection points all wrap their existing write with
    reverseToolNamesIfPresent(c, ...), preserving original behavior
    when no mapping is stored (static prefix rollback still runs).
  * Non-stream chat/responses switched from c.JSON to
    json.Marshal + c.Data so bytes-level replace is possible.
  * Retry bodies (FilterThinkingBlocksForRetry,
    FilterSignatureSensitiveBlocksForRetry, RectifyThinkingBudget)
    only prune blocks — they preserve the already-obfuscated tool
    names, so no extra mapping re-application is needed.

Manual QA: end-to-end scenario verified with 6 tools (above threshold)
and tool_choice.type=='tool'. Obfuscation + restore roundtrip shown
in test logs; then removed the temp test file.

Tests (16 new):
  - buildDynamicToolMap stability + below-threshold guard
  - sanitizeToolName precedence (dynamic > static)
  - restoreToolNamesInBytes longest-first + static rollback
  - applyToolNameRewriteToBody skips server tools + syncs tool_choice
  - applyToolsLastCacheBreakpoint defaults to 5m + passes client ttl
  - stripMessageCacheControl + addMessageCacheBreakpoints in the
    1/4/string-content cases + second-to-last user turn selection
  - buildToolNameRewriteFromBody ReverseOrdered is desc-by-fake-length
  - fake name shape follows Parrot {prefix}{head3}{i:02d}
2026-04-24 23:16:32 +08:00

314 lines
10 KiB
Go
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

package service
import (
"fmt"
"hash/fnv"
"math/rand"
"sort"
"strings"
"github.com/Wei-Shaw/sub2api/internal/pkg/claude"
"github.com/tidwall/gjson"
"github.com/tidwall/sjson"
)
// toolNameRewriteKey 是 gin.Context 上存 ToolNameRewrite 映射的 key。
// 请求阶段写入,响应阶段读取,用于 bytes 级逆向还原假名 → 真名。
const toolNameRewriteKey = "claude_tool_name_rewrite"
// staticToolNameRewrites 是"静态前缀映射",与 Parrot src/transform/cc_mimicry.py
// TOOL_NAME_REWRITES 完全一致。只有以这些前缀开头的工具会被重写。
var staticToolNameRewrites = map[string]string{
"sessions_": "cc_sess_",
"session_": "cc_ses_",
}
// fakeToolNamePrefixes 是"动态映射"的前缀池,与 Parrot _FAKE_PREFIXES 一致。
// 当 tools 数量 > dynamicToolMapThreshold 时随机选用其中前缀生成可读假名。
var fakeToolNamePrefixes = []string{
"analyze_", "compute_", "fetch_", "generate_", "lookup_", "modify_",
"process_", "query_", "render_", "resolve_", "sync_", "update_",
"validate_", "convert_", "extract_", "manage_", "monitor_", "parse_",
"review_", "search_", "transform_", "handle_", "invoke_", "notify_",
}
// dynamicToolMapThreshold 与 Parrot 一致tools 数量超过 5 才启用动态映射。
// 少量工具不需要混淆(一般是 Claude Code 自己的核心工具 bash/edit/read 等)。
const dynamicToolMapThreshold = 5
// ToolNameRewrite 是单次请求内的工具名混淆映射。
// - Forward: real → fake请求阶段在 body 上应用。
// - Reverse: fake → real响应阶段对每个 chunk 做 bytes.Replace 还原。
//
// ReverseOrdered 是按假名长度倒序的 (fake, real) 列表,用于防止短假名是长假名的
// 子串时 bytes.Replace 先被吃掉(对齐 Parrot _restore_tool_names_in_chunk 的
// `sorted(..., key=lambda x: len(x[1]), reverse=True)`)。
type ToolNameRewrite struct {
Forward map[string]string
Reverse map[string]string
ReverseOrdered [][2]string
}
// buildDynamicToolMap 构造 tools 的动态假名映射。
//
// 与 Parrot _build_dynamic_tool_map 语义等价:
// - tools 数量 ≤ dynamicToolMapThreshold 时返回 nil不做动态映射走静态 fallback
// - 同一组 tool_names 在同进程内映射稳定(保证 cache 命中)
//
// Parrot 用 `random.Random(hash(tuple(tool_names)))` 作 seed + shuffle 前缀池;
// Go 无法字节级复刻 Python hash但"稳定性"和"前缀池打散"两个不变量都保留:
// 用 fnv64a(strings.Join(names, "\x00")) 作 seed 喂 math/rand.New。
// 字节级不同不影响上游判定Anthropic 不会验证我们的随机种子算法)。
func buildDynamicToolMap(toolNames []string) map[string]string {
if len(toolNames) <= dynamicToolMapThreshold {
return nil
}
h := fnv.New64a()
for i, n := range toolNames {
if i > 0 {
_, _ = h.Write([]byte{0})
}
_, _ = h.Write([]byte(n))
}
rng := rand.New(rand.NewSource(int64(h.Sum64())))
available := make([]string, len(fakeToolNamePrefixes))
copy(available, fakeToolNamePrefixes)
rng.Shuffle(len(available), func(i, j int) { available[i], available[j] = available[j], available[i] })
mapping := make(map[string]string, len(toolNames))
for i, name := range toolNames {
prefix := available[i%len(available)]
headLen := 3
if len(name) < 3 {
headLen = len(name)
}
fake := fmt.Sprintf("%s%s%02d", prefix, name[:headLen], i)
mapping[name] = fake
}
return mapping
}
// sanitizeToolName 把真名转成假名。
// 与 Parrot _sanitize_tool_name 语义一致:动态映射优先,再走静态前缀映射。
func sanitizeToolName(name string, dynamic map[string]string) string {
if dynamic != nil {
if fake, ok := dynamic[name]; ok {
return fake
}
}
for prefix, replacement := range staticToolNameRewrites {
if strings.HasPrefix(name, prefix) {
return replacement + name[len(prefix):]
}
}
return name
}
// shouldMimicToolName 指示某个 tool 是否需要重命名。
// server tooltype != "" 且不是 "function" / "custom")是 Anthropic 协议语义的一部分,
// 比如 "web_search_20250305" / "computer_20250124";误改会导致上游拒绝。
func shouldMimicToolName(toolType string) bool {
if toolType == "" || toolType == "function" || toolType == "custom" {
return true
}
return false
}
// buildToolNameRewriteFromBody 扫描 body 的 tools[*].name构造 ToolNameRewrite
// 并返回它。若不需要混淆tools 数量不足 + 没有匹配静态前缀的工具)返回 nil。
//
// 注意:只扫描,不改 body。真正的 body 改写在 applyToolNameRewriteToBody。
func buildToolNameRewriteFromBody(body []byte) *ToolNameRewrite {
tools := gjson.GetBytes(body, "tools")
if !tools.IsArray() {
return nil
}
mimicableNames := make([]string, 0)
toolsArr := tools.Array()
for _, t := range toolsArr {
if !shouldMimicToolName(t.Get("type").String()) {
continue
}
name := t.Get("name").String()
if name == "" {
continue
}
mimicableNames = append(mimicableNames, name)
}
dynamic := buildDynamicToolMap(mimicableNames)
rw := &ToolNameRewrite{
Forward: make(map[string]string),
Reverse: make(map[string]string),
}
for _, name := range mimicableNames {
fake := sanitizeToolName(name, dynamic)
if fake == name {
continue
}
rw.Forward[name] = fake
rw.Reverse[fake] = name
}
if len(rw.Forward) == 0 {
return nil
}
rw.ReverseOrdered = make([][2]string, 0, len(rw.Reverse))
for fake, real := range rw.Reverse {
rw.ReverseOrdered = append(rw.ReverseOrdered, [2]string{fake, real})
}
sort.SliceStable(rw.ReverseOrdered, func(i, j int) bool {
return len(rw.ReverseOrdered[i][0]) > len(rw.ReverseOrdered[j][0])
})
return rw
}
// applyToolNameRewriteToBody 把已构造的 ToolNameRewrite 应用到 body 上:
// - 改写 $.tools[*].name仅对 shouldMimicToolName 通过的 tool
// - 在 $.tools[last].cache_control 上打 ephemeral 缓存断点Parrot 行为对齐,
// ttl 客户端已有则透传,否则默认 claude.DefaultCacheControlTTL
// - 改写 $.tool_choice.name仅当 $.tool_choice.type == "tool"
//
// 历史 $.messages[*].content[*].nametool_use不在请求侧改写——这与 Parrot 一致;
// 响应侧 bytes.Replace 会连带还原它们。
func applyToolNameRewriteToBody(body []byte, rw *ToolNameRewrite) []byte {
if rw == nil || len(rw.Forward) == 0 {
body = applyToolsLastCacheBreakpoint(body)
return body
}
tools := gjson.GetBytes(body, "tools")
if tools.IsArray() {
idx := -1
tools.ForEach(func(_, t gjson.Result) bool {
idx++
if !shouldMimicToolName(t.Get("type").String()) {
return true
}
name := t.Get("name").String()
if name == "" {
return true
}
fake, ok := rw.Forward[name]
if !ok {
return true
}
if next, err := sjson.SetBytes(body, fmt.Sprintf("tools.%d.name", idx), fake); err == nil {
body = next
}
return true
})
}
if tc := gjson.GetBytes(body, "tool_choice"); tc.Exists() && tc.Get("type").String() == "tool" {
name := tc.Get("name").String()
if fake, ok := rw.Forward[name]; ok {
if next, err := sjson.SetBytes(body, "tool_choice.name", fake); err == nil {
body = next
}
}
}
body = applyToolsLastCacheBreakpoint(body)
return body
}
// applyToolsLastCacheBreakpoint 在 tools 数组最后一个工具上注入 cache_control
// 断点,对齐 Parrot `tools[-1]["cache_control"] = {"type":"ephemeral","ttl":"1h"}`
// 行为,但 ttl 按本仓规则:
// - 客户端已为该 tool 显式设置 cache_control.ttl → 完全透传不覆盖
// - 否则注入 {"type":"ephemeral","ttl": claude.DefaultCacheControlTTL}
//
// 纯副作用函数tools 不存在或为空数组时 no-op。
func applyToolsLastCacheBreakpoint(body []byte) []byte {
tools := gjson.GetBytes(body, "tools")
if !tools.IsArray() {
return body
}
arr := tools.Array()
if len(arr) == 0 {
return body
}
lastIdx := len(arr) - 1
existingCC := arr[lastIdx].Get("cache_control")
if existingCC.Exists() && existingCC.Get("ttl").String() != "" {
return body
}
if existingCC.Exists() {
if next, err := sjson.SetBytes(body, fmt.Sprintf("tools.%d.cache_control.ttl", lastIdx), claude.DefaultCacheControlTTL); err == nil {
body = next
}
return body
}
raw := fmt.Sprintf(`{"type":"ephemeral","ttl":%q}`, claude.DefaultCacheControlTTL)
if next, err := sjson.SetRawBytes(body, fmt.Sprintf("tools.%d.cache_control", lastIdx), []byte(raw)); err == nil {
body = next
}
return body
}
// restoreToolNamesInBytes 对 bytes chunk 做逆向还原:假名 → 真名。
// 按 ReverseOrdered 的假名长度倒序逐个 bytes.Replace防止子串冲突
// (与 Parrot _restore_tool_names_in_chunk 的 sorted(..., reverse=True) 等价)。
// 再做静态前缀还原cc_sess_ → sessions_ / cc_ses_ → session_
//
// rw 可为 nilnil 时仍会做静态前缀还原。
func restoreToolNamesInBytes(data []byte, rw *ToolNameRewrite) []byte {
if rw != nil {
for _, pair := range rw.ReverseOrdered {
fake, real := pair[0], pair[1]
if fake == "" || fake == real {
continue
}
data = replaceAllBytes(data, fake, real)
}
}
for prefix, replacement := range staticToolNameRewrites {
data = replaceAllBytes(data, replacement, prefix)
}
return data
}
// replaceAllBytes 是 bytes.ReplaceAll 的便捷封装,避免每个调用点各自做 []byte 转换。
func replaceAllBytes(data []byte, from, to string) []byte {
if len(data) == 0 || from == to || !strings.Contains(string(data), from) {
return data
}
return []byte(strings.ReplaceAll(string(data), from, to))
}
// toolNameRewriteFromContext 从 gin.Context 取出请求阶段保存的工具名映射。
// 找不到c==nil 或 key 不存在或类型不对)时返回 nil调用方必须能处理 nil。
func toolNameRewriteFromContext(c interface {
Get(string) (any, bool)
}) *ToolNameRewrite {
if c == nil {
return nil
}
raw, ok := c.Get(toolNameRewriteKey)
if !ok || raw == nil {
return nil
}
rw, _ := raw.(*ToolNameRewrite)
return rw
}
// reverseToolNamesIfPresent 是响应侧 5 处注入点的统一封装:从 c 取出 mapping
// 并对 chunk 做 bytes 级假名→真名替换。c 没有 mapping 时仍会做静态前缀还原。
func reverseToolNamesIfPresent(c interface {
Get(string) (any, bool)
}, chunk []byte) []byte {
rw := toolNameRewriteFromContext(c)
if rw == nil && len(staticToolNameRewrites) == 0 {
return chunk
}
return restoreToolNamesInBytes(chunk, rw)
}