feat(gateway): port Parrot tool-name obfuscation + message cache breakpoints

Implements the remaining three parity items with Parrot cc_mimicry:

  D) Tool-name obfuscation
     - Dynamic mapping when tools.length > 5 (matches Parrot threshold).
       Fake names follow {prefix}{name[:3]}{i:02d} (e.g. 'manage_bas00').
       Go port of random.Random(hash(tuple(names))) uses fnv64a seed +
       math/rand; byte-exact reproduction is impossible (Python hash vs
       Go hash), but the two invariants that matter are preserved:
         * same input tool_names yield identical mapping (cache hit)
         * prefix pool is shuffled (names look distributed)
     - Static prefix map (sessions_ -> cc_sess_, session_ -> cc_ses_)
       applied as fallback, matching Parrot TOOL_NAME_REWRITES verbatim.
     - Server tools (web_search_20250305, computer_*, etc.) are NOT
       renamed; only type=='function' and type=='custom' tools are.
     - tool_choice.name is rewritten in sync (only when type=='tool').
     - Response side: bytes-level replace on every SSE chunk / JSON
       body at each response write path (standard stream/non-stream,
       passthrough stream/non-stream, chat_completions stream +
       non-stream, responses stream + non-stream). Reverse mapping
       applied longest-fake-name-first to prevent substring conflicts
       (parity with Parrot _restore_tool_names_in_chunk).
     - tool_choice is no longer unconditionally deleted in
       normalizeClaudeOAuthRequestBody — Parrot passes it through.
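
The longest-fake-name-first restore described in D can be sketched with the standard library alone. This is a minimal illustration, not the shipped code: the helper name restoreLongestFirst and the sample fake/real names are hypothetical, and the real logic lives in restoreToolNamesInBytes.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// restoreLongestFirst (hypothetical helper) replaces every fake name in data
// with its real name, handling longer fake names before shorter ones so that
// a short fake name cannot clobber part of a longer one.
func restoreLongestFirst(data []byte, reverse map[string]string) []byte {
	fakes := make([]string, 0, len(reverse))
	for fake := range reverse {
		fakes = append(fakes, fake)
	}
	sort.Slice(fakes, func(i, j int) bool {
		if len(fakes[i]) != len(fakes[j]) {
			return len(fakes[i]) > len(fakes[j]) // longer fake names first
		}
		return fakes[i] < fakes[j] // deterministic order for equal lengths
	})
	for _, fake := range fakes {
		data = bytes.ReplaceAll(data, []byte(fake), []byte(reverse[fake]))
	}
	return data
}

func main() {
	// Sample mapping (made up): "sync_get1" is a substring of "sync_get10".
	reverse := map[string]string{
		"sync_get1":  "get_user",
		"sync_get10": "get_order",
	}
	out := restoreLongestFirst([]byte(`{"name":"sync_get10"}`), reverse)
	fmt.Println(string(out)) // prints {"name":"get_order"}
}
```

Replacing the shorter name first would turn "sync_get10" into "get_user0", which is the conflict the ordering avoids.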

  E) tools[-1] cache_control breakpoint
     - Injected as {type:ephemeral, ttl:<DefaultCacheControlTTL>} when
       the last tool has no cache_control. Client-provided ttl is
       passed through unchanged (repo-wide policy).
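
The tools[-1] rule in E can be sketched as follows. This is an illustration under stated assumptions: it uses encoding/json rather than the gjson/sjson calls in the actual file (so key order in the output is not preserved), the function name addLastToolBreakpoint is hypothetical, and the default ttl is passed in as a parameter rather than read from claude.DefaultCacheControlTTL.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// addLastToolBreakpoint (hypothetical name) marks the last tool with an
// ephemeral cache_control. A client-provided ttl is left untouched.
func addLastToolBreakpoint(body []byte, defaultTTL string) ([]byte, error) {
	var req map[string]any
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	tools, ok := req["tools"].([]any)
	if !ok || len(tools) == 0 {
		return body, nil // no tools: nothing to do
	}
	last, ok := tools[len(tools)-1].(map[string]any)
	if !ok {
		return body, nil
	}
	cc, hasCC := last["cache_control"].(map[string]any)
	switch {
	case !hasCC:
		// No cache_control at all: inject the full breakpoint.
		last["cache_control"] = map[string]any{"type": "ephemeral", "ttl": defaultTTL}
	case cc["ttl"] == nil || cc["ttl"] == "":
		// cache_control present but without a ttl: fill in the default.
		cc["ttl"] = defaultTTL
	default:
		// Client already chose a ttl: pass the body through unchanged.
		return body, nil
	}
	return json.Marshal(req)
}

func main() {
	out, err := addLastToolBreakpoint([]byte(`{"tools":[{"name":"a"},{"name":"b"}]}`), "5m")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```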

  F) messages cache_control strategy
     - stripMessageCacheControl removes every client-provided
       messages[*].content[*].cache_control (multi-turn stability).
     - addMessageCacheBreakpoints then injects two stable breakpoints:
       (1) last message, and (2) second-to-last user turn when
       messages.length >= 4.
     - Combined with the system block breakpoint and tools[-1]
       breakpoint, this gives exactly the 4 breakpoints Anthropic
       allows per request.

Non-trivial implementation details to be aware of when rebasing:

  * Two new files, no upstream collision:
      gateway_tool_rewrite.go       (D + E algorithms)
      gateway_messages_cache.go     (F strip + breakpoints)
  * Two new feature calls appended to the tail of
    applyClaudeCodeOAuthMimicryToBody in gateway_service.go; rebase
    conflicts there should touch at most ~10 lines.
  * Response-side injection points all wrap their existing write with
    reverseToolNamesIfPresent(c, ...), preserving original behavior
    when no mapping is stored (static prefix rollback still runs).
  * Non-stream chat/responses switched from c.JSON to
    json.Marshal + c.Data so bytes-level replace is possible.
  * Retry bodies (FilterThinkingBlocksForRetry,
    FilterSignatureSensitiveBlocksForRetry, RectifyThinkingBudget)
    only prune blocks — they preserve the already-obfuscated tool
    names, so no extra mapping re-application is needed.

Manual QA: end-to-end scenario verified with 6 tools (above threshold)
and tool_choice.type=='tool'. The obfuscation + restore roundtrip was
confirmed in the test logs; the temporary test file was then removed.

Tests (16 new):
  - buildDynamicToolMap stability + below-threshold guard
  - sanitizeToolName precedence (dynamic > static)
  - restoreToolNamesInBytes longest-first + static rollback
  - applyToolNameRewriteToBody skips server tools + syncs tool_choice
  - applyToolsLastCacheBreakpoint defaults to 5m + passes client ttl
  - stripMessageCacheControl + addMessageCacheBreakpoints in the
    1/4/string-content cases + second-to-last user turn selection
  - buildToolNameRewriteFromBody ReverseOrdered is desc-by-fake-length
  - fake name shape follows Parrot {prefix}{head3}{i:02d}
keh4l
2026-04-24 21:24:58 +08:00
parent a25faecadd
commit 6e12578bc5
6 changed files with 698 additions and 11 deletions

@@ -0,0 +1,313 @@
package service

import (
	"fmt"
	"hash/fnv"
	"math/rand"
	"sort"
	"strings"

	"github.com/Wei-Shaw/sub2api/internal/pkg/claude"
	"github.com/tidwall/gjson"
	"github.com/tidwall/sjson"
)
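
// toolNameRewriteKey is the gin.Context key under which the ToolNameRewrite
// mapping is stored. It is written during the request phase and read during
// the response phase to restore fake names to real names at the byte level.
const toolNameRewriteKey = "claude_tool_name_rewrite"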

// staticToolNameRewrites is the static prefix map, identical to
// TOOL_NAME_REWRITES in Parrot's src/transform/cc_mimicry.py. Only tools whose
// names start with one of these prefixes are rewritten.
var staticToolNameRewrites = map[string]string{
	"sessions_": "cc_sess_",
	"session_":  "cc_ses_",
}

// fakeToolNamePrefixes is the prefix pool for dynamic mapping, matching
// Parrot's _FAKE_PREFIXES. When the number of tools exceeds
// dynamicToolMapThreshold, prefixes are picked from a shuffled copy of this
// pool to build readable fake names.
var fakeToolNamePrefixes = []string{
	"analyze_", "compute_", "fetch_", "generate_", "lookup_", "modify_",
	"process_", "query_", "render_", "resolve_", "sync_", "update_",
	"validate_", "convert_", "extract_", "manage_", "monitor_", "parse_",
	"review_", "search_", "transform_", "handle_", "invoke_", "notify_",
}
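
// dynamicToolMapThreshold matches Parrot: dynamic mapping is enabled only when
// there are more than 5 tools. A small tool set needs no obfuscation (it is
// usually Claude Code's own core tools such as bash/edit/read).
const dynamicToolMapThreshold = 5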
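
// ToolNameRewrite is the tool-name obfuscation mapping for a single request.
//   - Forward: real → fake, applied to the body during the request phase.
//   - Reverse: fake → real, used during the response phase to restore each
//     chunk via byte-level replacement.
//
// ReverseOrdered is the list of (fake, real) pairs sorted by fake-name length,
// descending. It prevents a short fake name that is a substring of a longer
// one from being replaced first (mirrors
// `sorted(..., key=lambda x: len(x[1]), reverse=True)` in Parrot's
// _restore_tool_names_in_chunk).
type ToolNameRewrite struct {
	Forward        map[string]string
	Reverse        map[string]string
	ReverseOrdered [][2]string
}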

// buildDynamicToolMap builds the dynamic fake-name mapping for tools.
//
// Semantically equivalent to Parrot's _build_dynamic_tool_map:
//   - returns nil when the tool count is ≤ dynamicToolMapThreshold (no dynamic
//     mapping; the static fallback applies)
//   - the same set of tool_names maps stably within a process (guarantees
//     cache hits)
//
// Parrot seeds with `random.Random(hash(tuple(tool_names)))` and shuffles the
// prefix pool. Go cannot reproduce Python's hash byte for byte, but both
// invariants (stability and a shuffled prefix pool) are preserved:
// fnv64a(strings.Join(names, "\x00")) is used as the seed for math/rand.New.
// The byte-level difference does not affect upstream behavior (Anthropic does
// not validate our seeding algorithm).
func buildDynamicToolMap(toolNames []string) map[string]string {
	if len(toolNames) <= dynamicToolMapThreshold {
		return nil
	}
	// Hash the names with a NUL separator to derive a deterministic seed.
	h := fnv.New64a()
	for i, n := range toolNames {
		if i > 0 {
			_, _ = h.Write([]byte{0})
		}
		_, _ = h.Write([]byte(n))
	}
	rng := rand.New(rand.NewSource(int64(h.Sum64())))
	// Shuffle a copy so the package-level pool is never mutated.
	available := make([]string, len(fakeToolNamePrefixes))
	copy(available, fakeToolNamePrefixes)
	rng.Shuffle(len(available), func(i, j int) { available[i], available[j] = available[j], available[i] })
	mapping := make(map[string]string, len(toolNames))
	for i, name := range toolNames {
		prefix := available[i%len(available)]
		headLen := 3
		if len(name) < 3 {
			headLen = len(name)
		}
		// {prefix}{first 3 bytes of name}{two-digit index}
		fake := fmt.Sprintf("%s%s%02d", prefix, name[:headLen], i)
		mapping[name] = fake
	}
	return mapping
}

// sanitizeToolName converts a real tool name to its fake name.
// Same semantics as Parrot's _sanitize_tool_name: the dynamic mapping takes
// precedence, then the static prefix map.
func sanitizeToolName(name string, dynamic map[string]string) string {
	if dynamic != nil {
		if fake, ok := dynamic[name]; ok {
			return fake
		}
	}
	for prefix, replacement := range staticToolNameRewrites {
		if strings.HasPrefix(name, prefix) {
			return replacement + name[len(prefix):]
		}
	}
	return name
}

// shouldMimicToolName reports whether a tool should be renamed.
// Server tools (type != "" and neither "function" nor "custom") are part of
// the Anthropic protocol semantics, e.g. "web_search_20250305" /
// "computer_20250124"; renaming them would make the upstream reject the request.
func shouldMimicToolName(toolType string) bool {
	return toolType == "" || toolType == "function" || toolType == "custom"
}

// buildToolNameRewriteFromBody scans tools[*].name in body, builds a
// ToolNameRewrite, and returns it. It returns nil when no obfuscation is
// needed (too few tools and no tool matching a static prefix).
//
// Note: this only scans; body is not modified. The actual rewrite happens in
// applyToolNameRewriteToBody.
func buildToolNameRewriteFromBody(body []byte) *ToolNameRewrite {
	tools := gjson.GetBytes(body, "tools")
	if !tools.IsArray() {
		return nil
	}
	mimicableNames := make([]string, 0)
	toolsArr := tools.Array()
	for _, t := range toolsArr {
		if !shouldMimicToolName(t.Get("type").String()) {
			continue
		}
		name := t.Get("name").String()
		if name == "" {
			continue
		}
		mimicableNames = append(mimicableNames, name)
	}
	dynamic := buildDynamicToolMap(mimicableNames)
	rw := &ToolNameRewrite{
		Forward: make(map[string]string),
		Reverse: make(map[string]string),
	}
	for _, name := range mimicableNames {
		fake := sanitizeToolName(name, dynamic)
		if fake == name {
			continue
		}
		rw.Forward[name] = fake
		rw.Reverse[fake] = name
	}
	if len(rw.Forward) == 0 {
		return nil
	}
	// Order the reverse pairs by fake-name length, longest first.
	rw.ReverseOrdered = make([][2]string, 0, len(rw.Reverse))
	for fake, real := range rw.Reverse {
		rw.ReverseOrdered = append(rw.ReverseOrdered, [2]string{fake, real})
	}
	sort.SliceStable(rw.ReverseOrdered, func(i, j int) bool {
		return len(rw.ReverseOrdered[i][0]) > len(rw.ReverseOrdered[j][0])
	})
	return rw
}

// applyToolNameRewriteToBody applies an already-built ToolNameRewrite to body:
//   - rewrites $.tools[*].name (only for tools that pass shouldMimicToolName)
//   - sets an ephemeral cache breakpoint on $.tools[last].cache_control
//     (parity with Parrot; a client-provided ttl is passed through, otherwise
//     claude.DefaultCacheControlTTL is used)
//   - rewrites $.tool_choice.name (only when $.tool_choice.type == "tool")
//
// Historical $.messages[*].content[*].name (tool_use) entries are not
// rewritten on the request side, which is consistent with Parrot; the
// response-side bytes.Replace restores them along with everything else.
func applyToolNameRewriteToBody(body []byte, rw *ToolNameRewrite) []byte {
	if rw == nil || len(rw.Forward) == 0 {
		body = applyToolsLastCacheBreakpoint(body)
		return body
	}
	tools := gjson.GetBytes(body, "tools")
	if tools.IsArray() {
		idx := -1
		tools.ForEach(func(_, t gjson.Result) bool {
			idx++
			if !shouldMimicToolName(t.Get("type").String()) {
				return true
			}
			name := t.Get("name").String()
			if name == "" {
				return true
			}
			fake, ok := rw.Forward[name]
			if !ok {
				return true
			}
			if next, err := sjson.SetBytes(body, fmt.Sprintf("tools.%d.name", idx), fake); err == nil {
				body = next
			}
			return true
		})
	}
	if tc := gjson.GetBytes(body, "tool_choice"); tc.Exists() && tc.Get("type").String() == "tool" {
		name := tc.Get("name").String()
		if fake, ok := rw.Forward[name]; ok {
			if next, err := sjson.SetBytes(body, "tool_choice.name", fake); err == nil {
				body = next
			}
		}
	}
	body = applyToolsLastCacheBreakpoint(body)
	return body
}

// applyToolsLastCacheBreakpoint injects a cache_control breakpoint on the last
// tool in the tools array, matching Parrot's
// `tools[-1]["cache_control"] = {"type":"ephemeral","ttl":"1h"}` behavior, but
// with this repo's ttl rules:
//   - if the client explicitly set cache_control.ttl on that tool, it is
//     passed through untouched
//   - otherwise inject {"type":"ephemeral","ttl": claude.DefaultCacheControlTTL}
//
// No-op when tools is missing or is an empty array.
func applyToolsLastCacheBreakpoint(body []byte) []byte {
	tools := gjson.GetBytes(body, "tools")
	if !tools.IsArray() {
		return body
	}
	arr := tools.Array()
	if len(arr) == 0 {
		return body
	}
	lastIdx := len(arr) - 1
	existingCC := arr[lastIdx].Get("cache_control")
	if existingCC.Exists() && existingCC.Get("ttl").String() != "" {
		return body
	}
	// cache_control exists but carries no ttl: fill in only the ttl.
	if existingCC.Exists() {
		if next, err := sjson.SetBytes(body, fmt.Sprintf("tools.%d.cache_control.ttl", lastIdx), claude.DefaultCacheControlTTL); err == nil {
			body = next
		}
		return body
	}
	raw := fmt.Sprintf(`{"type":"ephemeral","ttl":%q}`, claude.DefaultCacheControlTTL)
	if next, err := sjson.SetRawBytes(body, fmt.Sprintf("tools.%d.cache_control", lastIdx), []byte(raw)); err == nil {
		body = next
	}
	return body
}

// restoreToolNamesInBytes restores fake names to real names in a byte chunk.
// It replaces each pair in ReverseOrdered (descending fake-name length) to
// avoid substring conflicts (equivalent to sorted(..., reverse=True) in
// Parrot's _restore_tool_names_in_chunk), then reverts the static prefixes
// (cc_sess_ → sessions_ / cc_ses_ → session_).
//
// rw may be nil; static prefix restoration still runs in that case.
func restoreToolNamesInBytes(data []byte, rw *ToolNameRewrite) []byte {
	if rw != nil {
		for _, pair := range rw.ReverseOrdered {
			fake, real := pair[0], pair[1]
			if fake == "" || fake == real {
				continue
			}
			data = replaceAllBytes(data, fake, real)
		}
	}
	for prefix, replacement := range staticToolNameRewrites {
		data = replaceAllBytes(data, replacement, prefix)
	}
	return data
}
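
// replaceAllBytes is a ReplaceAll-style convenience helper for byte slices
// (implemented via strings.ReplaceAll), so call sites do not each perform
// their own []byte conversions. It returns data unchanged when there is
// nothing to replace.
func replaceAllBytes(data []byte, from, to string) []byte {
	if len(data) == 0 || from == to || !strings.Contains(string(data), from) {
		return data
	}
	return []byte(strings.ReplaceAll(string(data), from, to))
}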

// toolNameRewriteFromContext fetches the tool-name mapping stored on the
// gin.Context during the request phase. It returns nil when the mapping cannot
// be found (c == nil, key missing, or wrong type); callers must handle nil.
func toolNameRewriteFromContext(c interface {
	Get(string) (any, bool)
}) *ToolNameRewrite {
	if c == nil {
		return nil
	}
	raw, ok := c.Get(toolNameRewriteKey)
	if !ok || raw == nil {
		return nil
	}
	rw, _ := raw.(*ToolNameRewrite)
	return rw
}

// reverseToolNamesIfPresent is the shared wrapper for the response-side
// injection points: it fetches the mapping from c and performs byte-level
// fake → real replacement on chunk. Static prefix restoration still runs when
// c has no mapping.
func reverseToolNamesIfPresent(c interface {
	Get(string) (any, bool)
}, chunk []byte) []byte {
	rw := toolNameRewriteFromContext(c)
	if rw == nil && len(staticToolNameRewrites) == 0 {
		return chunk
	}
	return restoreToolNamesInBytes(chunk, rw)
}