## Problem Description
When requests are routed through an Anthropic channel via the `/v1/chat/completions` endpoint with caching enabled, the billing logic incorrectly subtracts cache tokens, causing severe revenue loss (94.5%).

## Root Cause
Different APIs define `prompt_tokens` differently:
- **Anthropic API**: the `input_tokens` field already contains only non-cached input tokens
- **OpenAI API**: the `prompt_tokens` field contains all input tokens (including cached ones)
- **OpenRouter API**: the `prompt_tokens` field contains all input tokens (including cached ones)

The current `postConsumeQuota` function subtracts cache tokens for every channel. This is wrong for Anthropic channels, because their `input_tokens` value already excludes cached tokens.
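The difference can be made concrete with a small sketch. The helper below is hypothetical (the real accounting lives in `postConsumeQuota`); it only illustrates how each API's usage fields map to the same quantity, non-cached input tokens:

```go
package main

import "fmt"

// nonCachedInputTokens is a hypothetical helper illustrating the semantic
// difference: OpenAI/OpenRouter report prompt_tokens inclusive of cached
// tokens, while Anthropic's input_tokens already excludes them.
func nonCachedInputTokens(channel string, promptTokens, cacheTokens int) int {
	if channel == "anthropic" {
		// input_tokens is already cache-exclusive; subtracting again
		// would under-bill (the bug this fix addresses).
		return promptTokens
	}
	// OpenAI / OpenRouter: prompt_tokens includes cached tokens.
	return promptTokens - cacheTokens
}

func main() {
	// The same request seen through both conventions: 200 fresh input
	// tokens plus 800 tokens served from cache.
	fmt.Println(nonCachedInputTokens("openai", 1000, 800))   // 200
	fmt.Println(nonCachedInputTokens("anthropic", 200, 800)) // 200
}
```

Both calls describe the same request, and both should bill the same 200 non-cached tokens; subtracting the cache twice on the Anthropic side is what produced the loss.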
## Fix
In the `postConsumeQuota` function in `relay/compatible_handler.go`, add a channel-type check:
```go
if relayInfo.ChannelType != constant.ChannelTypeAnthropic {
    baseTokens = baseTokens.Sub(dCacheTokens)
}
```
Cache tokens are now subtracted only for non-Anthropic channels.
## Impact Analysis
### ✅ Scenarios Not Affected
1. **Calls without caching** (all channels)
   - cache_tokens = 0
   - Subtracting 0 changes nothing
   - Result: identical behavior
2. **OpenAI/OpenRouter channels + caching**
   - Cache tokens are still subtracted (since ChannelType != Anthropic)
   - Result: identical behavior
3. **Anthropic channel + `/v1/messages` endpoint**
   - Uses `PostClaudeConsumeQuota`, which is not modified
   - Result: completely unaffected
### ✅ Scenario Fixed
4. **Anthropic channel + `/v1/chat/completions` + caching**
   - Before the fix: cache tokens were wrongly subtracted, causing 94.5% revenue loss
   - After the fix: cache tokens are not subtracted and billing is correct
## Verification Data
Using actual log record 143509 as an example:

| Item | Before Fix | After Fix | Difference |
|------|--------|--------|------|
| Quota | 10,489 | 191,330 | +180,841 |
| Cost | ¥0.020978 | ¥0.382660 | +¥0.361682 |
| Revenue recovered | - | - | **+1724.1%** |
## Testing Suggestions
1. Test Anthropic channels with caching
2. Test OpenAI channels with caching (verify they are unaffected)
3. Test the no-cache path (verify it is unaffected)
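These scenarios lend themselves to a table-driven test. The sketch below inlines a hypothetical `billableTokens` helper that mirrors the fixed branch (the real code paths are `postConsumeQuota` and the unchanged `PostClaudeConsumeQuota`); the name and signature are illustrative only:

```go
package main

import "fmt"

// billableTokens mirrors the fixed branch in postConsumeQuota
// (hypothetical signature, for illustration only).
func billableTokens(channelIsAnthropic bool, baseTokens, cacheTokens int) int {
	if !channelIsAnthropic {
		baseTokens -= cacheTokens
	}
	return baseTokens
}

func main() {
	cases := []struct {
		name        string
		anthropic   bool
		base, cache int
		want        int
	}{
		{"no cache, any channel", false, 1000, 0, 1000},
		{"OpenAI + cache (unchanged)", false, 1000, 800, 200},
		{"Anthropic + cache (fixed)", true, 200, 800, 200},
	}
	for _, c := range cases {
		got := billableTokens(c.anthropic, c.base, c.cache)
		if got != c.want {
			panic(fmt.Sprintf("%s: got %d, want %d", c.name, got, c.want))
		}
		fmt.Printf("%s: %d ok\n", c.name, got)
	}
}
```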
## Related Issue
Fixes the billing error when Anthropic channels use prompt caching.
# New API
🍥 Next-Generation Large Model Gateway and AI Asset Management System

Quick Start • Key Features • Deployment • Documentation • Help
## 📝 Project Description

> **Note**
>
> This is an open-source project developed based on One API

> **Important**
>
> - This project is for personal learning purposes only, with no guarantee of stability or technical support
> - Users must comply with OpenAI's Terms of Use and applicable laws and regulations, and must not use it for illegal purposes
> - According to the "Interim Measures for the Management of Generative Artificial Intelligence Services", please do not provide any unregistered generative AI services to the public in China.
## 🤝 Trusted Partners
In no particular order

## 🙏 Special Thanks
Thanks to JetBrains for providing a free open-source development license for this project
## 🚀 Quick Start
### Using Docker Compose (Recommended)

```shell
# Clone the project
git clone https://github.com/QuantumNous/new-api.git
cd new-api

# Edit docker-compose.yml configuration
nano docker-compose.yml

# Start the service
docker-compose up -d
```
### Using Docker Commands

```shell
# Pull the latest image
docker pull calciumion/new-api:latest

# Using SQLite (default)
docker run --name new-api -d --restart always \
  -p 3000:3000 \
  -e TZ=Asia/Shanghai \
  -v ./data:/data \
  calciumion/new-api:latest

# Using MySQL
docker run --name new-api -d --restart always \
  -p 3000:3000 \
  -e SQL_DSN="root:123456@tcp(localhost:3306)/oneapi" \
  -e TZ=Asia/Shanghai \
  -v ./data:/data \
  calciumion/new-api:latest
```
> 💡 Tip: `-v ./data:/data` will save data in the `data` folder of the current directory; you can also change it to an absolute path like `-v /your/custom/path:/data`

🎉 After deployment is complete, visit http://localhost:3000 to start using!

📖 For more deployment methods, please refer to the Deployment Guide
## 📚 Documentation
Quick Navigation:
| Category | Link |
|---|---|
| 🚀 Deployment Guide | Installation Documentation |
| ⚙️ Environment Configuration | Environment Variables |
| 📡 API Documentation | API Documentation |
| ❓ FAQ | FAQ |
| 💬 Community Interaction | Communication Channels |
## ✨ Key Features
For detailed features, please refer to the Features Introduction

### 🎨 Core Functions
| Feature | Description |
|---|---|
| 🎨 New UI | Modern user interface design |
| 🌍 Multi-language | Supports Chinese, English, French, Japanese |
| 🔄 Data Compatibility | Fully compatible with the original One API database |
| 📈 Data Dashboard | Visual console and statistical analysis |
| 🔒 Permission Management | Token grouping, model restrictions, user management |
### 💰 Payment and Billing
- ✅ Online recharge (EPay, Stripe)
- ✅ Pay-per-use model pricing
- ✅ Cache billing support (OpenAI, Azure, DeepSeek, Claude, Qwen and all supported models)
- ✅ Flexible billing policy configuration
### 🔐 Authorization and Security
- 😈 Discord authorization login
- 🤖 LinuxDO authorization login
- 📱 Telegram authorization login
- 🔑 OIDC unified authentication
### 🚀 Advanced Features
API Format Support:
- ⚡ OpenAI Responses
- ⚡ OpenAI Realtime API (including Azure)
- ⚡ Claude Messages
- ⚡ Google Gemini
- 🔄 Rerank Models (Cohere, Jina)
Intelligent Routing:
- ⚖️ Channel weighted random
- 🔄 Automatic retry on failure
- 🚦 User-level model rate limiting
Format Conversion:
- 🔄 OpenAI ⇄ Claude Messages
- 🔄 OpenAI ⇄ Gemini Chat
- 🔄 Thinking-to-content functionality
Reasoning Effort Support:
View detailed configuration
OpenAI series models:
- `o3-mini-high` - High reasoning effort
- `o3-mini-medium` - Medium reasoning effort
- `o3-mini-low` - Low reasoning effort
- `gpt-5-high` - High reasoning effort
- `gpt-5-medium` - Medium reasoning effort
- `gpt-5-low` - Low reasoning effort

Claude thinking models:
- `claude-3-7-sonnet-20250219-thinking` - Enable thinking mode

Google Gemini series models:
- `gemini-2.5-flash-thinking` - Enable thinking mode
- `gemini-2.5-flash-nothinking` - Disable thinking mode
- `gemini-2.5-pro-thinking` - Enable thinking mode
- `gemini-2.5-pro-thinking-128` - Enable thinking mode with a thinking budget of 128 tokens
- You can also append `-low`, `-medium`, or `-high` to any Gemini model name to request the corresponding reasoning effort (no extra thinking-budget suffix needed).
## 🤖 Model Support
For details, please refer to API Documentation - Relay Interface
| Model Type | Description | Documentation |
|---|---|---|
| 🤖 OpenAI GPTs | gpt-4-gizmo-* series | - |
| 🎨 Midjourney-Proxy | Midjourney-Proxy(Plus) | Documentation |
| 🎵 Suno-API | Suno API | Documentation |
| 🔄 Rerank | Cohere, Jina | Documentation |
| 💬 Claude | Messages format | Documentation |
| 🌐 Gemini | Google Gemini format | Documentation |
| 🔧 Dify | ChatFlow mode | - |
| 🎯 Custom | Supports complete call address | - |
## 📡 Supported Interfaces
View the complete interface list
## 🚢 Deployment

> **Tip**
>
> Latest Docker image: `calciumion/new-api:latest`

### 📋 Deployment Requirements
| Component | Requirement |
|---|---|
| Local database | SQLite (Docker must mount /data directory) |
| Remote database | MySQL ≥ 5.7.8 or PostgreSQL ≥ 9.6 |
| Container engine | Docker / Docker Compose |
### ⚙️ Environment Variable Configuration
Common environment variables:

| Variable Name | Description | Default Value |
|---|---|---|
| `SESSION_SECRET` | Session secret (required for multi-machine deployment) | - |
| `CRYPTO_SECRET` | Encryption secret (required for Redis) | - |
| `SQL_DSN` | Database connection string | - |
| `REDIS_CONN_STRING` | Redis connection string | - |
| `STREAMING_TIMEOUT` | Streaming timeout (seconds) | 300 |
| `STREAM_SCANNER_MAX_BUFFER_MB` | Max per-line buffer (MB) for the stream scanner; increase when upstream sends huge image/base64 payloads | 64 |
| `MAX_REQUEST_BODY_MB` | Max request body size (MB, counted after decompression; prevents huge requests/zip bombs from exhausting memory). Exceeding it returns 413 | 32 |
| `AZURE_DEFAULT_API_VERSION` | Azure API version | 2025-04-01-preview |
| `ERROR_LOG_ENABLED` | Error log switch | false |
📖 Complete configuration: Environment Variables Documentation
### 🔧 Deployment Methods
#### Method 1: Docker Compose (Recommended)

```shell
# Clone the project
git clone https://github.com/QuantumNous/new-api.git
cd new-api

# Edit configuration
nano docker-compose.yml

# Start service
docker-compose up -d
```
#### Method 2: Docker Commands
Using SQLite:

```shell
docker run --name new-api -d --restart always \
  -p 3000:3000 \
  -e TZ=Asia/Shanghai \
  -v ./data:/data \
  calciumion/new-api:latest
```

Using MySQL:

```shell
docker run --name new-api -d --restart always \
  -p 3000:3000 \
  -e SQL_DSN="root:123456@tcp(localhost:3306)/oneapi" \
  -e TZ=Asia/Shanghai \
  -v ./data:/data \
  calciumion/new-api:latest
```
> 💡 Path explanation:
> - `./data:/data` - relative path; data is saved in the `data` folder of the current directory
> - You can also use an absolute path, e.g. `/your/custom/path:/data`
#### Method 3: BaoTa Panel
1. Install BaoTa Panel (version ≥ 9.2.0)
2. Search for New-API in the application store
3. Install with one click
### ⚠️ Multi-machine Deployment Considerations

> **Warning**
>
> - `SESSION_SECRET` must be set; otherwise login state will be inconsistent across machines
> - When Redis is shared, `CRYPTO_SECRET` must be set; otherwise cached data cannot be decrypted
### 🔄 Channel Retry and Cache
Retry configuration: Settings → Operation Settings → General Settings → Failure Retry Count

Cache configuration:
- `REDIS_CONN_STRING`: Redis cache (recommended)
- `MEMORY_CACHE_ENABLED`: Memory cache
## 🔗 Related Projects
### Upstream Projects
| Project | Description |
|---|---|
| One API | Original project base |
| Midjourney-Proxy | Midjourney interface support |
### Supporting Tools
| Project | Description |
|---|---|
| neko-api-key-tool | Key quota query tool |
| new-api-horizon | New API high-performance optimized version |
## 💬 Help Support
### 📖 Documentation Resources
| Resource | Link |
|---|---|
| 📘 FAQ | FAQ |
| 💬 Community Interaction | Communication Channels |
| 🐛 Issue Feedback | Issue Feedback |
| 📚 Complete Documentation | Official Documentation |
## 🤝 Contribution Guide
Welcome all forms of contribution!
- 🐛 Report Bugs
- 💡 Propose New Features
- 📝 Improve Documentation
- 🔧 Submit Code
## 🌟 Star History

💖 Thank you for using New API!
If this project is helpful to you, please give us a ⭐️ Star!

Official Documentation • Issue Feedback • Latest Release

Built with ❤️ by QuantumNous
