The Kiro backend does not support the Anthropic prompt cache protocol.
The local cache tracker simulates cache hits/creation for Claude Code
compatibility, but subtracting those values from input_tokens caused
the reported input_tokens to drop to single digits.
input_tokens now reflects the real value; cache_creation_input_tokens
and cache_read_input_tokens are still reported for protocol compliance.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
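The usage change in the message above can be sketched as follows. This is a minimal illustration, not the proxy's actual code: the `Usage` struct and `buildUsage` helper are hypothetical names, and only the JSON field names come from the message. The point is that the simulated cache counters are reported alongside `input_tokens` and are never subtracted from it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Usage mirrors the usage fields named in the commit message.
// The struct itself is an illustrative assumption.
type Usage struct {
	InputTokens              int `json:"input_tokens"`
	OutputTokens             int `json:"output_tokens"`
	CacheCreationInputTokens int `json:"cache_creation_input_tokens"`
	CacheReadInputTokens     int `json:"cache_read_input_tokens"`
}

// buildUsage (hypothetical helper) reports the real input token count
// unchanged. The simulated cache values are attached for protocol
// compliance only and do not reduce InputTokens.
func buildUsage(realInput, output, simCacheCreate, simCacheRead int) Usage {
	return Usage{
		InputTokens:              realInput,
		OutputTokens:             output,
		CacheCreationInputTokens: simCacheCreate,
		CacheReadInputTokens:     simCacheRead,
	}
}

func main() {
	u := buildUsage(1200, 50, 1000, 190)
	b, _ := json.Marshal(u)
	fmt.Println(string(b))
}
```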
When the proxy's own --- SYSTEM PROMPT --- wrapper or Claude Code's
<system-reminder> blocks appear in conversation history (e.g. echoed
back by Kiro and included in the next request), strip them from user
and assistant message content before building the Kiro payload.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
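A rough sketch of the stripping step described above, limited to the `<system-reminder>` blocks. The helper name `stripInjected` and the regular expression are assumptions for illustration. The `--- SYSTEM PROMPT ---` wrapper is left out because the message does not say how that wrapper is terminated.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// systemReminderRe matches one <system-reminder>...</system-reminder>
// block, including newlines inside it and whitespace right after it.
// The pattern is an illustrative assumption.
var systemReminderRe = regexp.MustCompile(`(?s)<system-reminder>.*?</system-reminder>\s*`)

// stripInjected (hypothetical helper) removes echoed reminder blocks
// from user or assistant message content.
func stripInjected(content string) string {
	return strings.TrimSpace(systemReminderRe.ReplaceAllString(content, ""))
}

func main() {
	in := "hello <system-reminder>\ninternal note\n</system-reminder> world"
	fmt.Printf("%q\n", stripInjected(in))
}
```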
Old format had 9+ lines per request with key=value noise, and concurrent
requests interleaved without any way to tell which line belongs to which
call. The new format:
- Each line starts with the status code/outcome (200, 400, 429, FAIL,
TIMEOUT, ERR) so success/failure is visible at a glance.
- Every request gets a 6-char hex req_id; all lines for that request
share it, disambiguating interleaved concurrent traffic.
- Endpoint abbreviated to 2 chars (CW/Q), model stripped of "claude-"
prefix, attempt compacted to "a1"/"a2".
- Successful requests collapse to 2 lines (REQ start + 200 done with
first_byte and total elapsed). Retries/errors add one line each.
- Durations use fmtMs: <1s -> "235ms", >=1s -> "2.8s" (one decimal place).
Sample successful request:
[KiroAPI] REQ a3f2b1 model=opus-4.7 account=x@y endpoints=CW,Q
[KiroAPI] 200 a3f2b1 CW/a1 first_byte=1.2s total=2.8s
Sample fallback chain:
[KiroAPI] REQ b8e3c4 model=opus-4.6 account=x@y endpoints=Q,CW
[KiroAPI] 400 b8e3c4 Q /a1 INVALID_MODEL_ID 325ms retry 1/3
[KiroAPI] 400 b8e3c4 Q /a2 INVALID_MODEL_ID 242ms retry 2/3
[KiroAPI] 400 b8e3c4 Q /a4 INVALID_MODEL_ID 216ms exhausted -> fallback
[KiroAPI] 400 b8e3c4 CW/a1 INVALID_MODEL_ID 452ms retry 1/3
...
[KiroAPI] FAIL b8e3c4 all endpoints failed 2.1s last=400
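For illustration, the duration formatting and request ID described above could look roughly like this. Only the name `fmtMs` and the output shapes come from the message. The body of `fmtMs`, the `newReqID` helper, and the use of `crypto/rand` are assumptions.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"
)

// fmtMs renders sub-second durations as whole milliseconds and longer
// ones as seconds with one decimal place.
func fmtMs(d time.Duration) string {
	if d < time.Second {
		return fmt.Sprintf("%dms", d.Milliseconds())
	}
	return fmt.Sprintf("%.1fs", d.Seconds())
}

// newReqID (hypothetical helper) returns a 6-character hex string
// built from 3 random bytes.
func newReqID() string {
	b := make([]byte, 3)
	if _, err := rand.Read(b); err != nil {
		return "000000" // fallback so logging never fails
	}
	return hex.EncodeToString(b)
}

func main() {
	id := newReqID()
	fmt.Printf("[KiroAPI] 200 %s CW/a1 first_byte=%s total=%s\n",
		id, fmtMs(1200*time.Millisecond), fmtMs(2800*time.Millisecond))
}
```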
Upstream sometimes accepts a request (HTTP 200 headers) but stalls without
sending any event-stream packet. Add a configurable timeout that counts
from request dispatch until the first AWS event-stream prelude is read,
and retry on the same endpoint before falling back.
- Config: FirstByteTimeoutSec (default 10s, 0=disabled, range 0-300),
FirstByteRetries (default 1, range 0-10), with Get/Update helpers.
- kiro.go: parseEventStream signature gains onFirstByte callback, fired
once when the first 12-byte prelude reads successfully. CallKiroAPI
wraps each attempt in a context.WithCancel + time.AfterFunc timer that
cancels the HTTP request if no event arrives before the deadline.
Separate retry budgets for INVALID_MODEL_ID and first-byte timeout,
tracked on the same attempt loop; maxAttempts = max(both)+1.
- handler.go: /admin/api/general extended to read/write the two new
fields with validation (timeout 0-300, retries 0-10).
- web/index.html: General Settings card gains two numeric inputs plus
CN/EN i18n and the corresponding load/save JS.
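The first-byte timer described above can be sketched as below, assuming a simplified attempt function in place of the real HTTP call and parseEventStream. The names `attemptFunc`, `attemptWithFirstByteTimeout`, `errFirstByteTimeout`, and the `demo`, `stalled`, and `healthy` helpers are hypothetical. Only the context.WithCancel plus time.AfterFunc pattern and the onFirstByte callback are taken from the message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

var errFirstByteTimeout = errors.New("first byte timeout")

// attemptFunc stands in for one upstream attempt. It must call
// onFirstByte once the first event-stream prelude has been read.
type attemptFunc func(ctx context.Context, onFirstByte func()) error

// attemptWithFirstByteTimeout cancels the attempt if onFirstByte has
// not fired within timeout. A timeout of 0 disables the timer.
func attemptWithFirstByteTimeout(parent context.Context, timeout time.Duration, do attemptFunc) error {
	ctx, cancel := context.WithCancel(parent)
	defer cancel()
	if timeout <= 0 {
		return do(ctx, func() {})
	}
	var timedOut atomic.Bool
	timer := time.AfterFunc(timeout, func() {
		timedOut.Store(true) // record the cause before cancelling
		cancel()
	})
	defer timer.Stop()
	err := do(ctx, func() { timer.Stop() })
	if err != nil && timedOut.Load() {
		return errFirstByteTimeout
	}
	return err
}

// stalled simulates an upstream that never sends a packet.
func stalled(ctx context.Context, onFirstByte func()) error {
	<-ctx.Done()
	return ctx.Err()
}

// healthy simulates an upstream that answers immediately.
func healthy(ctx context.Context, onFirstByte func()) error {
	onFirstByte()
	return nil
}

// demo runs one attempt under a background context.
func demo(timeout time.Duration, do attemptFunc) error {
	return attemptWithFirstByteTimeout(context.Background(), timeout, do)
}

func main() {
	fmt.Println(demo(20*time.Millisecond, stalled))
	fmt.Println(demo(20*time.Millisecond, healthy))
}
```

A caller would treat `errFirstByteTimeout` as retryable on the same endpoint until the FirstByteRetries budget is used up, then fall back.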
Brought in 9 upstream commits:
- 221348b thinking routing: ClaudeRequest.Thinking + Signature + includeEmptyThinkingBlock
- 0203357 + 31aa6aa accurate input_tokens via contextUsageEvent
- 404e242 + 50f1a7e outbound proxy (socks5/http) + UI
- 940dc78 version bump to 1.0.6
- 3 CI workflow changes
Strategy: took upstream base for the 4 conflicting files, then re-applied
our local changes on top:
- config.go: InvalidModelRetries field + GetInvalidModelRetries/UpdateInvalidModelRetries
- kiro.go: AmazonQ origin CLI->AI_EDITOR, attempt-level retry loop for
INVALID_MODEL_ID, detailed log.Printf (account/model/attempt/elapsed),
log import; adopted upstream's kiroHttpStore atomic pointer for Do()
- handler.go: /admin/api/general GET/POST + apiGetGeneralConfig +
apiUpdateGeneralConfig
- web/index.html: General Settings card (invalid-model-retries),
CN/EN i18n, loadGeneralConfig/saveGeneralConfig, call from initSettings
Build + full test suite green on Go 1.24.3.
With origin=CLI, q.us-east-1.amazonaws.com returns only 3 base models
(sonnet-4.5, sonnet-4, haiku-4.5) and rejects everything else with
INVALID_MODEL_ID. With origin=AI_EDITOR it returns the full catalog
(opus-4.5/4.6/4.7, sonnet-4.6, haiku-4.5, deepseek, minimax, glm, qwen,
auto).
Verified via direct curl to /ListAvailableModels on both origin values
with two different tokens.
- Add missing claude-sonnet-4-7/4.7 and claude-haiku-4-7/4.7 mappings;
previously claude-sonnet-4.7 was substring-matched by the bare
"claude-sonnet-4" key and silently downgraded to claude-sonnet-4.
- Introduce modelMapping.boundary flag and modelKeyMatches() helper.
Bare digit-ending keys (like claude-sonnet-4) now require the next
character to NOT be a digit, dot, or dash-digit, so future versions
(4.8, 5.x) also pass through without silent downgrade.
- Add 8 regression tests in TestParseModelAndThinkingNoSilentDowngrade
covering the 4.7 family, hypothetical 4.8, Bedrock-style names, and
thinking-suffix variants.
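A minimal sketch of the boundary rule described above. The name modelKeyMatches comes from the message, but the signature and body here are assumptions. This simplified version checks only the first occurrence of the key and applies the rule literally, so a dash followed by digits is always rejected. The real helper may handle more cases.

```go
package main

import (
	"fmt"
	"strings"
)

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// modelKeyMatches reports whether key occurs in model. When boundary
// is true, the character after the key must not be a digit, a dot, or
// a dash followed by a digit.
func modelKeyMatches(model, key string, boundary bool) bool {
	idx := strings.Index(model, key)
	if idx < 0 {
		return false
	}
	if !boundary {
		return true
	}
	rest := model[idx+len(key):]
	if rest == "" {
		return true
	}
	if isDigit(rest[0]) || rest[0] == '.' {
		return false
	}
	if rest[0] == '-' && len(rest) > 1 && isDigit(rest[1]) {
		return false
	}
	return true
}

func main() {
	for _, m := range []string{
		"claude-sonnet-4",
		"claude-sonnet-4-thinking",
		"claude-sonnet-4.7",
		"claude-sonnet-4-7",
	} {
		fmt.Println(m, modelKeyMatches(m, "claude-sonnet-4", true))
	}
}
```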
- Config: new InvalidModelRetries field (default 3, range 0-20)
- Admin API: /admin/api/general GET/POST for general settings
- Admin UI: new "通用设置" (General Settings) card with retry count input
- CallKiroAPI: same-endpoint retry on HTTP 400 INVALID_MODEL_ID
before falling back to next endpoint
- CallKiroAPI: switched to log.Printf with timestamp, account,
model, attempt counter, elapsed time, error body truncation
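The same-endpoint retry before fallback might be structured roughly as follows. This is a sketch under assumptions: `outcome`, `callFunc`, and `callWithFallback` are hypothetical names, and the real CallKiroAPI also handles logging, other status codes, and streaming.

```go
package main

import "fmt"

// outcome is a simplified view of one upstream response.
type outcome struct {
	Status int
	Code   string
}

// callFunc performs one attempt against an endpoint.
type callFunc func(endpoint string, attempt int) outcome

// callWithFallback retries INVALID_MODEL_ID on the same endpoint up to
// invalidModelRetries extra times, then moves to the next endpoint.
// Any other failure falls back immediately.
func callWithFallback(endpoints []string, invalidModelRetries int, call callFunc) (string, bool) {
	for _, ep := range endpoints {
		for attempt := 1; attempt <= invalidModelRetries+1; attempt++ {
			r := call(ep, attempt)
			if r.Status == 200 {
				return ep, true
			}
			if r.Status != 400 || r.Code != "INVALID_MODEL_ID" {
				break // not retryable here, try the next endpoint
			}
		}
	}
	return "", false
}

func main() {
	ep, ok := callWithFallback([]string{"Q", "CW"}, 3, func(endpoint string, attempt int) outcome {
		if endpoint == "CW" && attempt == 2 {
			return outcome{Status: 200}
		}
		return outcome{Status: 400, Code: "INVALID_MODEL_ID"}
	})
	fmt.Println(ep, ok)
}
```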
* feat: Add validation and account management functionality
- Add validation for clientID and clientSecret in refreshOIDCToken function
- Add weight field for load balancing priority in Account struct
- Implement a weighted round-robin strategy that assigns selection probability according to account weight.
- Add batch account management functionality including enabling, disabling, refreshing, and retrieving account details.
- Update Kiro API version and adjust user agent strings to reflect new version numbers.
- Update Kiro version and modify user agent strings and header settings.
- Refactor model mapping to an ordered list for precise key matching.
- Add account bulk actions and filtering toolbar to index.html
* feat: Add logic to skip accounts with exhausted usage limits
- Add logic to skip accounts with exhausted usage limits when selecting the next account.
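One way to picture weight-based selection combined with skipping exhausted accounts is the sketch below. The `Account` fields, `pickWeighted`, and the rule that a non-positive weight counts as 1 are all assumptions made for illustration. The roll parameter stands in for a random number in [0, 1) so the example stays deterministic.

```go
package main

import "fmt"

// Account is a reduced, hypothetical view of an account entry.
type Account struct {
	ID             string
	Weight         int
	Enabled        bool
	UsageExhausted bool
}

func eligible(a Account) bool { return a.Enabled && !a.UsageExhausted }

// weightOf treats a missing or non-positive weight as 1 (assumption).
func weightOf(a Account) int {
	if a.Weight <= 0 {
		return 1
	}
	return a.Weight
}

// pickWeighted selects an eligible account with probability
// proportional to its weight. roll must be in [0, 1).
func pickWeighted(accounts []Account, roll float64) (Account, bool) {
	total := 0
	for _, a := range accounts {
		if eligible(a) {
			total += weightOf(a)
		}
	}
	if total == 0 {
		return Account{}, false
	}
	target := int(roll * float64(total))
	if target >= total {
		target = total - 1
	}
	for _, a := range accounts {
		if !eligible(a) {
			continue
		}
		target -= weightOf(a)
		if target < 0 {
			return a, true
		}
	}
	return Account{}, false
}

func main() {
	accounts := []Account{
		{ID: "a", Weight: 1, Enabled: true},
		{ID: "b", Weight: 3, Enabled: true, UsageExhausted: true},
		{ID: "c", Weight: 3, Enabled: true},
	}
	for _, roll := range []float64{0.0, 0.5, 0.99} {
		acc, ok := pickWeighted(accounts, roll)
		fmt.Println(roll, acc.ID, ok)
	}
}
```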
* fix: stabilize multimodal image compatibility across OpenCode flows
Advertise vision-capable metadata in /v1/models and make model matching deterministic so OpenCode does not downgrade image support or route 4.6 models incorrectly. Expand request translation to accept OpenCode/OpenAI attachment shapes, sanitize [Image N] placeholders safely, keep image-only follow-up turns non-empty, and improve token accounting so base64 image bytes no longer inflate prompt token usage and trigger premature compaction.
* fix: deduplicate thinking streams and trim injected prompt noise
* fix: align /v1/messages thinking blocks and message_start usage
* fix: reduce repetitive thinking across tool turns
Select a single reasoning stream source, prevent chunk replay, and preserve structured tool-loop context so the model keeps continuity instead of re-planning each turn.
* fix: unify token counting on existing API endpoints
Compute usage deterministically on /v1/messages and /v1/chat/completions even when upstream omits tokenUsage.
- remove roo-only token path and keep behavior on existing endpoints
- add proxy/token_estimator.go with shared Claude/OpenAI estimators (input/system/messages/tools + output/thinking/tool calls)
- wire stream/non-stream handlers to use estimator-derived input/output usage
- update /v1/messages/count_tokens to reuse the same estimator
- keep robust upstream usage parsing/normalization in proxy/kiro.go while dropping parser-level estimate fallback
Why: direct upstream tests show metering/context events frequently arrive without tokenUsage in this environment; this made usage zero or inconsistent. Local deterministic accounting keeps reported usage stable and explicit.
- Add claude-sonnet-4.6 (dot and dash variants) to modelMap in translator.go
- Add claude-sonnet-4.6 and claude-opus-4.6 (plus -thinking variants) to the
static fallback model list in handler.go
- Realign existing opus-4.6 entries for consistency
* feat: Add JSON copy functionality with success animation
- Add functionality to copy account data as JSON and show success animation.
* feat: Add endpoints for account details and error handling
- Add endpoint to retrieve full account details including sensitive information
- Add error handling for fetching and copying full account JSON data
- Add ban status and reason fields to account configuration
- Add account ban status and details handling in API refresh account function.
- Add logic to handle account suspension and authentication errors, updating ban status accordingly.
- Add and style badge classes for different account statuses and modify account status display logic.