QTom
72e5876c64
feat(gateway): Cache-Driven RPM Buffer
...
- buffer 公式从 baseRPM/5 改为 concurrency + maxSessions
保留 baseRPM/5 作为 floor 向后兼容
- 粘性路径 fallback 新增 [StickyCacheMiss] 结构化日志
reason: rpm_red / gate_check / session_limit / wait_queue_full / account_cleared
- session_limit 路径跳过 wait queue 重试(RegisterSession 拒绝无副作用)
- 典型配置 buffer 从 3 提升至 13,大幅减少高峰期 Prompt Cache Miss
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
2026-03-31 13:24:22 +08:00
QTom
e63c83955a
fix: address deep code review issues for RPM limiting
...
- Move IncrementRPM after Forward success to prevent phantom RPM
consumption during account switch retries
- Add base_rpm input sanitization (clamp to 0-10000) in Create/Update
- Add WindowCost scheduling checks to legacy path sticky sessions
(4 check sites + 4 prefetch sites), fixing pre-existing gap
- Clean up rpm_strategy/rpm_sticky_buffer when disabling RPM in
BulkEditModal (JSONB merge cannot delete keys, use empty values)
- Add json.Number test cases to TestGetBaseRPM/TestGetRPMStickyBuffer
- Document TOCTOU race as accepted soft-limit design trade-off
2026-02-28 20:38:06 +08:00
QTom
607237571f
fix: address code review issues for RPM limiting feature
...
- Use TxPipeline (MULTI/EXEC) instead of Pipeline for atomic INCR+EXPIRE
- Filter negative values in GetBaseRPM(), update test expectation
- Add RPM batch query (GetRPMBatch) to account List API
- Add warn logs for RPM increment failures in gateway handler
- Reset enableRpmLimit on BulkEditAccountModal close
- Use union type 'tiered' | 'sticky_exempt' for rpmStrategy refs
- Add design decision comments for rdb.Time() RTT trade-off
2026-02-28 20:37:37 +08:00
QTom
0bb3e4a98c
feat: add RPM getter methods and schedulability check to Account model
2026-02-28 20:34:22 +08:00