- Updated TokenAuth middleware to handle requests for both `/v1beta/models/` and `/v1/models/`.
- Adjusted distributor middleware to recognize the new model path.
- Enhanced relay mode determination to include the new model path.
- Added route for handling POST requests to `/models/*path`.
These changes ensure compatibility with the new model API structure, improving the overall routing and authentication flow.
Summary
• Migrated all ratio-related sources into `setting/ratio_setting/`
– `model_ratio.go` (renamed from model-ratio.go)
– `cache_ratio.go`
– `group_ratio.go`
• Changed package name to `ratio_setting` and relocated initialization (`ratio_setting.InitRatioSettings()` in main).
• Updated every import & call site:
– Model / cache / completion / image ratio helpers
– Group ratio helpers (`GetGroupRatio*`, `ContainsGroupRatio`, `CheckGroupRatio`, etc.)
– JSON-serialization & update helpers (`*Ratio2JSONString`, `Update*RatioByJSONString`)
• Adjusted controllers, middleware, relay helpers, services and models to reference the new package.
• Removed obsolete `setting` / `operation_setting` imports; added missing `ratio_setting` imports.
• Adopted idiomatic map iteration (`for key := range m`) where value is unused.
• Ran static checks to ensure clean build.
This commit centralises all ratio configuration (model, cache and group) in one cohesive module, simplifying future maintenance and improving code clarity.
- Introduced a new constant `AzureNoRemoveDotTime` in `constant/azure.go` to manage model name formatting for channels created after May 10, 2025.
- Updated `distributor.go` to set `channel_create_time` in the context.
- Modified `adaptor.go` to conditionally remove dots from model names based on the channel creation time.
- Enhanced `relay_info.go` to include `ChannelCreateTime` in the `RelayInfo` struct.
- Updated English localization files to reflect changes in model name handling for new channels.
Reason: The original steps 1 and 3 in the redisRateLimitHandler method were not atomic, leading to poor precision under high concurrent requests. For example, with a rate limit set to 60, sending 200 concurrent requests would result in none being blocked, whereas theoretically around 140 should be intercepted.
Solution: I chose not to merge steps 1 and 3 into a single Lua script because a single atomic operation involving read, write, and delete operations could suffer from performance issues under high concurrency. Instead, I implemented a token bucket algorithm to optimize this, reducing the atomic operation to just read and write steps while significantly decreasing the memory footprint.
- Add new context keys for user-related information
- Modify user cache and authentication middleware to populate context
- Refactor quota and notification services to use context-based user data
- Remove redundant database queries by leveraging context information
- Update various components to use new context-based user retrieval methods
- Replaced references to common.GroupRatio and common.UserUsableGroups with corresponding functions from the new setting package across multiple controllers and services.
- Introduced new setting functions for managing group ratios and user usable groups, enhancing code organization and maintainability.
- Updated related functions to ensure consistent behavior with the new setting package integration.
- Introduced a new constant `ContextKeyRequestStartTime` to store the request start time in the context, enhancing request tracking.
- Updated the `Distribute` middleware to set the request start time in the context using the new constant.
- Modified the `GenRelayInfo` function to retrieve the request start time from the context, ensuring accurate timing information is used in relay operations.