fix(antigravity): fast-fail on proxy unavailable, temp-unschedule account

## Problem

When a proxy is unreachable, token refresh retries up to 4 times with
a 30s timeout each, so requests hang for ~2 minutes before failing
with a generic 502 error. The failed account is not marked, so
subsequent requests keep hitting it.

## Changes

### Proxy connection fast-fail
- Set TCP dial timeout to 5s and TLS handshake timeout to 5s on the
  antigravity client, so proxy connectivity issues fail within 5s
  instead of 30s
- Reduce overall HTTP client timeout from 30s to 10s
- Export `IsConnectionError` for service-layer use
- Detect proxy connection errors in `RefreshToken` and return
  immediately with "proxy unavailable" error (no retries)

### Token refresh temp-unschedulable
- Add 8s context timeout for token refresh on request path
- Mark account as temp-unschedulable for 10min when refresh fails
  (both background `TokenRefreshService` and request-path
  `GetAccessToken`)
- Sync temp-unschedulable state to Redis cache for immediate
  scheduler effect
- Inject `TempUnschedCache` into `AntigravityTokenProvider`

### Account failover
- Return `UpstreamFailoverError` on `GetAccessToken` failure in
  `Forward`/`ForwardGemini` to trigger handler-level account switch
  instead of returning 502 directly

### Proxy probe alignment
- Apply same 5s dial/TLS timeout to shared `httpclient` pool
- Reduce proxy probe timeout from 30s to 10s
erio
2026-03-19 23:48:37 +08:00
parent 0236b97d49
commit 528ff5d28c
10 changed files with 125 additions and 20 deletions

@@ -17,6 +17,7 @@ package httpclient
import (
"fmt"
"net"
"net/http"
"strings"
"sync"
@@ -32,6 +33,8 @@ const (
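defaultMaxIdleConns = 100 // maximum number of idle connections
defaultMaxIdleConnsPerHost = 10 // maximum idle connections per host
defaultIdleConnTimeout = 90 * time.Second // idle connection timeout (should be lower than the upstream LB timeout)
defaultDialTimeout = 5 * time.Second // TCP connect timeout (including proxy handshake); fail fast when the proxy is unreachable
defaultTLSHandshakeTimeout = 5 * time.Second // TLS handshake timeout
validatedHostTTL = 30 * time.Second // cache TTL for DNS rebinding validation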
)
@@ -107,6 +110,10 @@ func buildTransport(opts Options) (*http.Transport, error) {
}
transport := &http.Transport{
DialContext: (&net.Dialer{
Timeout: defaultDialTimeout,
}).DialContext,
TLSHandshakeTimeout: defaultTLSHandshakeTimeout,
MaxIdleConns: maxIdleConns,
MaxIdleConnsPerHost: maxIdleConnsPerHost,
MaxConnsPerHost: opts.MaxConnsPerHost, // 0 means no limit