Commit Graph

3168 Commits

Author SHA1 Message Date
Wesley Liddick
b95ffce244 Merge pull request #1772 from KnowSky404/fix/openai-test-state-reconciliation
[codex] reconcile OpenAI admin test rate-limit state
2026-04-25 10:02:21 +08:00
shaw
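8f28a834f8 fix(payment): show the Stripe button when both EasyPay and Stripe are enabled
VISIBLE_METHOD_ALIASES was missing stripe, so getVisibleMethods filtered out
the stripe method returned by the backend. Clicking the Stripe button now omits
the method query parameter so the landing page renders the full Payment Element.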
2026-04-25 09:46:27 +08:00
shaw
7424c73b05 chore: remove unused model IDs 2026-04-25 09:04:34 +08:00
Wesley Liddick
1afd81b019 Merge pull request #1920 from Wuxie233/fix/responses-web-search-tool-types
fix(apicompat): recognize web_search_20250305 / google_search in Responses→Anthropic tool conversion
2026-04-25 09:00:37 +08:00
shaw
732d6495ea chore(gateway): fix lint issues from cc-mimicry-parity merge
- staticcheck QF1001: apply De Morgan's law to the OAuth-mimic header
  passthrough guard (`!(a && b)` → `a != ... || !b`).
- unused: drop `isClaudeCodeRequest`, which became dead after PR #1914
  switched both `/v1/messages` and `/count_tokens` paths to unconditional
  `account.IsOAuth()` mimicry. The lowercase helper `isClaudeCodeClient`
  is kept (still referenced by `TestIsClaudeCodeClient`).
2026-04-25 08:58:57 +08:00
Wesley Liddick
6d20ab8082 Merge pull request #1914 from keh4l/feat/cc-mimicry-parity
fix(claude): align Claude Code OAuth mimicry with real CLI traffic
2026-04-25 08:54:04 +08:00
shaw
aa8ee33b0a refactor(affiliate): tighten DI and harden inviter code validation
- Drop SetAffiliateService setters and ProvideAuthService /
  ProvidePaymentService / ProvideUserHandler wrappers in favor of direct
  Wire constructor injection. AffiliateService has no back-edge to
  Auth/Payment/User, so the indirection was never required.
- Change RegisterWithVerification's variadic affiliateCode to a fixed
  parameter; adjust all call sites.
- Validate aff_code length and charset in BindInviterByCode before any
  DB lookup, eliminating a timing side channel and useless DB round trips
  on malformed input.
- Make affiliate cache invalidation synchronous; surface Redis errors
  via the project logger instead of swallowing them in a detached
  goroutine.
- Add an integration test guarding cross-layer tx propagation in
  AccrueQuota and a unit test pinning the aff_code format rules.
2026-04-25 08:44:18 +08:00
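The pre-lookup validation described in aa8ee33b0a can be sketched as follows. This is a minimal illustration, not the repository's code: the length limits, the alphanumeric charset, and the names `validateAffCode` and `errInvalidAffCode` are assumptions, since the commit does not state the actual aff_code format rules.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical format rules; the commit does not state the real limits.
const (
	minAffCodeLen = 4
	maxAffCodeLen = 32
)

// errInvalidAffCode is a hypothetical error name used for this sketch only.
var errInvalidAffCode = errors.New("invalid affiliate code format")

// validateAffCode rejects malformed codes using only length and charset
// checks, so no database lookup is attempted for input that can never match.
func validateAffCode(code string) error {
	if len(code) < minAffCodeLen || len(code) > maxAffCodeLen {
		return errInvalidAffCode
	}
	for i := 0; i < len(code); i++ {
		c := code[i]
		isAlnum := (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
		if !isAlnum {
			return errInvalidAffCode
		}
	}
	return nil
}

func main() {
	for _, code := range []string{"ABCD1234", "ab", "bad code!"} {
		fmt.Printf("%q -> %v\n", code, validateAffCode(code))
	}
}
```

Because the check touches only the input string, malformed codes are rejected without a database call.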
Wuxie233
5f630fbb19 fix(apicompat): recognize web_search_20250305 / google_search in Responses to Anthropic tool conversion 2026-04-25 01:09:51 +08:00
keh4l
bdbd2916f5 fix(gateway): skip client header passthrough on OAuth mimicry path
Root cause of persistent third-party detection: sub2api's
buildUpstreamRequest transparently forwards client headers via
allowedHeaders whitelist (addHeaderRaw) before applying mimicry
overrides. When third-party clients (opencode, etc.) send their own
anthropic-beta / user-agent / x-stainless-* / x-claude-code-session-id
values, these get appended to the request alongside our injected
headers, creating an inconsistent header set that Anthropic detects.

Parrot's build_upstream_headers constructs exactly 9 headers from
scratch and never forwards anything from the client. This is why
'same opencode version, some users work some don't' — different
opencode configs/versions send different header combinations.

Fix: when tokenType=oauth and mimicClaudeCode=true, skip the
client header passthrough loop entirely. The subsequent
applyClaudeCodeMimicHeaders + ApplyFingerprint + beta merge
pipeline constructs all necessary headers from our controlled values.

Also: remove the systemIncludesClaudeCodePrompt gate. OAuth accounts
now unconditionally rewrite system (even if the client already sent a
Claude Code-style prompt), ensuring the billing attribution block is
always present.
2026-04-25 00:43:38 +08:00
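The fix in bdbd2916f5 can be sketched roughly as below. This is an illustration under assumptions, not the real buildUpstreamRequest: the function `buildUpstreamHeaders`, its parameters, and the sample header values `opencode/1.0` and `custom-beta` are hypothetical. Only the rule itself (skip the whitelist passthrough when tokenType is oauth and mimicry is on) comes from the commit message.

```go
package main

import (
	"fmt"
	"net/http"
)

// buildUpstreamHeaders is a hypothetical stand-in for the header stage of
// buildUpstreamRequest. allowed is the passthrough whitelist (canonical header
// names) and mimic holds the proxy-controlled header values.
func buildUpstreamHeaders(client http.Header, allowed map[string]bool, mimic http.Header, tokenType string, mimicClaudeCode bool) http.Header {
	out := http.Header{}
	// On the OAuth mimicry path nothing from the client is forwarded.
	if !(tokenType == "oauth" && mimicClaudeCode) {
		for name, values := range client {
			if !allowed[http.CanonicalHeaderKey(name)] {
				continue
			}
			for _, v := range values {
				out.Add(name, v)
			}
		}
	}
	// Proxy-controlled values replace anything already present.
	for name, values := range mimic {
		out.Del(name)
		for _, v := range values {
			out.Add(name, v)
		}
	}
	return out
}

func main() {
	client := http.Header{}
	client.Set("User-Agent", "opencode/1.0")
	client.Set("Anthropic-Beta", "custom-beta")
	allowed := map[string]bool{"User-Agent": true, "Anthropic-Beta": true}
	mimic := http.Header{}
	mimic.Set("User-Agent", "claude-cli/2.1.92")

	fmt.Println(buildUpstreamHeaders(client, allowed, mimic, "oauth", true))
	fmt.Println(buildUpstreamHeaders(client, allowed, nil, "apikey", false))
}
```

With the OAuth mimicry path active, the client's Anthropic-Beta value never reaches the upstream request, so the resulting header set is built only from controlled values.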
keh4l
6dc89765fd fix(gateway): always apply full mimicry for OAuth accounts regardless of client identity
Before: isClaudeCodeRequest() checked whether the client looks like a
real Claude Code CLI (UA, system prompt, X-App header, metadata format).
If it looked like Claude Code, all mimicry was skipped — the assumption
being that a real CLI needs no help.

Problem: third-party tools like opencode partially impersonate Claude
Code (sending claude-cli UA + claude-code beta + CC system prompt) but
miss critical details (billing attribution block, tool-name obfuscation,
cache breakpoints, full beta set). Some users' opencode instances pass
the isClaudeCodeRequest check, causing sub2api to skip mimicry entirely,
while Anthropic still detects the request as third-party.

This explains why 'same opencode version, some users work, some don't'
— it depends on which opencode features/config trigger the validator.

Fix: OAuth accounts now unconditionally run the full mimicry pipeline,
matching Parrot's behavior (Parrot never checks client identity).
This is safe because our mimicry is strictly more complete than any
third-party client's partial impersonation.

Changed:
  - /v1/messages path: remove isClaudeCode gate
  - /v1/messages/count_tokens path: same
2026-04-25 00:26:37 +08:00
keh4l
f3233db01f fix(gateway): apply D/E/F mimicry to native /v1/messages and count_tokens paths
The previous commit only wired stripMessageCacheControl,
addMessageCacheBreakpoints, and tool-name obfuscation into
applyClaudeCodeOAuthMimicryToBody (used by /chat/completions and
/responses). The native /v1/messages path and count_tokens path
have their own independent mimicry code blocks and were missed.

Now all three entry points share the same D/E/F pipeline:
  - /v1/messages (gateway_service.go forwardAnthropic)
  - /v1/messages/count_tokens (gateway_service.go countTokens)
  - OpenAI compat (applyClaudeCodeOAuthMimicryToBody)
2026-04-24 23:16:32 +08:00
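The message cache_control strategy that f3233db01f wires into the native paths (strip every client-provided breakpoint, then add two stable ones) can be sketched as below. The `block` and `message` types, the `markLastBlock` helper, the choice to attach the breakpoint to the last content block, and the skipping of string-typed content are all assumptions made for this sketch. The function names and the 5m TTL mirror those mentioned in the commit messages, but the bodies are not the repository's implementation.

```go
package main

import "fmt"

const defaultTTL = "5m" // mirrors DefaultCacheControlTTL from the commit log

// block and message are simplified, hypothetical shapes for this sketch.
type block map[string]any

type message struct {
	Role    string
	Content []block
}

// stripMessageCacheControl removes every client-provided cache_control.
func stripMessageCacheControl(msgs []message) {
	for _, m := range msgs {
		for _, b := range m.Content {
			delete(b, "cache_control")
		}
	}
}

// markLastBlock attaches an ephemeral breakpoint to the last content block.
func markLastBlock(m message) {
	if n := len(m.Content); n > 0 {
		m.Content[n-1]["cache_control"] = map[string]any{"type": "ephemeral", "ttl": defaultTTL}
	}
}

// addMessageCacheBreakpoints marks the last message and, when there are at
// least 4 messages, the second-to-last user turn.
func addMessageCacheBreakpoints(msgs []message) {
	if len(msgs) == 0 {
		return
	}
	markLastBlock(msgs[len(msgs)-1])
	if len(msgs) < 4 {
		return
	}
	seenUser := 0
	for i := len(msgs) - 1; i >= 0; i-- {
		if msgs[i].Role != "user" {
			continue
		}
		seenUser++
		if seenUser == 2 {
			markLastBlock(msgs[i])
			return
		}
	}
}

func main() {
	msgs := []message{
		{Role: "user", Content: []block{{"type": "text", "text": "a", "cache_control": map[string]any{"type": "ephemeral"}}}},
		{Role: "assistant", Content: []block{{"type": "text", "text": "b"}}},
		{Role: "user", Content: []block{{"type": "text", "text": "c"}}},
		{Role: "assistant", Content: []block{{"type": "text", "text": "d"}}},
		{Role: "user", Content: []block{{"type": "text", "text": "e"}}},
	}
	stripMessageCacheControl(msgs)
	addMessageCacheBreakpoints(msgs)
	for i, m := range msgs {
		fmt.Println(i, m.Role, m.Content[0]["cache_control"])
	}
}
```

Stripping first means the breakpoint positions depend only on the conversation shape, not on whatever the client happened to send.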
keh4l
6e12578bc5 feat(gateway): port Parrot tool-name obfuscation + message cache breakpoints
Implements the remaining three parity items with Parrot cc_mimicry:

  D) Tool-name obfuscation
     - Dynamic mapping when tools.length > 5 (matches Parrot threshold).
       Fake names follow {prefix}{name[:3]}{i:02d} (e.g. 'manage_bas00').
       Go port of random.Random(hash(tuple(names))) uses fnv64a seed +
       math/rand; byte-exact reproduction is impossible (Python hash vs
       Go hash), but the two invariants that matter are preserved:
         * same input tool_names yield identical mapping (cache hit)
         * prefix pool is shuffled (names look distributed)
     - Static prefix map (sessions_ -> cc_sess_, session_ -> cc_ses_)
       applied as fallback, matching Parrot TOOL_NAME_REWRITES verbatim.
     - Server tools (web_search_20250305, computer_*, etc.) are NOT
       renamed; only type=='function' and type=='custom' tools are.
     - tool_choice.name is rewritten in sync (only when type=='tool').
     - Response side: bytes-level replace on every SSE chunk / JSON
       body at 6 injection points (standard stream/non-stream,
       passthrough stream/non-stream, chat_completions stream +
       non-stream, responses stream + non-stream). Reverse mapping
       applied longest-fake-name-first to prevent substring conflicts
       (parity with Parrot _restore_tool_names_in_chunk).
     - tool_choice is no longer unconditionally deleted in
       normalizeClaudeOAuthRequestBody — Parrot passes it through.

  E) tools[-1] cache_control breakpoint
     - Injected as {type:ephemeral, ttl:<DefaultCacheControlTTL>} when
       the last tool has no cache_control. Client-provided ttl is
       passed through unchanged (repo-wide policy).

  F) messages cache_control strategy
     - stripMessageCacheControl removes every client-provided
       messages[*].content[*].cache_control (multi-turn stability).
     - addMessageCacheBreakpoints then injects two stable breakpoints:
       (1) last message, and (2) second-to-last user turn when
       messages.length >= 4.
     - Combined with the system block breakpoint and tools[-1]
       breakpoint, this gives exactly the 4 breakpoints Anthropic
       allows per request.

Non-trivial implementation details to be aware of when rebasing:

  * Two new files, no upstream collision:
      gateway_tool_rewrite.go       (D + E algorithms)
      gateway_messages_cache.go     (F strip + breakpoints)
  * Two new feature calls bolted onto the tail of
    applyClaudeCodeOAuthMimicryToBody in gateway_service.go — rebase
    conflicts will be ~10 lines maximum.
  * Response-side injection points all wrap their existing write with
    reverseToolNamesIfPresent(c, ...), preserving original behavior
    when no mapping is stored (static prefix rollback still runs).
  * Non-stream chat/responses switched from c.JSON to
    json.Marshal + c.Data so bytes-level replace is possible.
  * Retry bodies (FilterThinkingBlocksForRetry,
    FilterSignatureSensitiveBlocksForRetry, RectifyThinkingBudget)
    only prune blocks — they preserve the already-obfuscated tool
    names, so no extra mapping re-application is needed.

Manual QA: end-to-end scenario verified with 6 tools (above threshold)
and tool_choice.type=='tool'. The obfuscation and restore round trip was
confirmed in test logs, after which the temporary test file was removed.

Tests (16 new):
  - buildDynamicToolMap stability + below-threshold guard
  - sanitizeToolName precedence (dynamic > static)
  - restoreToolNamesInBytes longest-first + static rollback
  - applyToolNameRewriteToBody skips server tools + syncs tool_choice
  - applyToolsLastCacheBreakpoint defaults to 5m + passes client ttl
  - stripMessageCacheControl + addMessageCacheBreakpoints in the
    1/4/string-content cases + second-to-last user turn selection
  - buildToolNameRewriteFromBody ReverseOrdered is desc-by-fake-length
  - fake name shape follows Parrot {prefix}{head3}{i:02d}
2026-04-24 23:16:32 +08:00
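The dynamic tool-name mapping and longest-first restore described in 6e12578bc5 (item D) can be sketched as follows. This is a simplified illustration under assumptions: the prefix pool contents, the helper names `buildToolMap` and `restoreToolNames`, and the way names are fed into the FNV-64a hash are hypothetical. The threshold, the fnv64a seed with math/rand, the `{prefix}{name[:3]}{i:02d}` shape, and the longest-fake-name-first restore come from the commit message. Tool names are assumed to be ASCII.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/fnv"
	"math/rand"
	"sort"
)

// prefixPool is a hypothetical pool; the real list is not shown in the commit.
var prefixPool = []string{"manage_", "query_", "handle_", "fetch_", "update_", "inspect_"}

const dynamicThreshold = 5 // obfuscate only when there are more than 5 tools

// buildToolMap returns original->fake names, or nil at or below the threshold.
// The same input names always produce the same mapping.
func buildToolMap(names []string) map[string]string {
	if len(names) <= dynamicThreshold {
		return nil
	}
	h := fnv.New64a()
	for _, n := range names {
		h.Write([]byte(n))
		h.Write([]byte{0}) // separator so ["ab","c"] and ["a","bc"] hash differently
	}
	rng := rand.New(rand.NewSource(int64(h.Sum64())))
	prefixes := append([]string(nil), prefixPool...)
	rng.Shuffle(len(prefixes), func(i, j int) { prefixes[i], prefixes[j] = prefixes[j], prefixes[i] })

	out := make(map[string]string, len(names))
	for i, n := range names {
		head := n
		if len(head) > 3 {
			head = head[:3]
		}
		out[n] = fmt.Sprintf("%s%s%02d", prefixes[i%len(prefixes)], head, i)
	}
	return out
}

// restoreToolNames replaces fake names with the originals, longest fake name
// first, so a short fake name cannot clobber part of a longer one.
func restoreToolNames(body []byte, toolMap map[string]string) []byte {
	type pair struct{ fake, orig string }
	pairs := make([]pair, 0, len(toolMap))
	for orig, fake := range toolMap {
		pairs = append(pairs, pair{fake, orig})
	}
	sort.Slice(pairs, func(i, j int) bool {
		if len(pairs[i].fake) != len(pairs[j].fake) {
			return len(pairs[i].fake) > len(pairs[j].fake)
		}
		return pairs[i].fake < pairs[j].fake
	})
	for _, p := range pairs {
		body = bytes.ReplaceAll(body, []byte(p.fake), []byte(p.orig))
	}
	return body
}

func main() {
	names := []string{"bash", "read_file", "write_file", "grep", "glob", "todo"}
	m := buildToolMap(names)
	chunk := []byte(`{"type":"tool_use","name":"` + m["bash"] + `"}`)
	fmt.Println(string(chunk))
	fmt.Println(string(restoreToolNames(chunk, m)))
}
```

Seeding the generator from the tool names keeps the mapping stable across requests with the same tool set, which is the cache-hit invariant the commit calls out.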
keh4l
a25faecadd feat(gateway): align body shape with real Claude Code CLI defaults
Three field-level alignments in normalizeClaudeOAuthRequestBody to
match real Claude Code CLI traffic byte-for-byte:

  1. temperature: previously deleted unconditionally; now passes
     through client value, defaults to 1 when absent (real CLI
     always sends temperature, default 1).

  2. max_tokens: defaults to 128000 when absent (real CLI default).

  3. context_management: when thinking.type is enabled/adaptive
     and the client did not provide context_management, inject
     {"edits":[{"type":"clear_thinking_20251015","keep":"all"}]}
     to mirror real CLI behavior.

tool_choice removal is unchanged (Claude Code OAuth credentials
do not allow client-supplied tool_choice).

Tests updated:
  - gateway_body_order_test.go: temperature/max_tokens are now
    expected in output; tool_choice still removed.
  - gateway_prompt_test.go: system array is now 2 blocks
    (billing + cc prompt), assertions adjusted.
  - gateway_anthropic_apikey_passthrough_test.go: same 2-block
    assertion.
2026-04-24 23:16:32 +08:00
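The three defaults from a25faecadd can be sketched as below. The helper name `applyCLIDefaults` and the map-based body are assumptions for illustration. A Go map does not preserve key order, so this sketch deliberately ignores the field-ordering concerns that the real normalizeClaudeOAuthRequestBody has to handle.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyCLIDefaults is a hypothetical helper illustrating the three defaults.
// Client-provided values are never overwritten.
func applyCLIDefaults(body map[string]any) {
	if _, ok := body["temperature"]; !ok {
		body["temperature"] = 1
	}
	if _, ok := body["max_tokens"]; !ok {
		body["max_tokens"] = 128000
	}
	// Reading from a nil map is safe, so a missing "thinking" object is fine.
	thinking, _ := body["thinking"].(map[string]any)
	mode, _ := thinking["type"].(string)
	_, hasCM := body["context_management"]
	if !hasCM && (mode == "enabled" || mode == "adaptive") {
		body["context_management"] = map[string]any{
			"edits": []any{
				map[string]any{"type": "clear_thinking_20251015", "keep": "all"},
			},
		}
	}
}

func main() {
	body := map[string]any{"thinking": map[string]any{"type": "enabled"}}
	applyCLIDefaults(body)
	out, err := json.Marshal(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```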
keh4l
5862e2d8d9 feat(gateway): add billing attribution block with cc_version fingerprint
Real Claude Code CLI always sends a 2-block system array:

  [0] {"type":"text", "text":"x-anthropic-billing-header: cc_version=X.Y.Z.{fp}; cc_entrypoint=cli; cch=00000;"}
  [1] {"type":"text", "text":"You are Claude Code...", "cache_control":{...}}

Before this commit, sub2api's mimicry path only produced block [1].
The missing billing block is one of the primary third-party detection
signals Anthropic uses for Claude-Code-scoped OAuth tokens.

New file gateway_billing_block.go ports the fingerprint algorithm
(byte-for-byte from Parrot cc_mimicry.py:compute_fingerprint):
pick chars at positions [4,7,20] of the first user text, then
`sha256(SALT + chars + cc_version)[:3]`.

  - claude/constants.go: CLICurrentVersion = "2.1.92" (must match UA)
  - gateway_billing_block.go: computeClaudeCodeFingerprint +
    buildBillingAttributionBlockJSON + extractFirstUserText
  - gateway_service.go: rewriteSystemForNonClaudeCode now emits both
    blocks in order; cch=00000 is filled in later by
    signBillingHeaderCCH in buildUpstreamRequest.

Downstream compat note: syncBillingHeaderVersion's regex
`cc_version=\d+\.\d+\.\d+` only matches the semver triple,
leaving the `.{fp}` suffix intact when rewriting in buildUpstreamRequest.
2026-04-24 23:16:32 +08:00
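A rough sketch of the fingerprint step from 5862e2d8d9 follows. Several details are assumptions rather than facts from the commit: the salt value is a placeholder, positions are counted in runes, missing positions in short text are padded with '0', and `[:3]` is taken to mean the first three hex characters of the digest. The real computeClaudeCodeFingerprint may differ on each of these points. The regular expression is the one quoted in the commit message.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
)

// exampleSalt is a placeholder; the real SALT is not given in the commit.
const exampleSalt = "example-salt"

// computeFingerprint picks the characters at positions 4, 7 and 20 of the
// first user text and hashes them together with the salt and the version.
func computeFingerprint(firstUserText, ccVersion string) string {
	runes := []rune(firstUserText)
	picked := make([]rune, 0, 3)
	for _, pos := range []int{4, 7, 20} {
		if pos < len(runes) {
			picked = append(picked, runes[pos])
		} else {
			picked = append(picked, '0') // assumed padding for short text
		}
	}
	sum := sha256.Sum256([]byte(exampleSalt + string(picked) + ccVersion))
	return hex.EncodeToString(sum[:])[:3]
}

// semverRe matches only the semver triple, as quoted in the commit message.
var semverRe = regexp.MustCompile(`cc_version=\d+\.\d+\.\d+`)

func main() {
	fp := computeFingerprint("Please refactor the gateway service", "2.1.92")
	header := fmt.Sprintf("x-anthropic-billing-header: cc_version=2.1.92.%s; cc_entrypoint=cli; cch=00000;", fp)
	fmt.Println(header)
	// Rewriting only the semver triple leaves the ".{fp}" suffix intact.
	fmt.Println(semverRe.ReplaceAllString(header, "cc_version=2.1.93"))
}
```

The second print illustrates the compat note: because the pattern stops after the third numeric component, a version rewrite does not disturb the fingerprint suffix.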
keh4l
66d6454535 feat(claude): add ttl to cache_control with default 5m
Real Claude CLI traffic sends cache_control as
`{"type":"ephemeral","ttl":"1h"}`. Our previous payload only
sent `{"type":"ephemeral"}`, which is a bytewise mismatch with
the official CLI and one more third-party detection signal.

Policy: client-provided ttl is always passed through unchanged.
Proxy-generated cache_control blocks default to 5m (vs Parrot's 1h)
to avoid burning the 1h cache budget on automatic breakpoints while
still aligning with the `ttl` field being present.

  - claude/constants.go: DefaultCacheControlTTL = "5m"
  - apicompat/types.go: new AnthropicCacheControl type with TTL field;
    AnthropicTool gains optional CacheControl pointer so the mimicry
    path can attach a cache breakpoint to tools[-1] later.
  - service/gateway_service.go: anthropicCacheControlPayload gains TTL;
    marshalAnthropicSystemTextBlock and rewriteSystemForNonClaudeCode
    emit ttl=5m by default.
2026-04-24 23:16:32 +08:00
keh4l
165553cfb0 fix(gateway): use full beta list in buildUpstreamRequest mimicry path
The previous commit added FullClaudeCodeMimicryBetas() but the two
call sites in buildUpstreamRequest still hardcoded the old 3-token
subset. Anthropic now checks the complete set of beta tokens to
decide if a request qualifies as Claude Code. Wire them up:

  - /v1/messages mimic path: requiredBetas = FullClaudeCodeMimicryBetas()
  - /v1/messages/count_tokens mimic path: same + BetaTokenCounting

Haiku models keep the 2-token exemption (BetaOAuth + InterleaveThinking).
2026-04-24 23:16:32 +08:00
keh4l
b5467d610a fix(gateway): apply full Claude Code mimicry on /chat/completions and /responses
Before: the OpenAI-compat forwarders only called injectClaudeCodePrompt,
which prepends the Claude Code banner but leaves the rest of the body
in its original non-Claude-Code shape. The codebase already admits this
is insufficient (see the comment on rewriteSystemForNonClaudeCode in
gateway_service.go: "merely prepending the Claude Code prompt cannot pass detection").

Effect: OAuth accounts served through /v1/chat/completions or /v1/responses
were detected as third-party apps and bled plan quota with:

    Third-party apps now draw from your extra usage, not your plan limits.

Fix:
  - apicompat.AnthropicRequest: add Metadata json.RawMessage so metadata
    survives the OpenAI->Anthropic->Marshal round trip; without it the
    downstream rewrite has no user_id to work with.
  - service: extract applyClaudeCodeOAuthMimicryToBody, a ParsedRequest-free
    variant of the /v1/messages mimicry pipeline
    (rewriteSystemForNonClaudeCode + normalizeClaudeOAuthRequestBody +
    metadata.user_id injection) so the OpenAI-compat forwarders can reuse it.
  - service: add buildOAuthMetadataUserIDFromBody + hashBodyForSessionSeed
    for the same reason (no ParsedRequest at the call site).
  - ForwardAsChatCompletions / ForwardAsResponses: replace the 3-line
    prompt-prepend with the full mimicry pipeline.
  - applyClaudeCodeMimicHeaders: set x-client-request-id per-request
    (real Claude CLI always does); missing/duplicated values are one more
    third-party fingerprint signal.

No change to the native /v1/messages path: it already called the full
pipeline, we only lift those helpers into a reusable function.

Tests:
  - go build ./... passes
  - go test ./internal/service/... ./internal/pkg/apicompat/... passes
  - lsp_diagnostics clean on all touched files
  - pre-existing failures in internal/config are unrelated (env-sensitive
    tests that also fail on upstream main)
2026-04-24 23:16:32 +08:00
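The reason b5467d610a adds a `Metadata json.RawMessage` field can be shown with a small round trip. The struct below is a hypothetical two-field stand-in for apicompat.AnthropicRequest, and the model name and user_id value are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// anthropicRequest is a reduced, hypothetical stand-in for the real type.
// Without the Metadata field, encoding/json would silently drop "metadata"
// on Unmarshal, and the re-marshalled body would lose user_id.
type anthropicRequest struct {
	Model    string          `json:"model"`
	Metadata json.RawMessage `json:"metadata,omitempty"`
}

// roundTrip decodes and re-encodes a request body.
func roundTrip(in string) (string, error) {
	var req anthropicRequest
	if err := json.Unmarshal([]byte(in), &req); err != nil {
		return "", err
	}
	out, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := roundTrip(`{"model":"example-model","metadata":{"user_id":"u_123"}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Keeping the field as raw JSON means the proxy does not need to model the metadata schema in order to preserve it.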
keh4l
57ff97960d chore(claude): bump mimicked CLI to 2.1.92 and extend anthropic-beta list
Align Claude Code mimicry constants with the latest real CLI traffic
(see Parrot's src/transform/cc_mimicry.py). Anthropic now uses the full
set of anthropic-beta tokens to decide whether a request counts as
"official Claude Code"; requests missing tokens that real CLI ships
today are demoted to third-party usage:

  Third-party apps now draw from your extra usage, not your plan limits.

Changes:
  - claude/constants.go: add new beta tokens (prompt-caching-scope,
    effort, redact-thinking, context-management, extended-cache-ttl) and
    expose FullClaudeCodeMimicryBetas() for the OAuth mimicry path.
  - claude/constants.go: bump default User-Agent to claude-cli/2.1.92.
  - identity_service.go: bump defaultFingerprint User-Agent accordingly.

No behavioral change for clients that already send a newer UA (fingerprint
merge still prefers the incoming value).
2026-04-24 23:16:32 +08:00
Wesley Liddick
5b5db88550 Merge pull request #1897 from VpSanta33/codex/invite-affiliate-rebate
feat: add an invite rebate feature with an admin-configurable rebate rate
2026-04-24 22:36:53 +08:00
VpSanta33
f03de00cb9 feat: add affiliate invite rebate flow and admin rebate-rate setting 2026-04-24 22:22:26 +08:00
Wesley Liddick
76aae5aa74 Merge pull request #1911 from gaoren002/fix/codex-responses-payload-normalization-mainbase
fix(openai): normalize codex responses payloads
2026-04-24 21:37:32 +08:00
gaoren002
27ee141c1e fix(openai): preserve mcp tool call ids 2026-04-24 13:24:21 +00:00
gaoren002
e65574dea9 fix(openai): normalize codex responses payloads 2026-04-24 12:03:19 +00:00
Wesley Liddick
1ce9dc03f9 Merge pull request #1895 from gaoren002/fix/codex-spark-limitations
fix(openai): handle codex spark model limitations
2026-04-24 19:57:42 +08:00
Wesley Liddick
15ce914a62 Merge pull request #1910 from slovx2/fix/codex-tool-call-ids
fix(openai): fix Codex tool-call call_id handling
2026-04-24 19:56:03 +08:00
song
959af1c8f6 fix(openai): preserve codex tool call ids 2026-04-24 19:31:49 +08:00
gaoren002
c4d496da18 fix(openai): handle codex spark model limitations 2026-04-24 07:42:31 +00:00
KnowSky404
f3ea878ba2 chore: trigger PR checks 2026-04-24 11:32:41 +08:00
KnowSky404
d80469ea35 test: fix OpenAI account test helper calls after rebase 2026-04-24 11:32:41 +08:00
KnowSky404
5fc30ea964 test: cover openai admin test state transitions 2026-04-24 11:32:41 +08:00
KnowSky404
f68909a68b fix: reconcile openai admin test rate-limit state 2026-04-24 11:32:41 +08:00
github-actions[bot]
d162604f32 chore: sync VERSION to 0.1.117 [skip ci] 2026-04-24 01:40:02 +00:00
shaw
a4e329c18b fix: add gpt5.5 to the OpenAI default models 2026-04-24 09:08:31 +08:00
shaw
ca204ddd2f fix(openai): preserve image outputs when text content serialization fails
In reconstructResponseOutputFromSSE, text content Marshal/Unmarshal
failure previously caused an early return that silently discarded
already-extracted image_generation_call outputs. Now serialization
errors are tolerated so image results still reach the client.
2026-04-24 08:58:51 +08:00
Wesley Liddick
ff08f9d798 Merge pull request #1853 from gaoren002/fix/codex-image-generation-bridge
fix(openai): improve Codex image-generation compatibility on the Responses path
2026-04-24 08:55:23 +08:00
Wesley Liddick
ac11473833 Merge pull request #1850 from touwaeriol/feat/channel-insights
feat(monitor): channel monitor with available channels & feature flags
2026-04-24 08:31:21 +08:00
erio
09fd83ab9b fix(monitor): clean up unused updatedAt/updatedLabel after label removal 2026-04-24 00:14:30 +08:00
erio
6699d33760 fix(monitor): remove redundant "updated at" label from MonitorHero 2026-04-24 00:08:57 +08:00
erio
f7c8377abf fix(monitor): remove UNAVAILABLE status, keep only OPERATIONAL/DEGRADED 2026-04-24 00:03:22 +08:00
erio
0dcc0e0504 feat(monitor): proportion-based overall status + reusable auto-refresh
- Change overall status logic: >50% failed = UNAVAILABLE, any failed
  or degraded = DEGRADED, all ok = OPERATIONAL
- Extract useAutoRefresh composable with localStorage persistence
- Create AutoRefreshButton dropdown component (reusable)
- Integrate auto-refresh into channel status page via MonitorHero
2026-04-24 00:03:22 +08:00
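The proportion-based rule from 0dcc0e0504 can be written out as a small function. This is a sketch in Go for consistency with the other examples. The per-channel state names `ok`, `degraded`, and `failed` and the function name `overallStatus` are assumptions. Note that the later commit f7c8377abf removes the UNAVAILABLE status, so this reflects the logic only as of this commit.

```go
package main

import "fmt"

// overallStatus applies the rule from the commit message:
// more than 50% failed -> UNAVAILABLE, any failed or degraded -> DEGRADED,
// otherwise OPERATIONAL.
func overallStatus(channels []string) string {
	failed, impaired := 0, 0
	for _, s := range channels {
		switch s {
		case "failed":
			failed++
			impaired++
		case "degraded":
			impaired++
		}
	}
	switch {
	case failed*2 > len(channels): // strictly more than half
		return "UNAVAILABLE"
	case impaired > 0:
		return "DEGRADED"
	default:
		return "OPERATIONAL"
	}
}

func main() {
	fmt.Println(overallStatus([]string{"ok", "failed", "failed"}))
	fmt.Println(overallStatus([]string{"ok", "failed"}))
	fmt.Println(overallStatus([]string{"ok", "ok"}))
}
```

Comparing `failed*2` against the channel count avoids floating-point division, and exactly half failed counts as DEGRADED rather than UNAVAILABLE.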
gaoren002
5f41899705 fix: bridge codex image generation over responses 2026-04-23 15:13:57 +00:00
erio
5e060b2222 Merge remote-tracking branch 'upstream/main' into feat/channel-insights
# Conflicts:
#	backend/cmd/server/wire_gen.go
2026-04-23 22:30:45 +08:00
erio
6f04c25e3d test(api): add channel monitor fields to admin settings contract test 2026-04-23 22:15:03 +08:00
erio
375cce29c6 chore: remove accidentally committed fork utility script 2026-04-23 21:56:28 +08:00
erio
67518a59ac revert: remove fork-only changes from release sync
Revert payment/wechat, sora/claude-max cleanup, fork-only migrations,
and cosmetic changes that were brought in by the release sync commit.
Keep only channel-monitor related improvements:
- PublicSettingsInjectionPayload named struct with drift test
- ChannelMonitorRunner graceful shutdown in wire
- image_output_price in SupportedModelChip
- Simplified buildSelfNavItems in AppSidebar
- Gateway WARN logs for 503 branches
2026-04-23 21:40:58 +08:00
erio
a3ea8ecac5 fix(wire): add ChannelMonitorRunner.Stop() to cleanup steps in wire_gen.go 2026-04-23 21:06:51 +08:00
erio
497872693f chore: remove test files deleted in release
HelpTooltip.spec.ts and PaymentProviderDialog.spec.ts were removed
in release/custom-0.1.115; commit the deletion.
2026-04-23 21:04:54 +08:00
erio
748a84d871 sync: bring over remaining release/custom-0.1.115 changes
- Extract PublicSettingsInjectionPayload named struct with drift test
- Add channel_monitor_default_interval_seconds to SSR injection
- Add image_output_price to SupportedModelChip
- Simplify AppSidebar buildSelfNavItems (admins see available channels)
- Add gateway WARN logs for 503 no-available-accounts branches
- Wire ChannelMonitorRunner into provideCleanup for graceful shutdown
- Add migrations 130/131 (CC template userid fix + mimicry field cleanup)
- Clean up fork-only features (sora, claude max simulation, client affinity)
- Remove ~320 obsolete i18n keys
- Add codexUsage utility, WechatServiceButton, BulkEditAccountModal
- Tidy go.sum
2026-04-23 20:55:18 +08:00
erio
d5dac84e12 test(payment): cover ErrOrderNotFound sentinel contract
Service layer (payment_fulfillment_order_not_found_test.go):
- TestHandlePaymentNotification_UnknownOrder_ReturnsSentinel: in-memory
  sqlite ent client, query for a non-existent out_trade_no → errors.Is
  must recognise ErrOrderNotFound (handler relies on this to ack 200).
- TestHandlePaymentNotification_NonSuccessStatus_Skips: non-success
  notification short-circuits before DB lookup → nil error.
- TestErrOrderNotFound_DistinctFromOtherErrors: generic errors must not
  match the sentinel (prevents silently swallowing DB failures).

Handler layer (payment_webhook_handler_test.go):
- TestUnknownOrderWebhookAcksWithSuccess: locks the two ingredients the
  handleNotify ack path depends on — fmt.Errorf %w wrapping preserves
  errors.Is recognition, and writeSuccessResponse(stripe) returns an
  empty 200 body that Stripe treats as acknowledged.
2026-04-23 19:22:43 +08:00
erio
75e1b40fb4 fix(payment): ack unknown-order webhooks with 2xx to stop provider retries
Introduce a sentinel ErrOrderNotFound in the payment service layer so the
webhook handler can distinguish "the out_trade_no does not exist in our DB"
from other fulfillment failures, and downgrade the former to a WARN log +
success response.

Background
- Providers (Stripe, Alipay, Wxpay, EasyPay, ...) retry webhooks whenever
  we answer non-2xx. When a webhook endpoint is misconfigured (e.g. a
  foreign environment points at us) or our orders table has been wiped,
  we return 500 forever and the provider retries for days, spamming logs.
- The old code also collapsed "order not found" and "DB query failed" into
  the same branch — a DB blip would be reported as "order not found" and
  swallowed.

Service layer (payment_fulfillment.go)
- Add `var ErrOrderNotFound = errors.New("payment order not found")`.
- In HandlePaymentNotification, distinguish the two error paths:
  * dbent.IsNotFound(err) → wrap with ErrOrderNotFound so callers can
    errors.Is(...) it.
  * anything else → wrap the original err with %w so it still bubbles up
    as 500 and the provider retries (DB hiccup should be retried).

Handler layer (payment_webhook_handler.go)
- Before returning 500, check errors.Is(err, service.ErrOrderNotFound):
  emit a WARN (with provider / outTradeNo / tradeNo for discoverability),
  then call writeSuccessResponse so the provider sees its expected 2xx
  body (Stripe empty body / Wxpay JSON / others "success").
- Other errors retain the existing 500 behavior.

Monitoring note: because this path now swallows unknown-order webhooks
silently from the provider's perspective, the WARN log line is the only
signal. Alert on "unknown order, acking to stop retries" if you want
visibility into misrouted webhooks or accidental data loss.
2026-04-23 18:33:28 +08:00
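The sentinel-error pattern from 75e1b40fb4 can be sketched as follows. `ErrOrderNotFound` and its message come from the commit. Everything else is a simplified stand-in: `errEntNotFound` substitutes for the dbent.IsNotFound check, `errDBUnavailable` is a made-up generic failure, and `handlePaymentNotification` and `webhookStatus` are hypothetical reductions of the service and handler layers.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// ErrOrderNotFound is the sentinel introduced by the commit.
var ErrOrderNotFound = errors.New("payment order not found")

// Stand-ins for this sketch only.
var (
	errEntNotFound   = errors.New("ent: payment_order not found") // plays the role of dbent.IsNotFound
	errDBUnavailable = errors.New("db: connection refused")       // a generic failure that must be retried
)

// handlePaymentNotification maps "row not found" to the sentinel and wraps
// every other error with %w so it still surfaces as a failure.
func handlePaymentNotification(lookup func(string) error, outTradeNo string) error {
	err := lookup(outTradeNo)
	switch {
	case err == nil:
		return nil
	case errors.Is(err, errEntNotFound):
		return fmt.Errorf("%w: out_trade_no=%s", ErrOrderNotFound, outTradeNo)
	default:
		return fmt.Errorf("lookup order %s: %w", outTradeNo, err)
	}
}

// webhookStatus acks unknown orders with 200 (the real handler also logs a
// WARN) and keeps 500 for everything else so the provider retries.
func webhookStatus(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrOrderNotFound):
		return http.StatusOK
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	notFound := func(string) error { return errEntNotFound }
	dbDown := func(string) error { return errDBUnavailable }
	fmt.Println(webhookStatus(handlePaymentNotification(notFound, "order-1")))
	fmt.Println(webhookStatus(handlePaymentNotification(dbDown, "order-1")))
}
```

Wrapping with `%w` in both branches is what lets the handler use `errors.Is` to tell the two cases apart instead of matching on error strings.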