Architecture Diagram · IMMI-Case-

GPT + Gemini + Claude
Council Architecture

Three expert LLMs analyse immigration cases in parallel. A moderator synthesises their findings into a structured verdict: a JSON object with 14 fields.

3 · Expert Models
1 · Moderator
14 · JSON Fields
4096 · Max Tokens per Expert
30 · Requests/min Limit
§01

Request Flow

Every turn first passes through authentication and rate limiting. It then invokes the three expert models in parallel, and the moderator synthesises their outputs. The result is stored in the database and returned to the client.

flowchart TD
    A[("👤 User Request\nPOST /sessions/:id/turns")] --> B
    B["🛡 Cloudflare Worker\nAuth + Rate Limit\nHMAC · 30 req/min"] --> C
    C["⚙ Cloudflare AI Gateway\nUnified Billing\nCF_AIG_TOKEN"] --> D & E & F
    D["GPT-5-mini\nOpenAI\nmax_completion_tokens 4096 · temp 1"] --> G
    E["Gemini 3.1 Pro\nGoogle\nmax_tokens 4096 · temp 0.2"] --> G
    F["Claude Sonnet 4.6\nAnthropic\nmax_tokens 4096 · temp 0.2"] --> G
    G["🎙 Moderator\nGemini 2.5 Flash\n14-field JSON · 8192 tokens"] --> H
    H[("🗄 Supabase PostgreSQL\ncouncil_turns\nON CONFLICT DO NOTHING")] --> I
    I[("📤 JSON Response\ncomposed_answer + expert_opinions[]")]

    style A fill:#a8552e,color:#fff,stroke:#7a352b
    style B fill:#3a5a40,color:#fff,stroke:#2a4a30
    style C fill:#9c7b30,color:#fff,stroke:#7a5a20
    style D fill:#3a5a40,color:#fff,stroke:#2a4a30
    style E fill:#9c7b30,color:#fff,stroke:#7a5a20
    style F fill:#7a352b,color:#fff,stroke:#5a2520
    style G fill:#a8552e,color:#fff,stroke:#7a352b
    style H fill:#4a3a1a,color:#ede4d0,stroke:#9c7b30
    style I fill:#a8552e,color:#fff,stroke:#7a352b
    
  • Client · User Request: POST /api/v1/llm-council/sessions/:id/turns
  • Cloudflare Worker · Auth + Rate Limit: HMAC token · RL_COUNCIL_TURN 30/min
  • Cloudflare AI Gateway · Unified Billing: CF_AIG_TOKEN · cf-aig-authorization header
  • Expert 1 · OpenAI · GPT-5-mini: max_completion_tokens 4096 · temp 1
  • Expert 2 · Google · Gemini 3.1 Pro: max_tokens 4096 · temp 0.2
  • Expert 3 · Anthropic · Claude Sonnet 4.6: max_tokens 4096 · temp 0.2
  • Moderator · Gemini 2.5 Flash: max_tokens 8192 · 14-field JSON
  • Supabase PostgreSQL · council_turns: INSERT ... ON CONFLICT DO NOTHING
  • Response to Client: JSON response with composed_answer + expert_opinions[]
§02

Three Expert Models

Each expert receives the same question plus the conversation history. It then analyses the case independently from its own specialist angle. All three run concurrently via Promise.all(), so total latency equals that of the slowest single model rather than the sum of all three.
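Below is a minimal sketch of how one expert's request messages could be assembled. It assumes the OpenAI-style messages array accepted by an OpenAI-compatible chat endpoint. The helper buildExpertMessages and the placeholder prompts are hypothetical. Only the shape comes from the description above: a distinct system prompt per expert, with shared history and the same user question.

```javascript
// Hypothetical helper: assemble one expert's chat messages.
// Only the system prompt differs between experts; history and question are shared.
function buildExpertMessages(systemPrompt, historyMsgs, userMsg) {
  return [
    { role: "system", content: systemPrompt },
    ...historyMsgs, // prior user/assistant turns, identical for every expert
    { role: "user", content: userMsg }, // the current question
  ];
}

// Placeholder prompts standing in for EXPERT_1_SYSTEM .. EXPERT_3_SYSTEM.
const SYSTEM_PROMPTS = [
  "You are an expert in procedural fairness and the Migration Act 1958.",
  "You are an expert in refugee law and complementary protection.",
  "You are an expert in visa subclass eligibility and tribunal review.",
];

const history = [
  { role: "user", content: "What are my chances for a Protection visa?" },
  { role: "assistant", content: "Based on your AATA case history..." },
];
const question = "What about the complementary protection pathway?";

const perExpert = SYSTEM_PROMPTS.map((p) => buildExpertMessages(p, history, question));
console.log(perExpert.map((msgs) => msgs.length)); // [ 4, 4, 4 ]
```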

Expert 1 · OpenAI

GPT-5-mini

openai/gpt-5-mini
  • Procedural rights and due process
  • Statutory interpretation (Migration Act 1958)
  • Jurisdictional error identification
  • Binding precedent: weight and citation
  • Federal Court review pathways
max_completion_tokens: 4096
temperature: 1 (required for reasoning models)
Expert 2 · Google

Gemini 3.1 Pro

google-ai-studio/gemini-3.1-pro-preview
  • Refugee law and international protection standards
  • Country information and credibility assessment
  • Refugee Convention ground analysis (race, religion, nationality, membership of a particular social group, political opinion)
  • Complementary protection pathways
  • Procedural fairness in refugee status determination (RSD) interviews
max_tokens: 4096
temperature: 0.2
Expert 3 · Anthropic

Claude Sonnet 4.6

anthropic/claude-sonnet-4-6
  • Visa subclass eligibility and criteria mapping
  • Tribunal review grounds and AAT/ART jurisdiction
  • Character and health requirement analysis
  • Merits review success factors
  • Representation strategy recommendations
max_tokens: 4096
temperature: 0.2
§03

Moderator Output Schema

Gemini 2.5 Flash reads all three complete expert responses together with the conversation history. It then produces a strict 14-field JSON object. The moderator resolves disagreements, flags uncertainty, and cites the strongest reasoning.
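Because the schema is fixed, the moderator's output can be checked mechanically before it is written to council_turns. The sketch below is a hypothetical validator; the source does not name validateModeratorOutput or say whether the Worker validates this way. It encodes the 14 fields of the schema: three plain strings, seven string arrays, three enums, and one number in [0, 1].

```javascript
// Hypothetical validator for the 14-field moderator_output schema.
const STRING_FIELDS = ["composed_answer", "dissenting_views", "disclaimer"];
const ARRAY_FIELDS = [
  "key_legal_issues", "risk_factors", "positive_factors", "recommended_actions",
  "relevant_visa_subclasses", "case_precedents", "follow_up_questions",
];
const ENUM_FIELDS = {
  outcome_prediction: ["likely_success", "likely_failure", "uncertain"],
  expert_consensus: ["full", "partial", "disputed"],
  urgency_level: ["critical", "high", "medium", "low"],
};

// Returns a list of problems; an empty list means the object matches the schema.
function validateModeratorOutput(obj) {
  if (obj === null || typeof obj !== "object" || Array.isArray(obj)) {
    return ["moderator_output must be a JSON object"];
  }
  const errors = [];
  for (const f of STRING_FIELDS) {
    if (typeof obj[f] !== "string") errors.push(`${f}: expected string`);
  }
  for (const f of ARRAY_FIELDS) {
    const v = obj[f];
    if (!Array.isArray(v) || !v.every((x) => typeof x === "string")) {
      errors.push(`${f}: expected string[]`);
    }
  }
  for (const [f, allowed] of Object.entries(ENUM_FIELDS)) {
    if (!allowed.includes(obj[f])) errors.push(`${f}: expected ${allowed.join(" | ")}`);
  }
  const c = obj.confidence_score;
  // The negated range check also rejects NaN.
  if (typeof c !== "number" || !(c >= 0 && c <= 1)) {
    errors.push("confidence_score: expected number in [0, 1]");
  }
  return errors;
}

// Example: an invalid enum value and an out-of-range confidence score.
const sample = {
  composed_answer: "...", dissenting_views: "", disclaimer: "This is not legal advice.",
  key_legal_issues: [], risk_factors: [], positive_factors: [], recommended_actions: [],
  relevant_visa_subclasses: ["866"], case_precedents: [], follow_up_questions: [],
  outcome_prediction: "probable", expert_consensus: "full", urgency_level: "high",
  confidence_score: 1.4,
};
console.log(validateModeratorOutput(sample)); // two errors: outcome_prediction, confidence_score
```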

moderator_output · stored in council_turns.moderator_output (jsonb) · max 8192 tokens
Field · Type · Description
"composed_answer" · string · Primary synthesised response to the user's question
"outcome_prediction" · enum string · likely_success | likely_failure | uncertain
"confidence_score" · number (0–1) · Moderator's confidence in the composed answer
"key_legal_issues" · string[] · Primary legal questions raised by the case
"risk_factors" · string[] · Factors that could weaken the applicant's case
"positive_factors" · string[] · Factors that strengthen the applicant's position
"recommended_actions" · string[] · Concrete next steps for the applicant
"relevant_visa_subclasses" · string[] · Applicable visa subclasses (e.g. 866, 785, 790)
"case_precedents" · string[] · Relevant case citations drawn from the expert analyses
"expert_consensus" · enum string · full | partial | disputed
"dissenting_views" · string · Summary of expert disagreements, if any
"urgency_level" · enum string · critical | high | medium | low
"disclaimer" · string · Standard "not legal advice" disclaimer
"follow_up_questions" · string[] · Clarifying questions to gather more context
§04

Conversation History Injection · Design Decision D2

buildHistoryMessages(prevTurns) injects each prior turn's moderator composed_answer as a role: "assistant" message instead of replaying the three raw expert outputs. This collapses 3 expert outputs into 1 coherent summary per turn. It cuts token usage on multi-turn conversations by roughly 70%.

Each expert therefore sees the conversation as interleaved user/assistant turns, preserving full context at minimal token cost.

Turn 1 · role: user
What are my chances for a Protection visa?
Turn 1 · role: assistant (moderator composed_answer)
Based on your AATA case history, protection visa prospects depend on… [synthesised from 3 experts]
Turn 2 · role: user (current)
What about the complementary protection pathway?
Why D2 matters: if all three raw expert outputs were injected, the assistant side of the history would be roughly 3× larger on every turn. D2 substitutes them with the single moderator summary, which keeps the context while cutting that cost to about one third.
// workers/llm-council/runner.js
function buildHistoryMessages(prevTurns) {
  const msgs = [];
  for (const turn of prevTurns) {
    msgs.push({ role: "user", content: turn.user_message });
    if (turn.moderator_output?.composed_answer) {
      msgs.push({
        role: "assistant",
        // Only moderator summary, not 3 raw outputs
        content: turn.moderator_output.composed_answer
      });
    }
  }
  return msgs;
}

// All three experts run concurrently
const historyMsgs = buildHistoryMessages(prevTurns);
const [e1, e2, e3] = await Promise.all([
  runExpert(env, EXPERT_1_MODEL, EXPERT_1_SYSTEM, historyMsgs, userMsg),
  runExpert(env, EXPERT_2_MODEL, EXPERT_2_SYSTEM, historyMsgs, userMsg),
  runExpert(env, EXPERT_3_MODEL, EXPERT_3_SYSTEM, historyMsgs, userMsg),
]);

// Then moderator synthesises e1, e2, e3
const modResult = await runModerator(
  env, e1, e2, e3, historyMsgs, userMsg
);
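To make the ~70% figure concrete, the sketch below compares history size under D2 with a hypothetical naive strategy that replays all three expert outputs. Several pieces are assumptions: the 4-characters-per-token estimate is a rough heuristic rather than a real tokenizer, the expert_opinions field on prevTurns is assumed, and the message sizes are invented. The point is the ratio. It approaches 2/3 when the moderator summary is about as long as one expert output and user messages are short.

```javascript
// Rough heuristic: ~4 characters per token (not a real tokenizer).
const estimateTokens = (text) => Math.ceil(text.length / 4);
const totalTokens = (msgs) => msgs.reduce((sum, m) => sum + estimateTokens(m.content), 0);

// D2: same logic as buildHistoryMessages in runner.js (moderator summary only).
function historyD2(prevTurns) {
  const msgs = [];
  for (const turn of prevTurns) {
    msgs.push({ role: "user", content: turn.user_message });
    if (turn.moderator_output?.composed_answer) {
      msgs.push({ role: "assistant", content: turn.moderator_output.composed_answer });
    }
  }
  return msgs;
}

// Hypothetical naive alternative: all three raw expert outputs, merged into one
// assistant message so that user/assistant roles still alternate.
function historyNaive(prevTurns) {
  const msgs = [];
  for (const turn of prevTurns) {
    msgs.push({ role: "user", content: turn.user_message });
    msgs.push({ role: "assistant", content: (turn.expert_opinions ?? []).join("\n\n") });
  }
  return msgs;
}

function historySavings(prevTurns) {
  const d2 = totalTokens(historyD2(prevTurns));
  const naive = totalTokens(historyNaive(prevTurns));
  return { d2, naive, savedFraction: naive === 0 ? 0 : 1 - d2 / naive };
}

// Invented sizes: 200-char questions, 4000-char expert outputs and summaries.
const turn = {
  user_message: "x".repeat(200),
  expert_opinions: ["a", "b", "c"].map((ch) => ch.repeat(4000)),
  moderator_output: { composed_answer: "z".repeat(4000) },
};
console.log(historySavings([turn, turn])); // d2: 2100, naive: 6102, savedFraction ≈ 0.656
```

Every expert receives the same history, so multiplying both totals by three leaves the ratio unchanged.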
§05

API Endpoints

Six Worker-native endpoints. Session-bound routes require an X-Session-Token header. It is verified with a single HMAC computation and no database lookup.

POST /api/v1/llm-council/sessions
Create a new Council session. Returns session_id plus an HMAC session token. Accepts an optional case_id to pre-load case context.
Auth: JWT Bearer (Telegram login)
POST /api/v1/llm-council/sessions/:id/turns
Add a user turn, triggering the full 3-expert + moderator pipeline. Rate-limited via RL_COUNCIL_TURN (30 req/min per IP).
Auth: X-Session-Token
GET /api/v1/llm-council/sessions/:id
Retrieve a full session with all turns, including the three expert opinions and the moderator output for each turn.
Auth: X-Session-Token
GET /api/v1/llm-council/sessions
List all sessions for the authenticated user. Returns session metadata: id, created_at, turn count, and a snippet of the last message.
Auth: JWT Bearer
DELETE /api/v1/llm-council/sessions/:id
Delete a session together with its council_turns rows. (In PostgreSQL the cascade itself comes from an ON DELETE CASCADE foreign key; Supabase RLS policies only control which rows the caller may delete.)
Auth: X-Session-Token
POST /api/v1/llm-council/run (Legacy)
Stateless single-turn analysis with no session storage. Kept for backward compatibility with the legacy frontend; new code should use the sessions API.
Auth: JWT Bearer
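Below is a client-side sketch of the two main calls: creating a session with the user's JWT, then posting a turn with the returned session token. The base URL is an assumption, and so are the request body field names other than case_id (here, message). The helpers only build fetch() arguments, so they can be inspected without a network.

```javascript
// Hypothetical client helpers that return [url, init] pairs for fetch().
function buildCreateSessionRequest(baseUrl, jwt, caseId) {
  const body = caseId ? { case_id: caseId } : {}; // case_id is optional
  return [
    new URL("/api/v1/llm-council/sessions", baseUrl).toString(),
    {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${jwt}` },
      body: JSON.stringify(body),
    },
  ];
}

function buildTurnRequest(baseUrl, sessionId, sessionToken, message) {
  const path = `/api/v1/llm-council/sessions/${encodeURIComponent(sessionId)}/turns`;
  return [
    new URL(path, baseUrl).toString(),
    {
      method: "POST",
      headers: { "Content-Type": "application/json", "X-Session-Token": sessionToken },
      body: JSON.stringify({ message }), // "message" field name is an assumption
    },
  ];
}

// Usage (inside an async function):
//   const [url, init] = buildTurnRequest("https://api.example.com", id, token, "Next question?");
//   const res = await fetch(url, init);
```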
§06

Six Design Decisions

Six architectural choices shaped the Council. Each arose from a concrete constraint (latency, token cost, billing complexity, or security) and was settled by weighing the trade-offs.

D1 · Parallelism (performance)
Three experts run concurrently, not in a queue
All three expert calls are wrapped in a single Promise.all(), so total latency equals the slowest single model rather than the sum of all three. Run sequentially, the latencies would add up: a single GPT-5-mini call with a long prompt can take 8–12 s on its own.
Outcome: p95 turn latency ~15 s vs ~40 s sequential
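The effect of D1 can be demonstrated without calling any model. In the sketch below, makeFakeExpert is a hypothetical stand-in that simply waits a fixed time. The tracker records how many calls are in flight at once. This shows Promise.all() overlapping the three calls, while the sequential loop runs them one after another.

```javascript
// Hypothetical stand-in for a model call: resolves after `ms` milliseconds
// and records how many calls are running at the same time.
function makeFakeExpert(name, ms, tracker) {
  return async () => {
    tracker.inFlight += 1;
    tracker.maxInFlight = Math.max(tracker.maxInFlight, tracker.inFlight);
    await new Promise((resolve) => setTimeout(resolve, ms));
    tracker.inFlight -= 1;
    return `${name} done`;
  };
}

// D1: start every call at once; resolves when the slowest one finishes.
function runParallel(calls) {
  return Promise.all(calls.map((call) => call()));
}

// Counterfactual: each call waits for the previous one to finish.
async function runSequential(calls) {
  const results = [];
  for (const call of calls) results.push(await call());
  return results;
}

async function compare(latenciesMs = [300, 500, 800]) {
  const names = ["gpt-5-mini", "gemini-3.1-pro", "claude-sonnet-4.6"];
  const report = {};
  for (const [label, runner] of [["parallel", runParallel], ["sequential", runSequential]]) {
    const tracker = { inFlight: 0, maxInFlight: 0 };
    const calls = latenciesMs.map((ms, i) => makeFakeExpert(names[i], ms, tracker));
    const start = Date.now();
    const results = await runner(calls);
    report[label] = { elapsedMs: Date.now() - start, maxInFlight: tracker.maxInFlight, results };
  }
  return report;
}

compare().then((r) => {
  // Parallel is close to the slowest call (~800 ms), sequential close to the sum (~1600 ms).
  console.log(r.parallel.elapsedMs, r.sequential.elapsedMs);
});
```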
D2 · History Injection (architecture)
Only the moderator's answer goes into history
Instead of feeding all three expert outputs into each expert's next-turn context, only the moderator's composed_answer is injected as the role: "assistant" turn. This compresses three outputs into one summary.
Token savings: ~70% on multi-turn conversations
D3 · Unified Billing (operations)
All models routed via Cloudflare AI Gateway with a single token
A single CF_AIG_TOKEN authenticates to OpenAI, Anthropic, and Google AI Studio via the compat endpoint. Usage is billed and tracked in one dashboard, and no per-provider API keys are stored as Worker secrets.
Provider prefix: openai/ · anthropic/ · google-ai-studio/
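Below is a sketch of one request through the gateway. It assumes the OpenAI-compatible endpoint path https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/compat/chat/completions and the cf-aig-authorization header named above; confirm the exact path against Cloudflare's AI Gateway documentation. CF_ACCOUNT_ID and CF_GATEWAY_ID are hypothetical env bindings, and only CF_AIG_TOKEN comes from the source.

```javascript
// Assumed base URL of Cloudflare AI Gateway.
const GATEWAY_BASE = "https://gateway.ai.cloudflare.com/v1";
const PROVIDER_PREFIXES = ["openai/", "anthropic/", "google-ai-studio/"];

// Builds [url, init] for fetch(); the model string carries the provider prefix.
function buildGatewayRequest(env, model, messages, params = {}) {
  if (!PROVIDER_PREFIXES.some((p) => model.startsWith(p))) {
    throw new Error(`Model needs a provider prefix (${PROVIDER_PREFIXES.join(", ")}): ${model}`);
  }
  const url = `${GATEWAY_BASE}/${env.CF_ACCOUNT_ID}/${env.CF_GATEWAY_ID}/compat/chat/completions`;
  return [
    url,
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // One gateway token; no per-provider API key in the request.
        "cf-aig-authorization": `Bearer ${env.CF_AIG_TOKEN}`,
      },
      body: JSON.stringify({ model, messages, ...params }),
    },
  ];
}

const env = { CF_ACCOUNT_ID: "acct", CF_GATEWAY_ID: "council", CF_AIG_TOKEN: "secret" };
const [url, init] = buildGatewayRequest(
  env,
  "anthropic/claude-sonnet-4-6",
  [{ role: "user", content: "Hello" }],
  { max_tokens: 4096, temperature: 0.2 },
);
console.log(url); // https://gateway.ai.cloudflare.com/v1/acct/council/compat/chat/completions
```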
D4 · GPT-5 Reasoning Model (compatibility)
Special-cased parameters for GPT-5 and o-series reasoning models
isGpt5ReasoningModel() checks the model name. If it matches, the request uses max_completion_tokens and forces temperature: 1. OpenAI's o-series and GPT-5 reasoning models reject any other temperature with an HTTP 400 error.
Detection: model name contains "gpt-5", "o1", "o3", or "o4"
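Below is a sketch of the D4 special case. The name isGpt5ReasoningModel comes from the source, but this body is reconstructed from the stated detection rule and may differ from the real one. buildSamplingParams is a hypothetical helper that picks the token-limit parameter and the temperature.

```javascript
// Markers from the D4 detection rule.
const REASONING_MARKERS = ["gpt-5", "o1", "o3", "o4"];

// Sketch of isGpt5ReasoningModel(): plain substring match on the lower-cased name.
function isGpt5ReasoningModel(model) {
  const name = String(model).toLowerCase();
  return REASONING_MARKERS.some((marker) => name.includes(marker));
}

// Hypothetical helper: reasoning models get max_completion_tokens and temperature 1;
// every other model keeps max_tokens and the requested temperature.
function buildSamplingParams(model, { maxTokens = 4096, temperature = 0.2 } = {}) {
  if (isGpt5ReasoningModel(model)) {
    return { max_completion_tokens: maxTokens, temperature: 1 };
  }
  return { max_tokens: maxTokens, temperature };
}

console.log(buildSamplingParams("openai/gpt-5-mini"));
// { max_completion_tokens: 4096, temperature: 1 }
console.log(buildSamplingParams("anthropic/claude-sonnet-4-6"));
// { max_tokens: 4096, temperature: 0.2 }
```

A plain substring test matches the detection rule as described. It is cheap but broad: any future model name containing one of the markers would also be treated as a reasoning model.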
D5 · Storage Idempotency (reliability)
ON CONFLICT DO NOTHING on turn inserts
addTurn() assigns the council_turns primary key, a client-visible nanoid(21) turn ID, before the expensive LLM calls. It then writes the row with INSERT ... ON CONFLICT DO NOTHING. If a Worker timeout leads to a retry with the same ID, the conflicting primary key means no duplicate row is created.
Outcome: ID pre-assigned before LLM calls, so retries are safe by design
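The in-memory sketch below shows why pre-assigning the ID makes retries harmless. FakeTurnsTable stands in for council_turns and mimics INSERT ... ON CONFLICT DO NOTHING by returning the number of rows inserted. newTurnId is a hypothetical nanoid-style generator (21 characters from a 64-symbol URL-safe alphabet), not the nanoid package itself. The sketch assumes a retry reuses the same turn ID.

```javascript
const { randomBytes } = require("crypto");

// 64-symbol URL-safe alphabet, so `byte & 63` picks each symbol with equal probability.
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-";

// Hypothetical nanoid-style turn ID (not the nanoid package itself).
function newTurnId(size = 21) {
  const bytes = randomBytes(size);
  let id = "";
  for (let i = 0; i < size; i++) id += ALPHABET[bytes[i] & 63];
  return id;
}

// In-memory stand-in for council_turns with INSERT ... ON CONFLICT (id) DO NOTHING.
class FakeTurnsTable {
  constructor() {
    this.rows = new Map();
  }
  insertOnConflictDoNothing(row) {
    if (this.rows.has(row.id)) return 0; // conflict: keep the existing row, insert nothing
    this.rows.set(row.id, row);
    return 1; // rows inserted
  }
}

// The ID is fixed before the expensive pipeline runs, so a retry writes the same key.
async function addTurnSketch(table, turnId, userMessage, runPipeline) {
  const moderatorOutput = await runPipeline(userMessage);
  return table.insertOnConflictDoNothing({
    id: turnId,
    user_message: userMessage,
    moderator_output: moderatorOutput,
  });
}

// Usage: the first attempt inserts, a retry with the same ID is a no-op.
const table = new FakeTurnsTable();
const turnId = newTurnId();
const pipeline = async (msg) => ({ composed_answer: `Answer to: ${msg}` });
addTurnSketch(table, turnId, "What are my chances?", pipeline)
  .then((first) => addTurnSketch(table, turnId, "What are my chances?", pipeline).then((second) => [first, second]))
  .then(([first, second]) => console.log(first, second, table.rows.size)); // 1 0 1
```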
D6 · Session Authentication (security)
HMAC token, not JWT, for turn-level auth
Session tokens are nanoid(21) + "." + HMAC-SHA256(id, secret). Verification is a single HMAC computation with no Supabase lookup. The user's JWT is only required to create a session; subsequent turns use the lighter session token.
Verification cost: 1 HMAC · no round-trip to Supabase
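Below is a Node.js sketch of the D6 token format using the built-in crypto module. Two details are assumptions: the hex encoding of the HMAC, and the check that the token's ID matches the :id in the URL. Inside a Cloudflare Worker the same computation could instead go through the Web Crypto API (crypto.subtle).

```javascript
const { createHmac, timingSafeEqual } = require("crypto");

// HMAC-SHA256 over the session ID, hex-encoded (encoding is an assumption).
function sign(sessionId, secret) {
  return createHmac("sha256", secret).update(sessionId).digest("hex");
}

// Token format from D6: <nanoid(21)>.<HMAC-SHA256(id, secret)>
function issueSessionToken(sessionId, secret) {
  return `${sessionId}.${sign(sessionId, secret)}`;
}

// One HMAC computation, no database lookup.
function verifySessionToken(token, expectedSessionId, secret) {
  if (typeof token !== "string") return false;
  const dot = token.indexOf(".");
  if (dot <= 0) return false;
  const id = token.slice(0, dot);
  const mac = token.slice(dot + 1);
  if (id !== expectedSessionId) return false; // token must belong to the session in the URL
  if (!/^[0-9a-f]{64}$/.test(mac)) return false; // a SHA-256 hex digest is 64 chars
  // Constant-time comparison; both buffers are 32 bytes at this point.
  return timingSafeEqual(Buffer.from(mac, "hex"), Buffer.from(sign(id, secret), "hex"));
}

const secret = "demo-secret";
const token = issueSessionToken("V1StGXR8_Z5jdHi6B-myT", secret);
console.log(verifySessionToken(token, "V1StGXR8_Z5jdHi6B-myT", secret)); // true
console.log(verifySessionToken(token, "V1StGXR8_Z5jdHi6B-myT", "wrong-secret")); // false
```

timingSafeEqual avoids leaking how many leading bytes matched. It throws on a length mismatch, which is why the MAC's format is checked first.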
LLM Council Series · 1 / 2