LLM Council Series · 2 / 2
Expert Models: 3 (OpenAI · Google · Anthropic)
Moderator: 1 (Gemini 2.5 Flash)
JSON Fields: 14 (structured verdict output)
Max Tokens / Expert: 4096 (Moderator: 8192)
Rate Limit: 30 req / min / IP
History Token Savings: ~70% (via Decision D2)
§01 Core Concept: Why a Council?

A single LLM has knowledge boundaries and reasoning blind spots. The Council pattern has three LLMs from different vendors, each trained on a different architecture, analyse the same question independently and in parallel; a moderator then synthesises their answers. This is more than voting: each expert is assigned a different domain specialty, so the final answer fuses complementary perspectives.

Complementary Domains

GPT-5-mini focuses on procedural law, Gemini on refugee law, and Claude on visa subclasses. The three domains do not overlap and complement one another.

Active Disagreement Detection

The moderator's expert_consensus field explicitly flags full / partial / disputed. Disagreement is itself information and should not be hidden.

Structured Output, not Free Text

The 14-field JSON forces the moderator to give an explicit answer on every dimension: risk factors, positive factors, recommended actions, precedents, and urgency are all structured.

Multi-Turn Conversation Memory

Decision D2 injects each round's history in compressed form, as the moderator's summary, so complex cases can be explored over many turns without token counts exploding.

flowchart LR
    Q["🧑 User Question"] --> GW["Cloudflare AI Gateway"]
    GW --> E1["GPT-5-mini\nProcedural Law Expert"]
    GW --> E2["Gemini 3.1 Pro\nRefugee Law Expert"]
    GW --> E3["Claude Sonnet\nVisa Expert"]
    E1 --> MOD["🎙 Moderator\nGemini 2.5 Flash"]
    E2 --> MOD
    E3 --> MOD
    MOD --> OUT["14-field JSON\nVerdict"]

    style Q fill:#a8552e,color:#fff,stroke:#7a352b
    style GW fill:#9c7b30,color:#fff,stroke:#7a5a20
    style E1 fill:#3a5a40,color:#fff,stroke:#2a4a30
    style E2 fill:#9c7b30,color:#fff,stroke:#7a5a20
    style E3 fill:#7a352b,color:#fff,stroke:#5a2520
    style MOD fill:#a8552e,color:#fff,stroke:#7a352b
    style OUT fill:#4a3a1a,color:#ede4d0,stroke:#9c7b30
      
§02 Full Request Flow

Every turn passes through authentication and rate limiting, then calls the three expert models in parallel; the moderator synthesises their output, and the result is stored and returned. Key point: the turn ID is assigned before any LLM call, which makes retries safe.

Client: User Request
POST /api/v1/llm-council/sessions/:id/turns
Cloudflare Worker: Auth + Rate Limit
HMAC Session Token · RL_COUNCIL_TURN 30 req/min
Cloudflare AI Gateway: Unified Billing Router
CF_AIG_TOKEN · cf-aig-authorization
Expert 1 · OpenAI: GPT-5-mini
max_completion_tokens 4096 · temp 1
Expert 2 · Google: Gemini 3.1 Pro
max_tokens 4096 · temp 0.2
Expert 3 · Anthropic: Claude Sonnet 4.6
max_tokens 4096 · temp 0.2
Moderator: Gemini 2.5 Flash
max_tokens 8192 · 14-field JSON output
Supabase PostgreSQL: council_turns table
INSERT … ON CONFLICT DO NOTHING
Response: JSON to Client
composed_answer + expert_opinions[]
§03 Three Expert Models in Detail

Each expert receives the same question and conversation history and analyses it independently from its own domain specialty. All three run concurrently via Promise.all(), so total latency equals the slowest single model rather than the sum of the three. Each model's system prompt is written specifically for its domain.

Expert 1 · OpenAI

GPT-5-mini

openai/gpt-5-mini
  • Procedural rights and due process
  • Statutory interpretation: Migration Act 1958
  • Jurisdictional error identification
  • Federal Court review pathways
  • Binding precedent weight and citation
max_completion_tokens: 4096
temperature: 1 (reasoning model, not tunable)
Expert 2 · Google

Gemini 3.1 Pro

google-ai-studio/gemini-3.1-pro-preview
  • Refugee law and international protection standards
  • Country of origin information (COI) and credibility assessment
  • Refugee Convention grounds analysis (race/religion/PSG)
  • Complementary protection pathways
  • Procedural fairness in refugee status determination (RSD) interviews
max_tokens: 4096
temperature: 0.2
Expert 3 · Anthropic

Claude Sonnet 4.6

anthropic/claude-sonnet-4-6
  • Visa subclass eligibility and criteria mapping
  • Tribunal review: AAT/ART jurisdiction
  • Character and health requirement analysis
  • Merits review success factors
  • Representation strategy recommendations
max_tokens: 4096
temperature: 0.2
§04 Moderator Output Schema

Gemini 2.5 Flash reads all three full expert responses and the conversation history, then produces a strict 14-field JSON object. The moderator resolves disagreements, flags uncertainty, and cites the strongest reasoning. This output is stored directly as jsonb in the council_turns table.

moderator_output: council_turns.moderator_output (jsonb), max 8192 tokens
"composed_answer" (string): Primary synthesised response to the user's question; this is what the user ultimately sees
"outcome_prediction" (enum string): likely_success | likely_failure | uncertain
"confidence_score" (number, 0–1): Moderator's confidence in the composed answer, reflecting the degree of expert consensus
"key_legal_issues" (string[]): Core legal questions raised by this case
"risk_factors" (string[]): Factors that could weaken the applicant's case
"positive_factors" (string[]): Factors that strengthen the applicant's position
"recommended_actions" (string[]): Concrete next steps for the applicant
"relevant_visa_subclasses" (string[]): Applicable visa subclasses (e.g. 866 Protection, 785 Temporary Protection, 790)
"case_precedents" (string[]): Relevant case citations from the expert analyses
"expert_consensus" (enum string): full | partial | disputed
"dissenting_views" (string): Summary of expert disagreements; required when expert_consensus is disputed
"urgency_level" (enum string): critical | high | medium | low
"disclaimer" (string): Standard disclaimer: this analysis is not legal advice and creates no representation relationship
"follow_up_questions" (string[]): Clarifying questions that gather more context and guide the next conversation turn
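Because the moderator's output is stored as-is, it may be worth checking it against the schema before the insert. The following is a minimal sketch, not the system's actual validator: the function name validateModeratorOutput and the error strings are hypothetical, and it assumes dissenting_views is always present as a string (possibly empty) unless the consensus is disputed.

```javascript
// Hypothetical validator for the 14-field moderator_output object.
// Field names and enum values come from the schema above; everything
// else (function name, error format) is an illustrative assumption.
const ENUMS = {
  outcome_prediction: ["likely_success", "likely_failure", "uncertain"],
  expert_consensus: ["full", "partial", "disputed"],
  urgency_level: ["critical", "high", "medium", "low"],
};
const STRING_FIELDS = ["composed_answer", "dissenting_views", "disclaimer"];
const ARRAY_FIELDS = [
  "key_legal_issues", "risk_factors", "positive_factors",
  "recommended_actions", "relevant_visa_subclasses",
  "case_precedents", "follow_up_questions",
];

// Returns a list of problems; an empty list means the object is valid.
function validateModeratorOutput(out) {
  if (out === null || typeof out !== "object") return ["output is not an object"];
  const errors = [];

  for (const f of STRING_FIELDS) {
    if (typeof out[f] !== "string") errors.push(`${f}: expected string`);
  }
  for (const f of ARRAY_FIELDS) {
    const ok = Array.isArray(out[f]) && out[f].every((x) => typeof x === "string");
    if (!ok) errors.push(`${f}: expected string[]`);
  }
  for (const [f, allowed] of Object.entries(ENUMS)) {
    if (!allowed.includes(out[f])) {
      errors.push(`${f}: expected one of ${allowed.join(" | ")}`);
    }
  }
  const c = out.confidence_score;
  if (typeof c !== "number" || !(c >= 0 && c <= 1)) {
    errors.push("confidence_score: expected number in [0, 1]");
  }
  // The schema requires a dissent summary whenever experts are in dispute.
  if (out.expert_consensus === "disputed" &&
      typeof out.dissenting_views === "string" &&
      out.dissenting_views.trim() === "") {
    errors.push("dissenting_views: required when expert_consensus is disputed");
  }
  return errors;
}
```

A caller could reject or retry the moderator call when the returned list is non-empty, rather than storing a malformed verdict.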
§05 Conversation History Injection: Decision D2

buildHistoryMessages(prevTurns) injects each prior moderator composed_answer as a role: "assistant" message instead of repeating all three raw expert outputs. This cuts multi-turn token costs by about 70% and is one of the most important engineering decisions in the system.

Each expert sees the conversation as alternating user / assistant turns, which preserves the full context at minimal token cost.

Turn 1 · role: user
What are my chances for a Protection visa?
Turn 1 · role: assistant (moderator composed_answer)
Based on your AATA case history, protection visa prospects depend on… [synthesised from 3 experts, not any single raw output]
Turn 2 · role: user (current)
What if I also claim under the complementary protection pathway?
Why D2 matters: if all three raw expert outputs were injected, each turn of history would carry about 3× the tokens, and each of the three experts receives its own copy of that history. D2 replaces the three outputs with the single moderator summary, preserving equivalent context at roughly 1/3 the cost. Over a 10-turn conversation the savings are substantial.
// workers/llm-council/runner.js
function buildHistoryMessages(prevTurns) {
  const msgs = [];
  for (const turn of prevTurns) {
    msgs.push({ role: "user", content: turn.user_message });
    if (turn.moderator_output?.composed_answer) {
      msgs.push({
        role: "assistant",
        // D2: moderator summary only, not raw outputs
        content: turn.moderator_output.composed_answer
      });
    }
  }
  return msgs;
}

// Parallel: latency = max(e1, e2, e3)
const [e1, e2, e3] = await Promise.all([
  runExpert(env, EXPERT_1_MODEL, EXPERT_1_SYSTEM, historyMsgs, userMsg),
  runExpert(env, EXPERT_2_MODEL, EXPERT_2_SYSTEM, historyMsgs, userMsg),
  runExpert(env, EXPERT_3_MODEL, EXPERT_3_SYSTEM, historyMsgs, userMsg),
]);

// Moderator synthesises → 14-field JSON
const result = await runModerator(
  env, e1, e2, e3, historyMsgs, userMsg
);
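The "1/3 the cost, about 70% saved" claim can be sanity-checked with a rough token model. The sketch below is illustrative only: the function names and every token count (100 tokens per user message, 1500 per expert output, 1300 per moderator summary) are made-up assumptions, not measurements from the system.

```javascript
// Rough model of the history tokens one expert receives.
// All token counts used here are illustrative assumptions.
function historyTokens(priorTurns, userTokens, assistantTokensPerTurn) {
  return priorTurns * (userTokens + assistantTokensPerTurn);
}

function d2Savings(priorTurns, { userTokens, expertTokens, summaryTokens }) {
  // Without D2: each prior turn carries all three raw expert outputs.
  const raw = historyTokens(priorTurns, userTokens, 3 * expertTokens);
  // With D2: each prior turn carries only the moderator's composed_answer.
  const compressed = historyTokens(priorTurns, userTokens, summaryTokens);
  return { raw, compressed, saved: 1 - compressed / raw };
}

// History seen on turn 10, i.e. after 9 prior turns.
const estimate = d2Savings(9, { userTokens: 100, expertTokens: 1500, summaryTokens: 1300 });
console.log(estimate.raw, estimate.compressed, estimate.saved.toFixed(2));
// prints: 41400 12600 0.70
```

Under this model the saved fraction does not depend on the number of turns; what grows with conversation length is the absolute number of tokens saved.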
§06 Real-World Use Cases

The Council pattern excels at tasks that need multi-dimensional analysis where each dimension has a genuine expertise barrier. Below are potential extensions of the pattern beyond IMMI-Case-.

⚖️

Immigration Legal Advice

Procedural law + refugee law + visa eligibility analysed in parallel; the moderator gives consolidated advice with an urgency assessment. (Current use of this system.)

GPT-5-mini · Gemini Pro · Claude Sonnet
🏥

Medical Diagnosis Assistance

Symptom analysis + drug interactions + imaging interpretation by three experts in parallel; the moderator integrates treatment suggestions and a risk list.

Requires domain fine-tuned models
📊

Investment Due Diligence

Financial analysis + competitive landscape + regulatory compliance reviewed simultaneously; the moderator outputs an investment score and key risks.

Can integrate real-time data APIs
🔐

Security Vulnerability Assessment

Static code analysis + runtime behaviour + social engineering vectors triangulated; the moderator assigns a CVSS score and remediation priority.

Can integrate SAST/DAST tools
§07 Pattern Trade-off Analysis

The Council pattern is not a silver bullet. The comparison below sets out the core differences between a single model, the council (this system), and a multi-agent pipeline.

Dimension | Single Model | Council (this system) | Multi-Agent Pipeline
Latency | Lowest | Medium (= slowest expert) | Highest (sequential)
Cost / Turn | Lowest | 3× + moderator | Depends on pipeline depth
Answer Quality | Limited to a single model's knowledge boundary | Multi-perspective fusion, more comprehensive | Can iterate deeply
Disagreement Visibility | None | Explicitly flagged (expert_consensus) | Implementation-dependent
Explainability | Medium | High (each expert's opinion can be inspected separately) | Medium (multi-step, hard to trace)
Multi-Turn Token Efficiency | Best | High (D2 compression) | Poor (full chain history)
Vendor Lock-in | High | Low (spread across three vendors) | Design-dependent
Best suited when: the problem has a clear multi-dimensional structure, each dimension has a genuine expertise barrier, answer correctness is critical (high-stakes decisions), and the user expects to wait 10–20 seconds. For quick Q&A, a single model is the better fit.
§08 Six Design Decisions

Each decision arose from a concrete constraint: latency, token cost, billing complexity, or security. None is arbitrary; each is the settled outcome of a deliberate trade-off analysis.

D1 · Performance
Three experts run concurrently, not queued
All three expert calls are wrapped in a single Promise.all(), so total latency equals the slowest single model rather than the sum of the three. Run sequentially, a single long-prompt GPT-5-mini call can take 8–12 s, and three calls back to back add up to roughly 40 s per turn, which is unacceptable for a conversational UI.
p95 turn latency: ~15 s parallel vs ~40 s sequential
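The latency difference in D1 can be reproduced with timers in place of real model calls. This is a toy sketch: fakeExpert, timed, runParallel, runSequential, and demo are hypothetical names, and the 60/100/80 ms delays are arbitrary stand-ins for model latency.

```javascript
// Fake expert call: resolves with `name` after `ms` milliseconds.
// Stands in for a real runExpert(); the delays are arbitrary.
function fakeExpert(name, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(name), ms));
}

// Measures how long an async function takes to settle.
async function timed(fn) {
  const start = Date.now();
  const value = await fn();
  return { value, ms: Date.now() - start };
}

// D1: start all calls first, then wait for all of them together.
function runParallel(delays) {
  return Promise.all(delays.map(([name, ms]) => fakeExpert(name, ms)));
}

// Counter-example: each call waits for the previous one to finish.
async function runSequential(delays) {
  const out = [];
  for (const [name, ms] of delays) out.push(await fakeExpert(name, ms));
  return out;
}

async function demo() {
  const delays = [["e1", 60], ["e2", 100], ["e3", 80]];
  const par = await timed(() => runParallel(delays));
  const seq = await timed(() => runSequential(delays));
  console.log(`parallel ~${par.ms} ms, sequential ~${seq.ms} ms`);
  return { par, seq };
}
```

Calling demo() should report a parallel time close to the slowest delay and a sequential time close to the sum of all three. Note that Promise.all() rejects as soon as any one call rejects, so a production runner would also need a policy for a failing expert.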
D2 · Architecture
Only the moderator's summary goes into history, not the three raw outputs
Each prior turn contributes only the moderator's composed_answer as the role: "assistant" message. Three outputs are compressed into one summary, cutting history tokens to about 1/3. This also simplifies history rendering in the frontend.
Token savings: ~70% on multi-turn conversations
D3 · Operations
All models go through Cloudflare AI Gateway with a single token
A single CF_AIG_TOKEN authenticates to OpenAI, Anthropic, and Google AI Studio. Billing sits in one dashboard, and no per-provider API keys are stored in Worker secrets, which greatly reduces secret-rotation complexity.
Provider prefix: openai/ · anthropic/ · google-ai-studio/
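A request routed through the gateway might be assembled as in the sketch below. Only the cf-aig-authorization header and the provider-prefixed model IDs come from this document; the OpenAI-compatible /compat/chat/completions path, the accountId and gatewayId parameters, and the function name buildGatewayRequest are assumptions that should be checked against the Cloudflare AI Gateway documentation.

```javascript
// Builds (but does not send) a chat request routed through AI Gateway.
// Assumptions: the OpenAI-compatible "/compat/chat/completions" path and
// the accountId / gatewayId parameters. Only the cf-aig-authorization
// header and provider-prefixed model IDs appear in the source document.
function buildGatewayRequest({ accountId, gatewayId, token, model, messages }) {
  const url =
    `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}` +
    "/compat/chat/completions";
  return {
    url,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        // One gateway token for every provider (D3).
        "cf-aig-authorization": `Bearer ${token}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// The provider is selected purely by the model prefix.
const req = buildGatewayRequest({
  accountId: "ACCOUNT_ID",   // placeholder
  gatewayId: "GATEWAY_ID",   // placeholder
  token: "CF_AIG_TOKEN",     // placeholder for the secret value
  model: "anthropic/claude-sonnet-4-6",
  messages: [{ role: "user", content: "Hello" }],
});
// In a Worker: const res = await fetch(req.url, req.init);
```

Under these assumptions, switching an expert to another vendor changes only the model string, which is what keeps vendor lock-in low.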
D4 · Compatibility
GPT-5 / o-series reasoning models use different API parameters
isGpt5ReasoningModel() checks the model name. On a match, the request uses max_completion_tokens (not max_tokens) and forces temperature: 1 (not 0.2). OpenAI reasoning models reject any temperature other than 1 with an HTTP 400, so they must be special-cased.
Detection: model name contains "gpt-5", "o1", "o3", or "o4"
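One way the D4 switch could look is sketched below. The document names isGpt5ReasoningModel() and its detection strings but does not show its body, so this body is an assumption; it matches on the prefix of the bare model name rather than any substring, and buildCompletionParams is a hypothetical helper.

```javascript
// Assumed implementation: the source gives the function name and the
// detection strings, not the body. buildCompletionParams is hypothetical.
const REASONING_PREFIXES = ["gpt-5", "o1", "o3", "o4"];

function isGpt5ReasoningModel(model) {
  // Drop a gateway provider prefix such as "openai/" before matching.
  const bare = model.includes("/")
    ? model.slice(model.lastIndexOf("/") + 1)
    : model;
  return REASONING_PREFIXES.some((p) => bare.startsWith(p));
}

function buildCompletionParams(model, messages, maxTokens = 4096) {
  if (isGpt5ReasoningModel(model)) {
    // Reasoning models: different token parameter, temperature fixed at 1.
    return { model, messages, max_completion_tokens: maxTokens, temperature: 1 };
  }
  return { model, messages, max_tokens: maxTokens, temperature: 0.2 };
}

console.log(isGpt5ReasoningModel("openai/gpt-5-mini"));            // true
console.log(isGpt5ReasoningModel("anthropic/claude-sonnet-4-6"));  // false
```

Prefix matching is slightly stricter than the "name contains" rule quoted above; it avoids accidental hits on unrelated model names that happen to include "o1" or "o3" somewhere in the middle.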
D5 · Reliability
ON CONFLICT DO NOTHING makes turn inserts retry-safe
addTurn() assigns a nanoid(21) primary key before the expensive LLM calls, then writes with INSERT … ON CONFLICT DO NOTHING. If the Worker times out and the request is retried (Cloudflare's default 30 s CPU limit), the duplicate key creates no duplicate row.
ID pre-assigned before LLM calls: retry-safe by design
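The retry behaviour in D5 can be illustrated without a database. The sketch below uses an in-memory Map as a stand-in for the council_turns table; insertTurnOnce is a hypothetical name, randomUUID() substitutes for nanoid(21), and the id and session_id column names in the SQL comment are assumptions.

```javascript
const { randomUUID } = require("node:crypto");

// In-memory stand-in for the council_turns table, keyed by primary key.
// The equivalent PostgreSQL statement would look roughly like:
//   INSERT INTO council_turns (id, session_id, user_message, moderator_output)
//   VALUES ($1, $2, $3, $4)
//   ON CONFLICT (id) DO NOTHING;
// The id and session_id column names are assumptions.
function insertTurnOnce(table, row) {
  if (table.has(row.id)) return false; // conflict: do nothing
  table.set(row.id, row);
  return true;
}

const table = new Map();
const turnId = randomUUID(); // assigned BEFORE any LLM call (stand-in for nanoid(21))

// ... the expensive expert and moderator calls would run here ...
const row = {
  id: turnId,
  user_message: "What are my chances for a Protection visa?",
  moderator_output: { composed_answer: "…" },
};

console.log(insertTurnOnce(table, row)); // true: first write succeeds
console.log(insertTurnOnce(table, row)); // false: the retry is a no-op
console.log(table.size);                 // 1
```

The guarantee only holds if the retry reuses the same ID, which is why the ID has to exist before the slow work begins rather than being generated at write time.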
D6 · Security
HMAC token, not JWT, for turn-level auth
Session tokens have the form nanoid(21) + "." + HMAC-SHA256(id, secret). Verification is a single HMAC computation with no Supabase lookup. The user's JWT (Telegram login) is required only at session creation; subsequent turns use the lighter session token, saving one database round-trip per turn.
Verification cost: 1 HMAC, no round-trip to Supabase
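The token format in D6 can be sketched in a few lines. This version targets Node.js and its node:crypto module for brevity; a Worker would more likely use Web Crypto (crypto.subtle). The function names, the hex encoding of the signature, and the use of randomBytes() in place of nanoid(21) are all assumptions.

```javascript
const { createHmac, randomBytes, timingSafeEqual } = require("node:crypto");

// Sketch of the "<id>.<HMAC-SHA256(id, secret)>" token format.
// Assumptions: hex-encoded signature, randomBytes() as a stand-in
// for nanoid(21), and hypothetical function names.
function sign(id, secret) {
  return createHmac("sha256", secret).update(id).digest("hex");
}

function createSessionToken(secret) {
  // 16 random bytes -> 22 base64url chars; trimmed to 21 like nanoid(21).
  const id = randomBytes(16).toString("base64url").slice(0, 21);
  return `${id}.${sign(id, secret)}`;
}

// Returns the session id if the signature matches, otherwise null.
function verifySessionToken(token, secret) {
  const dot = token.indexOf(".");
  if (dot <= 0) return null;
  const id = token.slice(0, dot);
  const given = Buffer.from(token.slice(dot + 1), "utf8");
  const expected = Buffer.from(sign(id, secret), "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length) return null;
  return timingSafeEqual(given, expected) ? id : null;
}

const token = createSessionToken("demo-secret");
console.log(verifySessionToken(token, "demo-secret") !== null); // true
console.log(verifySessionToken(token, "wrong-secret"));         // null
```

A route handler would presumably also compare the verified id with the :id path parameter, so that a valid token for one session cannot be replayed against another.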
§09 Implementation Guide

Want to replicate the Council pattern in your own project? These are the key steps, independent of this system's specific tech stack.

Define domain boundaries

Define the knowledge domain each "expert" covers, making sure the three do not overlap but do complement one another. Write a focused system prompt for each expert that lists 5–8 analysis dimensions. Avoid having three models do the same thing: that is redundancy, not a council.

Design the moderator prompt and output schema

The moderator is the quality bottleneck of the whole system. Give it a strict JSON schema in which every field has clear semantics. Use a model that supports structured output (e.g. the JSON mode of Gemini or GPT-4o) to keep the output format stable.

Implement D2 history compression

Do not simply concatenate the three expert outputs into the history. Use the moderator's composed_answer as the assistant turn. This saves about 70% of tokens in multi-turn conversations without sacrificing context quality.

Handle reasoning model parameter differences

If you use GPT-5 or o-series models, detect them and switch to max_completion_tokens instead of max_tokens, and force temperature: 1. Detect by model name prefix rather than hardcoding model IDs.

Pre-assign IDs for idempotent storage

Generate the turn ID before calling any LLM, and write with INSERT … ON CONFLICT DO NOTHING (PostgreSQL) or an equivalent mechanism. This ensures that network retries or Worker timeouts produce no duplicate records. LLM calls are slow, so retries do happen in practice.

§10 API Endpoint Reference

Six Worker-native endpoints. Session-bound routes require an X-Session-Token header, which is verified with a single HMAC computation and no database lookup.

POST /api/v1/llm-council/sessions
Create a new Council session. Returns session_id and an HMAC token. Accepts an optional case_id to pre-load case context into the system prompt.
Auth: JWT Bearer (Telegram)
POST /api/v1/llm-council/sessions/:id/turns
Add a user turn. Triggers the full three-expert + moderator pipeline. Rate-limited via RL_COUNCIL_TURN (30 req/min per IP).
Auth: X-Session-Token
GET /api/v1/llm-council/sessions/:id
Retrieve a full session with all turns, including the three expert opinions and the moderator output for each turn.
Auth: X-Session-Token
GET /api/v1/llm-council/sessions
List all sessions for the authenticated user. Returns session metadata: id, created_at, turn count, and last message snippet.
Auth: JWT Bearer
DELETE /api/v1/llm-council/sessions/:id
Delete a session. Cascades to the matching council_turns records via a Supabase RLS cascade policy.
Auth: X-Session-Token
POST /api/v1/llm-council/run (Legacy)
Stateless single-turn analysis with no session storage. Kept for backward compatibility with the old frontend; new code should use the sessions API.
Auth: JWT Bearer
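A client could prepare the two most common calls as sketched below. Paths, methods, and header names follow the endpoint list above; the helper names, the baseUrl parameter, and the request body field "message" are hypothetical, since the document does not specify request bodies beyond the optional case_id.

```javascript
// Hypothetical client helpers. Paths, methods and header names come from
// the endpoint list; the "message" body field and baseUrl are assumptions.
function createSessionRequest(baseUrl, jwt, caseId) {
  return {
    url: `${baseUrl}/api/v1/llm-council/sessions`,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        authorization: `Bearer ${jwt}`, // JWT is needed only here
      },
      body: JSON.stringify(caseId ? { case_id: caseId } : {}),
    },
  };
}

function addTurnRequest(baseUrl, sessionId, sessionToken, message) {
  return {
    url: `${baseUrl}/api/v1/llm-council/sessions/${encodeURIComponent(sessionId)}/turns`,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-session-token": sessionToken, // lighter HMAC token for every turn
      },
      body: JSON.stringify({ message }),
    },
  };
}

// Usage: const { url, init } = addTurnRequest(base, id, token, "…");
//        const res = await fetch(url, init);
```

Both helpers only build the request, so they can be unit-tested without a network and passed straight to fetch() when needed.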