A single LLM has knowledge boundaries and reasoning blind spots. The Council pattern gives the same question to three LLMs from different vendors, each with its own architecture and training. They analyse it independently and in parallel, and a moderator then synthesises their answers. This is more than voting: each expert is assigned a different domain specialty, so the final answer fuses complementary knowledge perspectives.
GPT-5-mini covers procedural law, Gemini covers refugee law, and Claude covers visa subclasses. The three domains do not overlap; they complement one another.
The moderator's expert_consensus field explicitly records the level of agreement as full, partial, or disputed. Disagreement is itself information and should not be hidden.
The 14-field JSON output forces the moderator to commit to an explicit answer on every dimension, including risk factors, positive factors, recommended actions, precedents, and urgency, all in structured form.
Decision D2 injects each earlier round into the history as the moderator's compressed summary, so complex cases can be explored over many turns without token counts exploding.
flowchart LR
Q["🧑 User Question"] --> GW["Cloudflare AI Gateway"]
GW --> E1["GPT-5-mini\nProcedural Law Expert"]
GW --> E2["Gemini 3.1 Pro\nRefugee Law Expert"]
GW --> E3["Claude Sonnet\nVisa Expert"]
E1 --> MOD["🎙 Moderator\nGemini 2.5 Flash"]
E2 --> MOD
E3 --> MOD
MOD --> OUT["14-field JSON\nVerdict"]
style Q fill:#a8552e,color:#fff,stroke:#7a352b
style GW fill:#9c7b30,color:#fff,stroke:#7a5a20
style E1 fill:#3a5a40,color:#fff,stroke:#2a4a30
style E2 fill:#9c7b30,color:#fff,stroke:#7a5a20
style E3 fill:#7a352b,color:#fff,stroke:#5a2520
style MOD fill:#a8552e,color:#fff,stroke:#7a352b
style OUT fill:#4a3a1a,color:#ede4d0,stroke:#9c7b30
Each turn passes through authentication and rate limiting, calls the three expert models in parallel, has the moderator synthesise their outputs, and then stores and returns the result. The key detail: the turn ID is assigned before any LLM call, so retries are safe.
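The turn order above can be sketched as a single handler. This is a minimal sketch under assumptions: TurnRequest, TurnDeps, handleTurn, and every method name are hypothetical stand-ins, not the project's code. Auth, rate limiting, the LLM calls, and storage are injected so that only the ordering is shown.

```typescript
// Minimal sketch of one council turn. All names here are hypothetical;
// only the ordering of the steps follows the description above.

interface TurnRequest {
  ip: string;
  sessionToken: string;
  question: string;
}

interface TurnDeps {
  verifySession(token: string): boolean;                          // authentication
  allowRequest(ip: string): boolean;                              // rate limiting
  newTurnId(): string;                                            // e.g. nanoid(21)
  runExperts(question: string): Promise<string[]>;                // three calls in parallel
  moderate(question: string, outputs: string[]): Promise<string>; // 14-field JSON verdict
  saveTurn(turnId: string, verdict: string): Promise<void>;       // idempotent insert
}

type TurnResult =
  | { status: 200; turnId: string; verdict: string }
  | { status: 401 | 429 };

async function handleTurn(req: TurnRequest, deps: TurnDeps): Promise<TurnResult> {
  // 1. Cheap checks first, so rejected requests never spend LLM tokens.
  if (!deps.verifySession(req.sessionToken)) return { status: 401 };
  if (!deps.allowRequest(req.ip)) return { status: 429 };

  // 2. Allocate the turn ID before the slow, expensive LLM calls.
  const turnId = deps.newTurnId();

  // 3. Fan out to the experts, then let the moderator synthesise.
  const outputs = await deps.runExperts(req.question);
  const verdict = await deps.moderate(req.question, outputs);

  // 4. Persist under the pre-assigned ID, then return.
  await deps.saveTurn(turnId, verdict);
  return { status: 200, turnId, verdict };
}
```

Injecting the dependencies keeps the ordering testable. The cheap checks run first, and the ID exists before any token is spent.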
Each expert receives the same question and conversation history and analyses it independently from its own domain specialty. All three calls run concurrently via Promise.all(), so the expert stage takes as long as the slowest model, not the sum of all three. Each model's system prompt is tailored to its specific domain.
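A minimal sketch of the parallel fan-out. It assumes a hypothetical Expert type whose ask() stands in for a real call through the gateway. fakeExpert() simulates model latency with setTimeout so the timing behaviour can be observed.

```typescript
// Hypothetical expert shape; ask() stands in for a real call through the gateway.
type Expert = { name: string; ask: (question: string) => Promise<string> };

// Simulated expert that answers after `ms` milliseconds.
function fakeExpert(name: string, ms: number): Expert {
  return {
    name,
    ask: (question) =>
      new Promise<string>((resolve) => setTimeout(() => resolve(`${name}: ${question}`), ms)),
  };
}

// Start every call at once. Promise.all resolves when the slowest call finishes,
// keeps results in input order, and rejects as soon as any call rejects.
function runExperts(experts: Expert[], question: string): Promise<string[]> {
  return Promise.all(experts.map((e) => e.ask(question)));
}
```

With simulated latencies of 30, 60, and 90 ms, the call finishes after about 90 ms rather than 180 ms. Promise.all rejects as soon as any expert fails. If a council with one missing expert is still useful, Promise.allSettled() is the alternative, but the moderator then has to handle the gaps.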
Gemini 2.5 Flash reads all three expert responses along with the conversation history, then produces a strict 14-field JSON object. The moderator resolves disagreements, flags uncertainty, and cites the strongest reasoning. This output is stored directly as jsonb in the council_turns table.
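The output goes straight into a jsonb column, so it is worth validating it first. The sketch below checks only a subset of the fields. Two field names come from the text: expert_consensus (with its full / partial / disputed values) and composed_answer. The keys risk_factors, positive_factors, recommended_actions, precedents, and urgency are hypothetical spellings of the dimensions mentioned above. The remaining fields of the 14 are not shown.

```typescript
type Consensus = "full" | "partial" | "disputed";

// Subset of the moderator verdict; keys other than composed_answer and
// expert_consensus are hypothetical.
interface ModeratorVerdict {
  composed_answer: string;
  expert_consensus: Consensus;
  risk_factors: string[];
  positive_factors: string[];
  recommended_actions: string[];
  precedents: string[];
  urgency: string;
}

const CONSENSUS: readonly string[] = ["full", "partial", "disputed"];

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Parse the moderator's raw text and reject anything that does not match the shape,
// so a malformed response never reaches the database.
function parseVerdict(raw: string): ModeratorVerdict {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) throw new Error("verdict must be a JSON object");
  const o = data as Record<string, unknown>;
  if (typeof o.composed_answer !== "string") throw new Error("composed_answer must be a string");
  if (typeof o.expert_consensus !== "string" || !CONSENSUS.includes(o.expert_consensus)) {
    throw new Error("expert_consensus must be full | partial | disputed");
  }
  for (const key of ["risk_factors", "positive_factors", "recommended_actions", "precedents"]) {
    if (!isStringArray(o[key])) throw new Error(`${key} must be an array of strings`);
  }
  if (typeof o.urgency !== "string") throw new Error("urgency must be a string");
  return o as unknown as ModeratorVerdict;
}
```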
buildHistoryMessages(prevTurns) injects each earlier moderator composed_answer as a role: "assistant" turn instead of replaying all three raw expert outputs. This cuts multi-turn token costs by roughly 70% and is one of the most important engineering decisions in the system.
Each expert therefore sees the conversation as alternating user and assistant turns: full context at minimal token cost.
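A minimal sketch of what buildHistoryMessages(prevTurns) might look like. The function name comes from the text. The PrevTurn and ChatMessage shapes are assumptions, and composed_answer is the only field name the text confirms.

```typescript
// Hypothetical shape of a stored turn.
interface PrevTurn {
  question: string;
  expertOutputs: string[]; // kept in storage, but NOT replayed into history
  verdict: { composed_answer: string };
}

type ChatMessage = { role: "user" | "assistant"; content: string };

// Sketch of the D2 compression: each prior turn contributes one user message and
// one assistant message (the moderator's composed_answer), never the raw expert outputs.
function buildHistoryMessages(prevTurns: PrevTurn[]): ChatMessage[] {
  const messages: ChatMessage[] = [];
  for (const turn of prevTurns) {
    messages.push({ role: "user", content: turn.question });
    messages.push({ role: "assistant", content: turn.verdict.composed_answer });
  }
  return messages;
}
```

The same history array can be passed to all three experts and the moderator. This keeps every model's view of the conversation identical.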
The Council pattern suits tasks that need multi-dimensional analysis, where each dimension sits behind a genuine expertise barrier. Beyond IMMI-Case-, potential applications include the following.
Procedural law, refugee law, and visa eligibility analysed in parallel; the moderator gives consolidated advice with an urgency assessment. (This system's current use.)
Symptom analysis, drug interactions, and imaging interpretation handled by three parallel experts; the moderator integrates treatment suggestions and a risk list.
Financial analysis, competitive landscape, and regulatory compliance reviewed simultaneously; the moderator outputs an investment score and key risks.
Static code analysis, runtime behaviour, and social-engineering vectors triangulated; the moderator assigns a CVSS score and a remediation priority.
The Council pattern is not a silver bullet. The table below compares a single model, the council (this system), and a multi-agent pipeline.
| Dimension | Single model | Council (this system) | Multi-agent pipeline |
|---|---|---|---|
| Latency | Lowest | Medium (slowest expert + moderator) | Highest (sequential) |
| Cost per turn | Lowest | 3 expert calls + 1 moderator call | Depends on pipeline depth |
| Answer quality | Limited by one model's knowledge boundary | Multi-perspective fusion, more comprehensive | Can iterate deeply |
| Disagreement visibility | None | Explicitly flagged (expert_consensus) | Implementation-dependent |
| Explainability | Medium | High: each expert's opinion can be inspected separately | Medium (multi-step chains are hard to trace) |
| Multi-turn token efficiency | Best | High (D2 compression) | Poor (full-chain history) |
| Vendor lock-in | High | Low (spread across three vendors) | Design-dependent |
Each decision below stems from a concrete constraint: latency, token cost, billing complexity, or security. None was arbitrary; each is the outcome of an explicit trade-off.
All three expert calls are wrapped in a single Promise.all(), so the expert stage takes as long as the slowest model, not the sum of all three. Run sequentially, one GPT-5-mini call with a long prompt can take 8 to 12 seconds on its own, and three sequential calls would approach 40 seconds per turn. That is unacceptable for a conversational UI.

Only the moderator's composed_answer is injected as the role: "assistant" turn (decision D2). Three outputs are compressed into one summary at roughly a third of the token cost. This also simplifies history rendering in the frontend.

A single CF_AIG_TOKEN authenticates through Cloudflare AI Gateway to OpenAI, Anthropic, and Google AI Studio. Billing for all three sits in one dashboard, and no per-provider API keys are stored in Worker secrets, which greatly reduces secret-rotation complexity.

isGpt5ReasoningModel() checks the model name. On a match, the request uses max_completion_tokens instead of max_tokens and forces temperature: 1 instead of 0.2. OpenAI reasoning models reject any other temperature with HTTP 400, so they must be special-cased.

addTurn() assigns a nanoid(21) primary key before the expensive LLM calls, then writes with INSERT … ON CONFLICT DO NOTHING. If the write is retried after a Worker timeout (Cloudflare's default CPU limit is 30 s), the repeated key creates no duplicate row.

Session tokens have the form nanoid(21) + "." + HMAC-SHA256(id, secret). Verification is a single HMAC computation with no Supabase lookup. The user's JWT (from Telegram login) is required only when a session is created; later turns use the lighter session token, saving one database round-trip per turn.

Want to replicate the Council pattern in your own project? The key steps below do not depend on this system's specific tech stack.
Define the knowledge domain each expert covers so that the three complement one another without overlapping. Give each expert a focused system prompt that lists 5 to 8 analysis dimensions in its field. Do not have three models do the same job: that is redundancy, not a council.
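One way to make these rules checkable is sketched below under assumptions. ExpertSpec, validateCouncil(), and buildSystemPrompt() are hypothetical, and the prompt wording is illustrative only. Comparing domain names is a crude proxy for "non-overlapping". Real overlap still has to be judged when the prompts are written.

```typescript
// Hypothetical expert definition for a council.
interface ExpertSpec {
  model: string;        // e.g. "GPT-5-mini"
  domain: string;       // e.g. "procedural law"
  dimensions: string[]; // the 5-8 questions this expert must cover
}

// Crude structural checks: distinct domain names and 5-8 dimensions per expert.
// Distinct names do not prove the domains are non-overlapping.
function validateCouncil(experts: ExpertSpec[]): void {
  const domains = new Set(experts.map((e) => e.domain.trim().toLowerCase()));
  if (domains.size !== experts.length) throw new Error("each expert needs its own domain");
  for (const e of experts) {
    const n = e.dimensions.length;
    if (n < 5 || n > 8) throw new Error(`${e.model}: expected 5-8 dimensions, got ${n}`);
  }
}

// Focused system prompt: one domain, an explicit numbered list of dimensions,
// and an instruction to leave other domains to the other experts.
function buildSystemPrompt(e: ExpertSpec): string {
  const list = e.dimensions.map((d, i) => `${i + 1}. ${d}`).join("\n");
  return (
    `You are the ${e.domain} expert on a three-member council. ` +
    `Analyse the question only from the ${e.domain} perspective and leave other domains to the other experts. ` +
    `Cover each of these dimensions:\n${list}`
  );
}
```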
The moderator is the quality bottleneck of the whole system. Give it a strict JSON schema in which every field has clear semantics, and use a model that supports structured output (for example Gemini or GPT-4o JSON mode) to keep the format stable.
Do not simply concatenate the three expert outputs into the history. Use the moderator's composed_answer as the assistant turn instead. In multi-turn conversations this saves roughly 70% of tokens without sacrificing context quality.
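The following estimate shows where the saving comes from. The symbols are introduced here for illustration: U is the token count of a user message, E of one expert output, and S of one composed_answer. For each earlier turn replayed into history:

$$
H_{\text{naive}} = U + 3E, \qquad H_{\text{D2}} = U + S
$$

$$
\text{saving on the assistant part} = 1 - \frac{S}{3E} \approx 1 - \frac{1}{3} \approx 67\% \quad \text{if } S \approx E
$$

This is close to the roughly 70% figure, and the saving grows as the summary gets shorter than an expert output. If the user messages are counted too, the overall saving is $\frac{3E - S}{U + 3E}$, which is somewhat lower.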
If you use GPT-5 or o-series models, detect them and send max_completion_tokens instead of max_tokens, with temperature forced to 1. Detect them by model-name prefix rather than by hard-coded model IDs.
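The following sketches the detection and the parameter switch. The text names isGpt5ReasoningModel(), but the body shown here is an assumption. So are the prefix pattern and buildChatParams(). The 0.2 default for other models matches the temperature given in the design notes. Check the pattern against OpenAI's current model list before relying on it.

```typescript
// Hypothetical pattern: GPT-5 family and o1/o3/o4 models, optionally provider-prefixed
// (e.g. "openai/gpt-5-mini"). Verify against the provider's current model list.
const REASONING_MODEL = /^(gpt-5|o[134])(?:$|[-.])/;

function isGpt5ReasoningModel(model: string): boolean {
  const name = model.toLowerCase().split("/").pop() ?? "";
  return REASONING_MODEL.test(name);
}

type Message = { role: string; content: string };

interface ChatParams {
  model: string;
  messages: Message[];
  temperature: number;
  max_tokens?: number;
  max_completion_tokens?: number;
}

// Reasoning models get max_completion_tokens and temperature 1; others keep max_tokens.
function buildChatParams(model: string, messages: Message[], maxOut: number, temperature = 0.2): ChatParams {
  if (isGpt5ReasoningModel(model)) {
    // These models reject max_tokens and any temperature other than 1 (HTTP 400).
    return { model, messages, temperature: 1, max_completion_tokens: maxOut };
  }
  return { model, messages, temperature, max_tokens: maxOut };
}
```

For these models max_completion_tokens also covers the hidden reasoning tokens. It therefore usually needs a larger value than an equivalent max_tokens.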
Generate the turn ID before any LLM call, and write it with INSERT … ON CONFLICT DO NOTHING (PostgreSQL) or an equivalent mechanism. LLM calls are slow, so network retries and Worker timeouts do occur in practice; this ensures they never create duplicate records.
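The following is a sketch of the retry-safe write. Several parts of it are assumptions: the SQL column names other than id, the WriteFn type, and the addTurn() signature shown here. FakeTurnTable is an in-memory stand-in that only mimics ON CONFLICT (id) DO NOTHING, so the example runs without a database. randomUUID() from node:crypto stands in for nanoid(21).

```typescript
import { randomUUID } from "node:crypto";

// Postgres statement in the spirit of the text; columns other than id are hypothetical.
const INSERT_TURN_SQL =
  "INSERT INTO council_turns (id, session_id, verdict) VALUES ($1, $2, $3) " +
  "ON CONFLICT (id) DO NOTHING";

// Writes one row; a real implementation would run INSERT_TURN_SQL with [id, sessionId, verdict].
type WriteFn = (id: string, sessionId: string, verdict: string) => Promise<void>;

// In-memory stand-in that mimics ON CONFLICT (id) DO NOTHING:
// a second insert with the same primary key is silently ignored.
class FakeTurnTable {
  readonly rows = new Map<string, { sessionId: string; verdict: string }>();

  insert(id: string, sessionId: string, verdict: string): void {
    if (!this.rows.has(id)) this.rows.set(id, { sessionId, verdict });
  }
}

// The ID is fixed once, before the slow LLM work, and every retry of the write
// reuses it, so a retry can never produce a second row.
async function addTurn(
  write: WriteFn,
  sessionId: string,
  runPipeline: () => Promise<string>,
  maxAttempts = 3,
): Promise<string> {
  const turnId = randomUUID(); // stand-in for nanoid(21)
  const verdict = await runPipeline();
  for (let attempt = 1; ; attempt++) {
    try {
      await write(turnId, sessionId, verdict);
      return turnId;
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
    }
  }
}
```

Suppose the first write commits but its acknowledgement is lost. The retry then hits the conflict and becomes a no-op. This covers retries within one invocation. A retried HTTP request is only deduplicated if it arrives with the same turn ID.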
Six Worker-native endpoints. Session-bound routes require an X-Session-Token header, which is verified with a single HMAC computation and no database lookup.
Create a new Council session. Returns a session_id and an HMAC session token. An optional case_id pre-loads the case context into the system prompt.
Add a user turn. Triggers the full pipeline of three experts plus the moderator. Rate-limited by RL_COUNCIL_TURN to 30 requests per minute per IP.
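Delete a session. The deletion cascades to its council_turns rows. In Supabase, as in any Postgres database, the cascade comes from an ON DELETE CASCADE foreign key; RLS policies control row access, not cascades.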