Agent Skill · by shadcn · 2,053 ★
improve.
An agent skill that audits any codebase and writes implementation plans for other agents to execute.
Use your most capable model for the part where intelligence compounds (understanding the codebase, judging what's worth doing, writing the spec) and hand execution to cheaper models. The skill never implements anything itself. The plan is the product.
Works in any agent that supports the Agent Skills format. Plans are plain markdown, so any agent, or any human, can pick them up.
Spend intelligence where it compounds; spend cents where it doesn't.
The bet is a division of labour between two models. Understanding a codebase and judging what's worth doing is where a frontier model earns its cost; turning a precise spec into a diff is mechanical. improve keeps the advisor and the executor strictly separate, and never lets either touch your working tree.
The advisor:
- Maps the repo: stack, conventions, the exact build / test / lint commands
- Judges what's actually worth doing, with evidence
- Writes one self-contained spec per finding
- Re-reads every cited line before showing you anything

The executor:
- Implements exactly what the plan describes
- Runs the verification gates baked into every step
- Works only inside a disposable git worktree
- Never has to judge whether it succeeded, because the plan tells it
A run fans out, narrows down, and ends at a backlog of executable specs. The advisor maps and audits in parallel, vets its own findings, ranks them by leverage, and writes plans for the ones you pick. From there, execute and reconcile close the loop.
Build / test / lint commands become verification gates in every plan. The advisor ingests ADRs, PRDs, CONTEXT.md, and DESIGN.md so that decided tradeoffs aren't re-flagged.
Parallel subagents work across nine categories. Every finding carries file:line evidence, impact, effort, and confidence, with no exceptions.
The advisor re-reads every cited location itself. False positives get dropped, wrong attributions get corrected, and rejections are recorded so they don't come back.
Findings land in a table ordered by impact ÷ effort, weighted by confidence. You choose which become plans; nothing is automatic.
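As a rough illustration of that ordering, here is a minimal Python sketch. The numeric scales (impact from 1 to 3, effort S/M/L mapped to 1/2/3, confidence as a 0 to 1 weight) and the `Finding` type are assumptions made for this example; the skill's actual scoring rubric is not described here.

```python
from dataclasses import dataclass

# Hypothetical scales, chosen only for this example.
EFFORT = {"S": 1, "M": 2, "L": 3}
CONFIDENCE = {"LOW": 0.4, "MEDIUM": 0.7, "HIGH": 1.0}


@dataclass(frozen=True)
class Finding:
    title: str
    impact: int       # 1 (minor) .. 3 (major), assumed scale
    effort: str       # "S", "M" or "L"
    confidence: str   # "LOW", "MEDIUM" or "HIGH"


def leverage(f: Finding) -> float:
    """impact ÷ effort, weighted by confidence."""
    return f.impact / EFFORT[f.effort] * CONFIDENCE[f.confidence]


def rank(findings: list[Finding]) -> list[Finding]:
    """Highest leverage first; ties keep their input order."""
    return sorted(findings, key=leverage, reverse=True)
```

With these assumed weights, a small HIGH-confidence fix outranks a larger one of equal impact, which matches the intent of ordering by leverage rather than by impact alone.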
The audit fans out across nine categories at once. Eight hunt for problems; the ninth proposes direction, and every suggestion must cite evidence from the repo itself.
- Logic errors and latent bugs, each pinned to a file:line.
- Vulnerabilities and hardening gaps. Credential locations only, never the secret values.
- Hotspots like O(n²) loops, redundant work, and avoidable allocations.
- Gaps where a regression could slip through unnoticed.
- Duplication, drift between copies, and abstractions that have decayed.
- Outdated, risky, or half-finished upgrades worth completing.
- Friction in the build, dev loop, and day-to-day tooling.
- Documentation that's missing, stale, or contradicts the code.
- Where to take the project: feature suggestions, each grounded in repo evidence. No generic idea-slop.
One verb, many combinations. Audit depth scales from a cheap hotspot pass to an exhaustive sweep; scope can narrow to a branch diff; or you can skip straight to planning or executing a single thing. All variants compose: deep security and branch --issues are both valid invocations.
Audit Depth
| | quick | standard (default) | deep |
|---|---|---|---|
| Coverage | Highest-churn, highest-criticality hotspots | Hotspot-weighted, key packages | Whole repo, every package |
| Subagents | 0–1 | ≤4 concurrent | ≤8 concurrent, one per category |
| Breadth | "medium" | "very thorough" for correctness + security, "medium" for the rest | "very thorough" everywhere |
| Categories | correctness · security · tests (3 of 9) | all nine | all nine |
| Findings | top ~6, HIGH confidence only | full table | full table + LOW-confidence "investigate" items |
Command Reference
Runs the full pipeline: Recon → parallel audit across all 9 lenses (up to 4 subagents) → Vet (re-reads every cited line, drops false positives) → prioritize by leverage (impact ÷ effort × confidence) → waits for your selection → writes plans. Default depth: standard.
- Reads README, CLAUDE.md, root configs, CI config
- Ingests ADRs, PRDs, CONTEXT.md, so decided tradeoffs won't be re-flagged
- Identifies the exact build / test / lint commands; these become verification gates in every plan
- Uses git log churn to tell actively changing code from frozen code
- Waits for you to pick which findings become plans; never auto-fires all of them
- Suggests dependency ordering, e.g. "characterization tests before the refactor"
- Non-interactive: plans the top 3–5 by leverage and records that default in plans/README.md
quick: only the highest-churn, highest-criticality hotspots. 3 categories, top ~6 HIGH-confidence findings, 0–1 subagent. Good for a pre-commit gut-check or a first pass on an unfamiliar repo.
deep: exhaustive full-repo sweep. Every package, all 9 categories, up to 8 concurrent subagents (one per category), including LOW-confidence "investigate" items. Use before a major refactor, before a release, or when onboarding to a new repo.
Examples: quick security (fast security pass) · deep perf (thorough performance audit) · deep --issues (full sweep, then publish as issues)
- Depth modifier attaches to any command
- At any depth, the final report names what was not audited
Runs Recon → audits only the named category → writes plans. Use when you already know which area needs attention and don't want to wait for a full sweep. Each focus name maps to one of the 9 audit lenses.
- security: vulns, access control, credential hygiene
- perf: N+1, complexity, caching gaps
- tests: critical-path coverage, quality issues
- bugs: logic errors, async hazards, resource leaks
- tech-debt: duplication, layering violations, dead code
- deps: major-version lag, deprecated APIs, duplicates
- dx: slow feedback, onboarding friction, missing tooling
- docs: missing, stale, or contradictory docs

Examples: deep security (exhaustive security audit) · quick tests (fast coverage gap scan) · security --issues (security scan + publish issues)
Scopes the audit to files changed since the merge-base with the default branch, plus their direct importers/callers. Every finding is tagged introduced (by this branch) or pre-existing (existing debt in touched files). The table separates the two so the PR author isn't blamed for legacy issues. Ideal before opening a PR.
- Light recon, all categories, usually no subagents
- If on the default branch or zero commits ahead, says so and offers a full audit instead

Examples: branch --issues (PR review + auto-file issues) · branch security (security findings for this branch only)
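The branch scope described above can be approximated with two standard git commands, `git merge-base` and `git diff --name-only`. The sketch below is an illustration, not the skill's implementation: the default branch name `main`, the helper names, and the tagging rule (a finding counts as introduced when its cited line was added on this branch) are all assumptions, and the importer/caller expansion is left out.

```python
import subprocess


def changed_files(default_branch: str = "main") -> list[str]:
    """Files changed on the current branch since its merge-base."""
    base = subprocess.run(
        ["git", "merge-base", default_branch, "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    out = subprocess.run(
        ["git", "diff", "--name-only", base, "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def tag_finding(path: str, line: int, added: dict[str, set[int]]) -> str:
    """Assumed rule: 'introduced' if the cited line was added by this branch."""
    return "introduced" if line in added.get(path, set()) else "pre-existing"
```

Here `added` is a hypothetical map from file path to the line numbers the branch added; how those line numbers are collected is outside this sketch.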
Runs Recon + the direction lens only, in more depth: 4–6 grounded feature/roadmap suggestions, each citing repo evidence (half-built features, TODO clusters, a plugin system one interface away…). Generic suggestions like "add dark mode" are not allowed. Selected findings become design/spike plans (investigate, prototype, define API, list open questions), not build-everything plans.
- Unfinished intent: TODO/FIXME clusters, half-built features
- Stated-but-undelivered: README promises, config options that no-op
- Surface asymmetries: export without import, create without bulk-create
- Adjacent possible: existing architecture makes something disproportionately cheap
- Design/spike plan, not a build-everything plan
- Coarser effort estimate, and the skill says so explicitly
- Each suggestion comes with trade-offs in 2–3 sentences
Skips the audit entirely. You already know what you want. improve does a lightweight Recon, investigates just enough to specify the change properly, and writes one self-contained plan. For ambiguities, it first tries to resolve them from the codebase; only what's left becomes questions to you, one at a time, each with a recommended answer.
- "I know this needs a cache layer / rate limiting / auth change"
- You have a ticket and want to turn it into an executable plan
- You don't need a full audit, just a good spec
- First resolves ambiguities from the codebase itself
- Asks remaining questions one at a time, each with a recommended answer
- Won't write a vague spec for lack of information
Critiques and tightens an existing plan in plans/ against the plan-template quality bar. If the advisor authored the plan in this same session, it also spawns a fresh-context subagent to read it cold, because self-critique misses gaps you mentally fill from context the executor won't have.
- Context fully inlined, with no "as discussed above"
- Every verification is a command + expected result, not a judgment
- STOP conditions specific to this plan's actual risks
- No secret values anywhere; locations and types only
- Cold read is triggered only when the plan was authored in this same session
- Cold read uses fresh context to find gaps the advisor mentally filled
Dispatches a cheaper executor subagent (default: sonnet) in an isolated git worktree to implement one plan, then reviews the result like a tech lead: re-runs every done criterion, checks scope, reads the diff against intent. Executor output is untrusted until reviewed. The advisor never merges, pushes, or commits to your branch.
- Repo must be a git repository
- Plan dependencies must be DONE in the index
- Advisor runs the drift check itself and won't dispatch a stale plan
- APPROVE: mark DONE, present diff summary for you to merge
- REVISE: send specific feedback; max 2 rounds
- BLOCK: mark BLOCKED, refine the plan and retry

⚠ Fresh worktrees lack node_modules and build artifacts, so the executor must install deps first. execute 003 haiku names a specific model. Documented deviations are judged on merit; undocumented deviations are review failures.
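The drift check mentioned above can be pictured as a comparison between the commit a plan was written against and the current HEAD. This is a minimal sketch under stated assumptions: the `Plan` fields, the helper names, and the rule "a plan is stale if any file it cites changed since its stamped commit" are hypothetical and are not taken from the skill's documentation. Only `git diff --name-only` is a real command.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    plan_id: str
    written_against: str          # commit the plan was stamped with
    cited_files: frozenset[str]   # files the plan quotes or edits (assumed field)


def files_changed_since(commit: str) -> set[str]:
    """Files that differ between `commit` and the current HEAD."""
    out = subprocess.run(
        ["git", "diff", "--name-only", commit, "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {line for line in out.splitlines() if line}


def drifted_files(plan: Plan, changed: set[str]) -> set[str]:
    """Cited files that changed after the plan was written (empty means no drift)."""
    return set(plan.cited_files) & changed
```

In use, `drifted_files(plan, files_changed_since(plan.written_against))` returning a non-empty set would be the signal to refresh the plan before dispatching it.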
Processes what happened since the last session. Reads plans/README.md and every plan file, then handles each by status. Finishes with a short report: what's verified done, what's refreshed, what's rejected, and what's executable right now.
- DONE: spot-checks that the done criteria still hold
- BLOCKED: investigates the obstacle, then rewrites around it or marks the plan REJECTED
- IN PROGRESS (stale): flags it to you; the executor may have died mid-run
- TODO: runs the drift check; marks the plan REJECTED if the finding was fixed independently

- Returning to a repo with plans/ after multiple sessions
- Want to know which plans are executable right now
- After a batch of plans has executed, to clean up the index
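One way to read "executable right now" is: a plan that is still TODO and whose dependencies are all DONE. The sketch below encodes that reading. The status names come from the text above; the in-memory dictionaries stand in for the plans/README.md index, whose real format is not shown here, so treat the data shapes as assumptions.

```python
def executable_now(status: dict[str, str], deps: dict[str, list[str]]) -> list[str]:
    """Plan ids that are TODO and whose dependencies are all DONE."""
    return sorted(
        plan_id
        for plan_id, state in status.items()
        if state == "TODO"
        and all(status.get(dep) == "DONE" for dep in deps.get(plan_id, []))
    )


# Hypothetical index contents.
status = {"001": "DONE", "002": "TODO", "003": "TODO", "004": "BLOCKED"}
deps = {"002": ["001"], "003": ["004"]}
print(executable_now(status, deps))  # ['002']
```

Plan 003 is held back because it depends on a BLOCKED plan, which mirrors the rule that dependencies must be DONE in the index before dispatch.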
Modifier on any planning invocation. Publishes each written plan as a GitHub issue (gh issue create). The issue URL is recorded in the plan's Status block and in the plans/README.md index. The plan file is the source of truth; the issue is only the distribution channel.
- gh auth status must succeed
- Repo has a GitHub remote
- If the repo is public, warns before publishing security/credential findings and requires explicit confirmation

Examples: /improve --issues · /improve branch --issues · /improve security --issues · /improve plan <desc> --issues
Plans are written for the weakest plausible executor: a model that never saw the advisor's session, and may be far smaller.
Three properties carry that weight. Each plan also stamps the git commit it was written against, so an executor runs a mechanical drift check before touching anything.
All context is inlined: exact file paths, current-state code excerpts, repo conventions with an exemplar file, verified commands. Never "as discussed above."
Every step ends with a command and its expected output. Done criteria are machine-checkable, so the executor never has to judge whether it succeeded.
Explicit out-of-scope lists and STOP conditions ("if X, stop and report") replace improvisation by a small model when reality doesn't match the plan.
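To make "a command and its expected output" concrete, here is a small sketch of a verification gate as data plus a mechanical check. The `Gate` type, its fields, and the substring match on stdout are assumptions for illustration; the actual plan template may express gates differently.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    command: list[str]        # argv form, run without a shell
    expect_exit: int = 0
    expect_stdout: str = ""   # substring that must appear in stdout


def run_gate(gate: Gate, cwd: str = ".") -> bool:
    """Run the command and compare against the expected result; no judgment involved."""
    proc = subprocess.run(gate.command, cwd=cwd, capture_output=True, text=True)
    return proc.returncode == gate.expect_exit and gate.expect_stdout in proc.stdout


# Stand-in command so the example runs anywhere; a real plan would use the
# repo's own build / test / lint command identified during recon.
demo = Gate([sys.executable, "-c", "print('3 passed')"], expect_stdout="3 passed")
```

Because the check is a plain comparison, a small executor model only has to run it and report the boolean, which is the point of machine-checkable done criteria.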
Plans aren't fire-and-forget. Three commands carry a finding from spec to shipped, and keep the backlog honest over time.
Spawns a cheaper executor in an isolated worktree, then reviews the result like a tech lead: re-runs every done criterion, checks scope, reads the diff against intent. Verdict: approve (you merge), revise (max 2 rounds), or block and refine the plan.
Processes what happened since: verifies DONE plans still hold, investigates BLOCKED ones and rewrites around the obstacle, refreshes drifted plans, retires findings that got fixed independently.
Publishes plans as GitHub issues, with the same self-contained body, so any agent or human can pick the work up where it already lives.
The guardrails are non-negotiable. The advisor is a read-only analyst that produces specs, never a hand on the keyboard.
The only writes go to plans/. Executors edit only in disposable worktrees, and merging is always yours.
Read, search, and read-only analysis only. It won't run commands that change your working state.
For credentials, it reports locations and credential types only, and always recommends rotation.
If asked to implement, it points you at the plan, or offers execute to dispatch a cheaper model instead.
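The "locations and types only" guardrail can be illustrated with a finding record that has no field for the secret itself. This is a sketch, not the skill's detection logic: the `CredentialFinding` type and `scan` helper are hypothetical, and the single pattern shown (the `AKIA` prefix format of AWS access key IDs) is just one familiar example.

```python
import re
from dataclasses import dataclass

# One well-known pattern used for illustration: AWS access key IDs.
PATTERNS = {"aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b")}


@dataclass(frozen=True)
class CredentialFinding:
    path: str
    line: int
    kind: str   # credential type only; the matched value is never stored


def scan(path: str, text: str) -> list[CredentialFinding]:
    """Report where a credential appears and what kind it is, never its value."""
    findings = []
    for lineno, content in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(content):
                findings.append(CredentialFinding(path, lineno, kind))
    return findings
```

Keeping the value out of the data structure means it cannot leak into a plan file or a published issue by accident.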
A run against shadcn/ui came back with findings like these, each one ranked, evidenced, and ready to become a plan.
| # | Finding | Category | Effort | Confidence |
|---|---|---|---|---|
| 1 | shadow-config duplicated in search.ts / view.ts; the copies have already drifted (TODO at search.ts:31). | tech-debt | M | HIGH |
| 2 | O(n²) icon migration at migrate-icons.ts:168. | perf | S | HIGH |

[SEC-01] https_proxy env var "SSRF": by-design. It is the standard proxy convention, and every CLI honors it. Not a finding.