Skillgrade Released: A Unit Testing Framework for AI Agent Skills


### meng shao

@shao__meng

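@mgechev, author of Skills Best Practices, has just released Skillgrade: a unit testing framework for Agent Skills.

Skillgrade checks whether AI agents such as Codex, Claude Code, and OpenClaw can correctly discover and use skills from the Agent Skills ecosystem (instruction-plus-resource packages that follow the agentskills.io standard and use SKILL.md as their entry point).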

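What the project does

Iterating on prompts and skills has traditionally meant manual trial and error. Skillgrade instead provides an evaluation loop that is quantifiable, reproducible, and easy to integrate into CI:

· Hybrid scoring: 70% deterministic (code checks) + 30% LLM judge (workflow quality), combined by weight into a final pass rate.

· Sandbox isolation (Docker by default, or local for CI) to contain any mistakes the agent makes.

· One-command test generation (AI init), with presets for smoke tests (5 trials), reliable evaluation (15 trials), and regression detection (30 trials).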
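To make the 70/30 weighting and the trial presets concrete, here is a minimal Python sketch. It assumes each trial produces a deterministic score and an LLM-judge score in [0, 1], and that a trial counts as passed when its weighted score reaches a cutoff. The names (`TrialResult`, `weighted_score`, `pass_rate`) and the 0.5 cutoff are hypothetical and do not come from Skillgrade's source.

```python
from dataclasses import dataclass

# Trial counts for the --smoke / --reliable / --regression presets, per the post.
PRESET_TRIALS = {"smoke": 5, "reliable": 15, "regression": 30}

# 70% deterministic (code checks) + 30% LLM judge (workflow quality).
DETERMINISTIC_WEIGHT = 0.7
LLM_JUDGE_WEIGHT = 0.3


@dataclass
class TrialResult:
    """Scores for one trial, both assumed to lie in [0, 1]. Hypothetical type."""
    deterministic: float
    llm_judge: float


def weighted_score(trial: TrialResult) -> float:
    """Combine the two grader scores using the 70/30 split."""
    return DETERMINISTIC_WEIGHT * trial.deterministic + LLM_JUDGE_WEIGHT * trial.llm_judge


def pass_rate(trials: list[TrialResult], pass_mark: float = 0.5) -> float:
    """Fraction of trials whose weighted score is at least pass_mark (hypothetical cutoff)."""
    if not trials:
        return 0.0
    passed = sum(1 for t in trials if weighted_score(t) >= pass_mark)
    return passed / len(trials)


# Example: a smoke run with 5 trials.
smoke_run = [
    TrialResult(deterministic=1.0, llm_judge=0.8),  # 0.94
    TrialResult(deterministic=1.0, llm_judge=0.6),  # 0.88
    TrialResult(deterministic=0.5, llm_judge=0.9),  # 0.62
    TrialResult(deterministic=0.0, llm_judge=1.0),  # 0.30, below the cutoff
    TrialResult(deterministic=1.0, llm_judge=1.0),  # 1.00
]
assert len(smoke_run) == PRESET_TRIALS["smoke"]
print(f"pass rate: {pass_rate(smoke_run):.2f}")  # 4 of 5 pass -> 0.80
```

Averaging the weighted scores instead of counting passes is an equally plausible reading of "final pass rate". The post does not say which one Skillgrade uses.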

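Typical workflow (up and running in about 3 minutes)

In a skills directory containing SKILL.md, run skillgrade init. This step needs an API key and automatically generates an eval.yaml with tasks and graders.

Customize eval.yaml.

Run skillgrade --smoke (or --reliable / --regression).

View the results with skillgrade preview (in the CLI) or skillgrade preview browser (a visual report at http://localhost:3847).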

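Key options: --ci (exits based on a threshold), --parallel, and options to select a specific eval, grader, agent, or provider. API keys are read from environment variables or a .env file, and reports are saved automatically to a temporary directory.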
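The post describes --ci only as "exit based on a threshold". Here is a minimal sketch of such a gate, assuming it means "exit non-zero when the pass rate is below --threshold". The 0.8 default mirrors the GitHub Action example. The function name and the `>=` comparison are assumptions.

```python
def ci_exit_code(pass_rate: float, threshold: float = 0.8) -> int:
    """Map a pass rate to a CI exit code: 0 if it meets the threshold, else 1.

    Hypothetical helper; assumes "meets" means pass_rate >= threshold.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return 0 if pass_rate >= threshold else 1


# Example: 12 of 15 trials passed in a --reliable run.
rate = 12 / 15
print(f"pass rate {rate:.2f}, threshold 0.80 -> exit code {ci_exit_code(rate)}")
```

A wrapper script would pass the result to `sys.exit`, and the non-zero status is what makes the CI job fail.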

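Two examples included with the project

· superlint (basic): the agent must discover a custom superlint tool and fix app.js by following a three-step workflow (check → fix → verify). Scoring is 70% file and content checks plus 30% LLM evaluation of workflow efficiency.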
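The deterministic 70% of a superlint-style eval comes down to checking files on disk. The post does not list the exact checks, so this Python sketch is hypothetical. It scores the fraction of checks that pass: one check that app.js exists, plus one check per required or forbidden substring. The `var` → `const` fix is an invented stand-in for whatever superlint actually enforces.

```python
import tempfile
from pathlib import Path


def grade_file(path: Path, must_contain: list[str], must_not_contain: list[str]) -> float:
    """Hypothetical deterministic grader: fraction of file/content checks that pass.

    Check 1 is that the file exists; each required or forbidden substring adds one
    more check. If the file is missing, every content check fails as well.
    """
    checks: list[bool] = [path.is_file()]
    text = path.read_text(encoding="utf-8") if checks[0] else None
    for needle in must_contain:
        checks.append(text is not None and needle in text)
    for needle in must_not_contain:
        checks.append(text is not None and needle not in text)
    return sum(checks) / len(checks)


# Example: pretend the agent's fix replaced `var` with `const` in app.js.
with tempfile.TemporaryDirectory() as tmp:
    app = Path(tmp) / "app.js"
    app.write_text("const x = 1;\nconsole.log(x);\n", encoding="utf-8")
    score = grade_file(app, must_contain=["const x"], must_not_contain=["var "])
    print(f"deterministic score: {score:.2f}")  # all three checks pass -> 1.00
```

Returning a fraction rather than a single pass/fail makes it straightforward to feed the result into the 70% deterministic weight.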

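· angular-modern (advanced): a TypeScript grader uses regex-based static analysis to check five migrations to modern Angular APIs, a setup step installs dependencies on the fly, and the remaining 30% is an LLM evaluation of code quality. This example shows how finely complex skills can be scored.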
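The angular-modern grader is written in TypeScript, and the post does not list its five checks. The Python sketch below only illustrates the regex static-analysis idea. Its five rules are guesses at common modern Angular migrations: built-in `@if`/`@for` control flow, signal `input()`/`output()`, and `inject()` in place of constructor injection. They are not the example's actual rules.

```python
import re
from typing import NamedTuple


class MigrationRule(NamedTuple):
    """One hypothetical migration check: legacy syntax gone, modern syntax present."""
    name: str
    legacy: re.Pattern  # should no longer match after migration
    modern: re.Pattern  # should match after migration


# Illustrative rules only; the real angular-modern grader's checks are not listed.
RULES = (
    MigrationRule("control flow @if", re.compile(r"\*ngIf\b"), re.compile(r"@if\s*\(")),
    MigrationRule("control flow @for", re.compile(r"\*ngFor\b"), re.compile(r"@for\s*\(")),
    MigrationRule("signal inputs", re.compile(r"@Input\("),
                  re.compile(r"\binput(?:\.required)?(?:<[^>]*>)?\(")),
    MigrationRule("signal outputs", re.compile(r"@Output\("),
                  re.compile(r"\boutput(?:<[^>]*>)?\(")),
    MigrationRule("inject()",
                  re.compile(r"constructor\s*\(\s*(?:private|public|protected|readonly)\b"),
                  re.compile(r"\binject\(")),
)


def grade_source(source: str, rules: tuple[MigrationRule, ...] = RULES) -> tuple[float, dict[str, bool]]:
    """Return (fraction of rules passed, per-rule results) for one source file."""
    results = {
        rule.name: rule.legacy.search(source) is None and rule.modern.search(source) is not None
        for rule in rules
    }
    return sum(results.values()) / len(results), results


MODERN = """
import { Component, inject, input, output } from '@angular/core';
@Component({
  selector: 'app-user',
  template: `@if (user()) { @for (item of items(); track item) { <li>{{ item }}</li> } }`,
})
export class UserComponent {
  private http = inject(HttpClient);
  user = input.required<string>();
  items = input<string[]>([]);
  selected = output<string>();
}
"""

LEGACY = """
@Component({ selector: 'app-user', template: '<li *ngFor="let i of items">{{ i }}</li>' })
export class UserComponent {
  @Input() user = '';
  @Output() selected = new EventEmitter<string>();
  constructor(private http: HttpClient) {}
}
"""

score, detail = grade_source(MODERN)
print(f"modern: {score:.2f}")
score, detail = grade_source(LEGACY)
print(f"legacy: {score:.2f}", detail)
```

Each rule requires both "legacy absent" and "modern present", so a file that simply deletes a feature also fails that rule. A real grader might relax this. Regexes also miss multi-line or aliased forms, which is the usual trade-off of regex checks compared with parsing a TypeScript AST.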

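Technical and architectural highlights

· CLI plus templates: templates/ powers AI init, src/ contains the core engine, and graders/ holds the built-in grader implementations.

· Safe and flexible: Docker builds (base image + setup), resource limits, and optimizations for local CI runs.

· Ecosystem compatibility: works directly with agentskills.io, supports mainstream agents, and provides a browser UI for reviewing transcripts.

· CI-friendly: ships a GitHub Action example using --provider=local --ci --threshold=0.8.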

Repository: github.com/mgechev/skillg…


#### Minko Gechev

@mgechev

Announcing skillgrade - the easiest way to evaluate your agent skills

All you need is two commands:

skillgrade init # create evals

skillgrade # run them

By default, evals run in a safe, sandboxed Docker container. github.com/mgechev/skillg…


Mar 17, 2026, 12:38 AM


One Sentence Summary

Skillgrade is a unit testing framework designed for AI Agent Skills, supporting automated evaluation, sandboxed execution, and hybrid scoring.

Summary

Released by Minko Gechev, Skillgrade aims to solve the problem of AI skill iteration relying on manual trial and error. Based on the agentskills.io standard, it provides quantitative evaluation through a hybrid scoring mechanism (70% deterministic code checks + 30% LLM judge for workflow quality). The framework supports Docker sandbox isolation, one-click test generation (AI init), and CI integration. The tweet details its workflow and provides examples like 'superlint' and 'angular-modern'.


Language: Chinese
