Skillgrade Released: A Unit Testing Framework for AI Agent Skills
=================================================================

### meng shao
@shao__meng
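@mgechev, author of Skills Best Practices, has just released Skillgrade, a unit testing framework for Agent Skills.
Skillgrade verifies whether AI agents such as Codex, Claude Code, and OpenClaw can correctly discover and use Skills from the Agent Skills ecosystem (based on the agentskills.io standard: bundles of instructions and resources with SKILL.md as the entry point).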
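What the project does
Traditional prompt and skill iteration relies on manual trial and error. Skillgrade provides an evaluation loop that is quantifiable, reproducible, and easy to integrate into CI:
· Hybrid scoring: 70% deterministic (code checks) + 30% LLM judge (workflow quality), weighted to produce the final pass rate
· Sandbox isolation (Docker by default, local for CI) to keep the agent from causing accidental damage
· One-command test generation (AI init), with support for smoke tests (5 trials), reliable evaluation (15 trials), and regression detection (30 trials)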
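The 70/30 weighting described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Skillgrade's actual code: the names `TrialResult`, `hybrid_score`, and `mean_score` are hypothetical, and averaging across trials is an assumption about how per-trial scores are combined.

```python
from dataclasses import dataclass
from typing import Sequence

# Weights and trial counts as stated in the post.
DETERMINISTIC_WEIGHT = 0.7
LLM_JUDGE_WEIGHT = 0.3
PRESET_TRIALS = {"smoke": 5, "reliable": 15, "regression": 30}


@dataclass
class TrialResult:
    """Scores for one trial, each in [0, 1] (hypothetical structure)."""
    deterministic: float  # e.g. file and content checks
    llm_judge: float      # e.g. workflow quality rated by an LLM


def hybrid_score(trial: TrialResult) -> float:
    """Weighted sum of the deterministic and LLM-judge scores."""
    return (DETERMINISTIC_WEIGHT * trial.deterministic
            + LLM_JUDGE_WEIGHT * trial.llm_judge)


def mean_score(trials: Sequence[TrialResult]) -> float:
    """Average hybrid score across trials (assumed aggregation)."""
    if not trials:
        raise ValueError("at least one trial is required")
    return sum(hybrid_score(t) for t in trials) / len(trials)
```

For example, a trial that passes every code check but gets half marks from the LLM judge scores 0.7 × 1.0 + 0.3 × 0.5, or roughly 0.85.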
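Typical workflow (up and running in 3 minutes)
- In a Skills directory containing SKILL.md, run skillgrade init (requires an API key; automatically generates an eval.yaml with tasks and graders)
- Customize eval.yaml
- Run skillgrade --smoke (or --reliable / --regression)
- View results with skillgrade preview (CLI) or preview browser (visual report at http://localhost:3847)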
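Two examples provided by the project
· superlint (simple): the agent must discover the custom superlint tool and fix app.js by following a 3-step "check → fix → verify" workflow; 70% file and content checks, 30% LLM evaluation of workflow efficiency.
· angular-modern (advanced): a TS grader (regex-based static analysis of 5 modern Angular API migrations), a setup step that installs dependencies dynamically, and the remaining 30% from an LLM evaluation of code quality. It shows the fine-grained scoring available for complex Skills.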
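To make the idea of a regex-based static grader concrete, here is a minimal sketch in Python (the project's own grader is described as TypeScript). The five checks below are hypothetical stand-ins, since the post does not list which migrations the angular-modern example inspects; `Check` and `grade` are illustrative names as well.

```python
import re
from typing import NamedTuple


class Check(NamedTuple):
    name: str
    pattern: str
    must_match: bool  # True: pattern must appear; False: it must be absent


# Hypothetical checks; the real grader's five items are not given in the post.
CHECKS = [
    Check("uses inject()", r"\binject\s*\(", True),
    Check("uses signal()", r"\bsignal\s*\(", True),
    Check("uses @if control flow", r"@if\s*\(", True),
    Check("no *ngIf directive", r"\*ngIf\b", False),
    Check("no constructor injection", r"constructor\s*\(\s*private\b", False),
]


def grade(source: str) -> float:
    """Return the fraction of checks the source text satisfies."""
    passed = sum(
        1 for check in CHECKS
        if (re.search(check.pattern, source) is not None) == check.must_match
    )
    return passed / len(CHECKS)
```

A grader like this is deterministic and cheap to run, which is presumably why it carries the larger share of the score, while judgments that regexes cannot capture, such as overall code quality, are left to the LLM judge.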
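Technical and architectural highlights
· CLI + template-driven: templates/ implements AI init, src/ holds the core engine, and graders/ holds the built-in grader implementations.
· Safe and flexible: Docker builds (base + setup), resource limits, and optimizations for local CI runs.
· Ecosystem compatibility: works directly with agentskills.io, supports mainstream agents, and offers a browser UI for reviewing transcripts.
· CI-friendly: GitHub Action example, --provider=local --ci --threshold=0.8.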
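The post does not describe what the threshold flag does internally. One plausible reading, sketched below under that assumption, is that a CI run fails when the share of passing trials falls below the threshold. The functions `pass_rate` and `ci_exit_code` are hypothetical and are not part of Skillgrade.

```python
from typing import Sequence


def pass_rate(trial_passed: Sequence[bool]) -> float:
    """Fraction of trials that passed."""
    if not trial_passed:
        raise ValueError("at least one trial is required")
    return sum(trial_passed) / len(trial_passed)


def ci_exit_code(trial_passed: Sequence[bool], threshold: float = 0.8) -> int:
    """Return 0 (success) if the pass rate meets the threshold, else 1.

    Assumed semantics: a rate equal to the threshold counts as passing.
    """
    return 0 if pass_rate(trial_passed) >= threshold else 1
```

In a CI job, a wrapper script could pass this return value to `sys.exit` so that the pipeline step fails whenever the skill regresses below the chosen bar.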
Open-source repository: github.com/mgechev/skillg…
#### Minko Gechev
@mgechev · 24h ago
Announcing skillgrade - the easiest way to evaluate your agent skills
All you need is two commands:
skillgrade init # create evals
skillgrade # run them
By default evals run in a safe sandboxed docker container github.com/mgechev/skillg…
Mar 17, 2026, 12:38 AM
One Sentence Summary
Skillgrade is a unit testing framework designed for AI Agent Skills, supporting automated evaluation, sandboxed execution, and hybrid scoring.
Summary
Released by Minko Gechev, Skillgrade aims to solve the problem of AI skill iteration relying on manual trial and error. Based on the agentskills.io standard, it provides quantitative evaluation through a hybrid scoring mechanism (70% deterministic code checks + 30% LLM judge for workflow quality). The framework supports Docker sandbox isolation, one-click test generation (AI init), and CI integration. The tweet details its workflow and provides examples like 'superlint' and 'angular-modern'.
AI Score
86
Influence Score 18
Published At Today
Language
Chinese
Tags
Skillgrade
AI Agent
Unit Testing
LLM Eval
Open Source