
Meta-Harness: Automating Harness Engineering for AI Agents

📅 2026-03-31 21:13 · elvis · Artificial Intelligence · 1 min read · 1140 characters · Score: 87
AI Agents · Meta-Harness · LLM · Agent Systems · Research
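📌 One-sentence summary: A new paper from Stanford and MIT introduces Meta-Harness, an agentic system that automates harness engineering and significantly improves benchmark performance.

📝 Detailed summary: This post highlights a research paper from Stanford and MIT on Meta-Harness, an agentic system designed to automate harness engineering for LLMs. The paper notes that changing only the harness around a fixed LLM can produce a 6x performance gap on the same benchmark. By drawing on the full history of prior candidates, including their execution traces, Meta-Harness outperforms hand-engineered baselines on agentic coding tasks. The approach treats harness design as an optimization problem and offers a scalable alternative to hand-built scaffolding.

📊 Article info: AI score: 87. Source: elv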

NEW Stanford & MIT paper on Model Harnesses. Changing the harness around a fixed LLM can produce a 6x performance gap on the same benchmark.

What if we automated harness engineering itself?

The work introduces Meta-Harness, an agentic system that searches over harness code by exposing the full history through a filesystem.

The proposer reads source code, execution traces, and scores from all prior candidates, referencing over 20 past attempts per step.
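As a rough illustration of the loop described above, here is a minimal Python sketch of a search that keeps every candidate's source, execution trace, and score as plain files and hands the whole history to the proposer at each step. It is not the paper's implementation: the directory layout, the file names (`harness.py`, `trace.log`, `score.json`), and the functions `save_candidate`, `load_history`, `search`, `toy_propose`, and `toy_evaluate` are all hypothetical. In the real system the proposer is an LLM agent and the evaluator runs a benchmark; both are replaced here by deterministic stubs.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def save_candidate(root: Path, source: str, trace: str, score: float) -> Path:
    """Store one harness candidate as a directory of plain files (assumed layout)."""
    idx = len([p for p in root.iterdir() if p.is_dir()])
    cand = root / f"candidate_{idx:04d}"
    cand.mkdir()
    (cand / "harness.py").write_text(source)
    (cand / "trace.log").write_text(trace)
    (cand / "score.json").write_text(json.dumps({"score": score}))
    return cand


def load_history(root: Path) -> list[dict]:
    """Read source, trace, and score for every prior candidate, oldest first."""
    history = []
    for cand in sorted(p for p in root.iterdir() if p.is_dir()):
        history.append({
            "name": cand.name,
            "source": (cand / "harness.py").read_text(),
            "trace": (cand / "trace.log").read_text(),
            "score": json.loads((cand / "score.json").read_text())["score"],
        })
    return history


def search(
    root: Path,
    propose: Callable[[list[dict]], str],
    evaluate: Callable[[str], tuple[str, float]],
    steps: int,
) -> dict:
    """Propose, evaluate, and archive a candidate per step; return the best one."""
    for _ in range(steps):
        history = load_history(root)      # full history, not just scores
        source = propose(history)         # hypothetical stand-in for the LLM proposer
        trace, score = evaluate(source)   # hypothetical stand-in for a benchmark run
        save_candidate(root, source, trace, score)
    return max(load_history(root), key=lambda c: c["score"])


def toy_propose(history: list[dict]) -> str:
    """Stub proposer: take the best prior 'harness' and bump its one setting."""
    if not history:
        return "CONTEXT_ITEMS = 1"
    best = max(history, key=lambda c: c["score"])
    n = int(best["source"].split("=")[1])
    return f"CONTEXT_ITEMS = {n + 1}"


def toy_evaluate(source: str) -> tuple[str, float]:
    """Stub evaluator: the score peaks when CONTEXT_ITEMS equals 4."""
    n = int(source.split("=")[1])
    return f"ran with CONTEXT_ITEMS={n}", float(-abs(n - 4))


def run_demo(steps: int = 6) -> dict:
    """Run the toy search in a temporary directory and return the best candidate."""
    with tempfile.TemporaryDirectory() as tmp:
        return search(Path(tmp), toy_propose, toy_evaluate, steps)


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is the data flow: because every candidate lives on disk with its code and trace, the proposer can inspect why an earlier attempt failed instead of seeing only a number.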

On text classification, it improves over SOTA context management by 7.7 points while using 4x fewer tokens.

On agentic coding, it outperforms all hand-engineered baselines on TerminalBench-2, scoring 37.6% versus Claude Code's 27.5%.

This is a big deal! Here is why:

The harness around a model often matters as much as the model itself.

Meta-Harness shows that giving an optimizer rich access to prior experience, not just compressed scores, unlocks automated engineering that beats human-designed scaffolding.

Paper: arxiv.org/abs/2603.28052

Learn to build effective AI agents in our academy: academy.dair.ai

Published: 2026-03-31 21:13:10 · Indexed: 2026-04-01 00:00:18
