NEW Stanford & MIT paper on Model Harnesses. Changing the harness around a fixed LLM can produce a 6x performance gap on the same benchmark.
What if we automated harness engineering itself?
The work introduces Meta-Harness, an agentic system that searches over harness code, exposing the full optimization history to the proposer through a filesystem.
The proposer reads source code, execution traces, and scores from all prior candidates, referencing over 20 past attempts per step.
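The filesystem-as-memory idea can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the proposer here is a simple mutation rule standing in for an LLM, the benchmark is a dummy scoring function, and all names (`propose`, `evaluate`, `search`, `candidate_*.json`) are hypothetical. The key point it shows is that each step reads the *full* history of prior candidates from disk rather than a compressed score summary.

```python
import json
import random
import tempfile
from pathlib import Path

def propose(history):
    """Stand-in for the LLM proposer: mutate the best prior candidate.
    (In the paper, an LLM would read source code, traces, and scores.)"""
    if not history:
        return {"temperature": 0.7}
    best = max(history, key=lambda c: c["score"])
    delta = random.uniform(-0.1, 0.1)
    return {"temperature": round(best["config"]["temperature"] + delta, 3)}

def evaluate(config):
    """Stand-in benchmark: score peaks when temperature hits 1.0."""
    return 1.0 - abs(config["temperature"] - 1.0)

def search(root: Path, steps: int = 20):
    """Search loop: every candidate (config + score) is persisted to the
    filesystem, and the proposer re-reads all of them each step."""
    root.mkdir(parents=True, exist_ok=True)
    for step in range(steps):
        # Full history, not a compressed summary, is visible to the proposer.
        history = [json.loads(p.read_text())
                   for p in sorted(root.glob("candidate_*.json"))]
        config = propose(history)
        score = evaluate(config)
        record = {"step": step, "config": config, "score": score}
        (root / f"candidate_{step:03d}.json").write_text(json.dumps(record))
    history = [json.loads(p.read_text())
               for p in sorted(root.glob("candidate_*.json"))]
    return max(history, key=lambda c: c["score"])

best = search(Path(tempfile.mkdtemp()) / "harness_history")
```

Because the first candidate (temperature 0.7) always lands on disk, the best score returned is at least 0.7; the design choice worth noting is that the loop's only shared state is the directory of candidate files, which is what makes the history inspectable by any proposer.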
On text classification, it improves over SOTA context management by 7.7 points while using 4x fewer tokens.
On agentic coding, it outperforms all hand-engineered baselines on TerminalBench-2, scoring 37.6% versus Claude Code's 27.5%.
This is a big deal! Here is why:
The harness around a model often matters as much as the model itself.
Meta-Harness shows that giving an optimizer rich access to prior experience, not just compressed scores, unlocks automated engineering that beats human-designed scaffolding.
Paper: arxiv.org/abs/2603.28052
Learn to build effective AI agents in our academy: academy.dair.ai