
Launch of ARC-AGI-3 Benchmark

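📅 2026-03-26 01:42 François Chollet Artificial Intelligence 3 min 3172 words Score: 89
ARC-AGI AGI Benchmarking AI Evaluation Agentic Intelligence

📌 One-sentence summary: François Chollet announces the release of ARC-AGI-3, a new benchmark designed to evaluate agentic intelligence through interactive reasoning environments.

📝 Detailed summary: This tweet announces the release of ARC-AGI-3, a benchmark focused on evaluating agentic intelligence. It notes that current frontier models perform poorly in these environments, scoring under 1%, whereas humans can solve 100% of the environments on first contact with no prior training. This sets a new, challenging standard for AI evaluation.

📊 Article info: AI score: 89 · Source: François Chollet (@fchollet) · Author: François Chollet · Category: Artificial Intelligence · Language: English · Reading time: 3 minutes · Word count: 5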

Title: Launch of ARC-AGI-3 Benchmark | BestBlogs.dev

URL Source: https://www.bestblogs.dev/status/2036861192619384989

Published Time: 2026-03-25 17:42:06


Launch of ARC-AGI-3 Benchmark


### François Chollet

@fchollet

ARC-AGI-3 is out now! We've designed the benchmark to evaluate agentic intelligence via interactive reasoning environments. Beating ARC-AGI-3 will be achieved when an AI system matches or exceeds human-level action efficiency on all environments, upon seeing them for the first time.

We've done extensive human testing that shows 100% of these environments are solvable by humans, upon first contact, with no prior training and no instructions.

Meanwhile, all frontier AI reasoning models do under 1% at this time.

[Video, 00:15]

Mar 25, 2026, 5:42 PM

78 Replies

132 Retweets

1,181 Likes

103.7K Views

One Sentence Summary

François Chollet announces the release of ARC-AGI-3, a new benchmark designed to evaluate agentic intelligence through interactive reasoning environments.

Summary

This tweet announces the launch of ARC-AGI-3, a benchmark focused on evaluating agentic intelligence. It highlights that current frontier models struggle, scoring under 1% on these environments, whereas humans can solve 100% of them upon first contact without prior training. This establishes a new, challenging standard for AI evaluation.

AI Score

89

Influence Score 235

Published At Today

Language

English

Tags

ARC-AGI

AGI

Benchmarking

AI Evaluation

Agentic AI


Published: 2026-03-26 01:42:06 Collected: 2026-03-26 04:00:35
