AI Models Can't Read Basic High School Textbook Diagrams: CMU's DIAGRAMMA Benchmark Reveals a Key Weakness

Is it possible that the AI models passing PhD-level exams can't read a basic diagram from a high school textbook? GPT-4o. Claude. Gemini. All of them tested.

These are the same models your doctors, lawyers, and engineers are using right now.

> CMU built a benchmark called DIAGRAMMA: 1,058 questions about scientific diagrams covering math, computer science, chemistry, and more. Questions that any high school or college student would handle without breaking a sweat.

> They tested 17 models. Every single one failed. The best score in the world, from Claude-3.5-Sonnet, was 59.64%. That means the most advanced AI on the planet gets 4 in 10 diagram questions wrong.

> GPT-4o: 57%. Gemini-1.5-Pro: 44%. The open-source models most companies are quietly deploying: 35-42%.

> These aren't trick questions. They're asking models to count elements in a graph, identify angles in a triangle, read a project timeline. Things that require actually seeing and reasoning about what's in the image.

> Computer science diagrams (graphs, flowcharts, network structures) were the hardest category across every model. The thing AI supposedly excels at. Graphs. It can't read graphs.

→ Best score: Claude-3.5-Sonnet at 59.64%

→ GPT-4o: 57.28% on the same test

→ Smallest open-source models: 35-38%

→ Random guessing baseline: 25% (4 multiple choice options)

→ Cost to generate 100,000 training diagrams to fix this: under $400
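
To put those numbers side by side, here is a minimal Python sketch of the arithmetic. The scores, the 25% baseline, and the $400 / 100,000 figures come from the post; the function names and the chance-normalized metric are illustrative assumptions, not anything taken from the benchmark itself.

```python
# Figures quoted in the post. Function names and the chance-normalized
# metric below are illustrative assumptions, not part of DIAGRAMMA.
SCORES = {
    "Claude-3.5-Sonnet": 59.64,
    "GPT-4o": 57.28,
}
CHANCE = 25.0  # random guessing with 4 multiple-choice options


def error_rate(accuracy_pct: float) -> float:
    """Percentage of questions answered incorrectly."""
    return 100.0 - accuracy_pct


def above_chance(accuracy_pct: float, chance_pct: float = CHANCE) -> float:
    """Rescale accuracy so 0.0 means pure guessing and 1.0 means perfect."""
    return (accuracy_pct - chance_pct) / (100.0 - chance_pct)


def cost_per_diagram(total_cost_usd: float, n_diagrams: int) -> float:
    """Average cost of one synthetic diagram, in US dollars."""
    return total_cost_usd / n_diagrams


if __name__ == "__main__":
    for model, acc in SCORES.items():
        print(f"{model}: {error_rate(acc):.2f}% wrong, "
              f"{above_chance(acc):.2f} of the way from chance to perfect")
    # $400 is an upper bound ("under $400"), so this is a ceiling too.
    print(f"Cost ceiling per diagram: ${cost_per_diagram(400, 100_000):.4f}")
```

Read this way, the top model's 40.36% error rate is the "4 in 10" figure, and its 59.64% covers a bit less than half the gap between guessing and a perfect score. The training data works out to at most about $0.004 per diagram.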

The fix CMU built, 100,000 synthetic training diagrams generated for less than $400, suggests the problem is solvable. The fact that nobody solved it before now suggests nobody was looking.
