Is it possible that the AI models passing PhD-level exams can't read a basic diagram from a high school textbook? GPT-4o. Claude. Gemini. All of them tested.
These are the same models your doctors, lawyers, and engineers are using right now.
> CMU built a benchmark called DIAGRAMMA: 1,058 questions about scientific diagrams covering math, computer science, chemistry, and more. Questions that any high school or college student would handle without breaking a sweat.
> They tested 17 models. Every single one failed. The best score in the world, from Claude-3.5-Sonnet, was 59.64%. That means the most advanced AI on the planet gets roughly 4 in 10 diagram questions wrong.
> GPT-4o: 57%. Gemini-1.5-Pro: 44%. The open-source models most companies are quietly deploying: 35-42%.
> These aren't trick questions. They're asking models to count elements in a graph, identify angles in a triangle, read a project timeline. Things that require actually seeing and reasoning about what's in the image.
> Computer science diagrams (graphs, flowcharts, network structures) were the hardest category across every model. The thing AI supposedly excels at. Graphs. It can't read graphs.
→ Best score: Claude-3.5-Sonnet at 59.64%
→ GPT-4o: 57.28% on the same test
→ Smallest open-source models: 35-38%
→ Random guessing baseline: 25% (4 multiple choice options)
→ Cost to generate 100,000 training diagrams to fix this: under $400
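To put those scores in perspective, here's a quick sketch (using only the numbers reported above) of how far each model actually sits above the 25% random-guessing floor:

```python
# Reported DIAGRAMMA accuracies (from the stats above) vs. the
# 25% random-guessing baseline for 4-option multiple choice.
scores = {
    "Claude-3.5-Sonnet": 59.64,
    "GPT-4o": 57.28,
    "Gemini-1.5-Pro": 44.0,
}
RANDOM_BASELINE = 25.0  # 1 in 4 options

for model, acc in scores.items():
    error_rate = 100.0 - acc          # share of questions answered wrong
    lift = acc - RANDOM_BASELINE      # percentage points above pure guessing
    print(f"{model}: {error_rate:.1f}% wrong, "
          f"{lift:.1f} points above random guessing")
```

Even the best model is closer to the guessing baseline than to a perfect score: 59.64% is 34.64 points above chance, but 40.36 points below ceiling.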
The fix CMU built, 100,000 synthetic training diagrams generated for less than $400, suggests the problem is solvable. The fact that nobody solved it before now suggests nobody was looking.