"BS Bench" Released: Testing How AI Hallucinates When Faced with Absurd Questions

Title: Introduction of the 'BS Bench': Testing AI Hallucinations...

URL Source: https://www.bestblogs.dev/status/2033710089983660448

Published Time: 2026-03-17 01:00:44

Markdown Content: ![Image 1: Arena.ai](https://www.bestblogs.dev/en/tweets?sourceId=SOURCE_39a65f)

Can AI tell when a question is total nonsense, or does it just make up an answer? @petergostev tested 80 models with nonsense questions. Some pushed back. Others confidently invented fake metrics and kept going. All of them were ranked on the "BS Bench".

One surprise: thinking harder made it worse.

Watch the full deep dive on BS Bench on YouTube.

Link in thread.

Image 2: video thumbnail (01:06)

4 Replies

1 Retweet

30 Likes

2,900 Views

One Sentence Summary

A new benchmark called 'BS Bench' evaluates 80 AI models on their ability to identify nonsense questions versus confidently inventing fake answers.

Summary

This tweet introduces the 'BS Bench,' a benchmark created by Peter Gostev that tests how 80 different AI models handle nonsense or illogical questions. The study reveals a spectrum of behaviors: some models correctly push back against nonsense, while others 'hallucinate' and invent fake metrics. A notable finding mentioned is that 'thinking harder' (likely referring to Chain-of-Thought or reasoning models) sometimes exacerbated the tendency to fabricate answers rather than identifying the premise as nonsense.
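The post does not describe how BS Bench grades or ranks responses, so the following is only a minimal sketch of one way such a ranking could be computed, assuming each response has already been labeled as either pushing back on the nonsense premise or fabricating an answer. The label names, the `rank_by_pushback` function, and the sample data are all hypothetical and are not taken from the benchmark.

```python
from collections import defaultdict

# Hypothetical labels: the actual BS Bench grading scheme is not given in the post.
PUSHBACK = "pushback"      # the model flagged the question as nonsense
FABRICATED = "fabricated"  # the model invented an answer anyway


def rank_by_pushback(graded):
    """Rank models by the share of nonsense questions they pushed back on.

    graded: iterable of (model_name, label) pairs.
    Returns a list of (model_name, pushback_rate), highest rate first;
    ties are broken alphabetically by model name.
    """
    counts = defaultdict(lambda: [0, 0])  # model -> [pushback count, total count]
    for model, label in graded:
        counts[model][1] += 1
        if label == PUSHBACK:
            counts[model][0] += 1
    rates = {model: hits / total for model, (hits, total) in counts.items()}
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample data for illustration only; not real benchmark results.
graded = [
    ("model_a", PUSHBACK), ("model_a", PUSHBACK), ("model_a", FABRICATED),
    ("model_b", FABRICATED), ("model_b", PUSHBACK),
    ("model_b", FABRICATED), ("model_b", FABRICATED),
]
ranking = rank_by_pushback(graded)
for model, rate in ranking:
    print(f"{model}: {rate:.0%} pushback")
```

A single pushback rate is the simplest possible score. A real benchmark would likely need a rubric for partial pushback, such as a model that notes the premise is odd but still answers.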

AI Score

Influence Score 7

Published At Today

Language

English
