Title: Introduction of the 'BS Bench': Testing AI Hallucinations...
URL Source: https://www.bestblogs.dev/status/2033710089983660448
Published Time: 2026-03-17 01:00:44
Markdown Content: 
Can AI tell when a question is total nonsense, or does it just make up an answer? @petergostev tested 80 models with nonsense questions. Some pushed back. Others confidently invented fake metrics and kept going. All of them were ranked on the "BS Bench".
One surprise: thinking harder made it worse.
Watch the full deep dive on BS Bench on YouTube.
Link in thread.
One Sentence Summary
A new benchmark called 'BS Bench' evaluates 80 AI models on their ability to identify nonsense questions versus confidently inventing fake answers.
Summary
This tweet introduces the 'BS Bench,' a benchmark created by Peter Gostev that tests how 80 different AI models handle nonsense or illogical questions. The results show a spectrum of behaviors: some models correctly push back against the nonsense, while others hallucinate, confidently inventing fake metrics and elaborating on them. A notable finding is that 'thinking harder' (likely referring to chain-of-thought or reasoning models) sometimes made fabrication worse rather than helping the model recognize the flawed premise.
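The tweet does not describe how the benchmark actually scores responses, but the core idea (ask nonsense questions, check whether the model pushes back or fabricates, then rank by rejection rate) can be illustrated with a minimal sketch. Everything here is hypothetical: the sample questions, the refusal markers, and the keyword-based detection are stand-ins, not the benchmark's method.

```python
# Hypothetical sketch of a "BS Bench"-style scoring loop.
# The questions, refusal markers, and keyword check below are
# illustrative assumptions, not taken from the actual benchmark.

NONSENSE_QUESTIONS = [
    "What is the average wingspan of a prime number?",
    "How many liters of Tuesday fit inside a square root?",
]

# Crude heuristic: phrases that suggest the model flagged the premise.
REFUSAL_MARKERS = ("doesn't make sense", "nonsensical", "not a meaningful question")


def pushed_back(answer: str) -> bool:
    """Return True if the answer appears to reject the nonsense premise."""
    lowered = answer.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def bs_score(answers: list[str]) -> float:
    """Fraction of nonsense questions the model correctly rejected (1.0 = best)."""
    if not answers:
        return 0.0
    return sum(pushed_back(a) for a in answers) / len(answers)


# One model pushes back on every question; another confidently fabricates.
honest = ["That question doesn't make sense: numbers have no wingspan."] * 2
fabricator = ["The average wingspan is 4.2 meters.", "About 7 liters, typically."]

print(bs_score(honest))      # 1.0
print(bs_score(fabricator))  # 0.0
```

A real evaluation would of course need a far more robust refusal detector (e.g. an LLM judge rather than keyword matching), since fabricating models can also use hedging language.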
AI Score: 84
Influence Score: 7
Published At: Today
Language: English
Tags
BS Bench
AI Benchmarking
Hallucination
LLM Evaluation
Nonsense Detection