Title: Redis Author's Benchmark: GPT 5.4 Far Outperforms Claude ...
URL Source: https://www.bestblogs.dev/status/2041510879506903270
Published Time: 2026-04-07 13:38:17
Markdown Content:
Redis Author's Benchmark: GPT 5.4 Far Outperforms Claude 4.6 in Complex Reverse Engineering
### Viking @vikingmute
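A post well worth reading: a real-world experimental comparison shared by the author of Redis.
Over the past week, he ran long autonomous sessions with Claude Code Opus 4.6 and Codex GPT 5.4 (max thinking), testing repeatedly in separate directory environments.
The task was very complex: starting from an early-1990s Unix disk image, reverse engineer a long-vanished SCSI controller and its integrated ROM. The project is a collaboration with a museum on computer history, and it calls for deep engineering skills that combine hardware knowledge with assembly and disassembly.
Results:
GPT 5.4: made all the major progress across multiple long runs, effectively combining hardware knowledge, disassembly techniques, and more to carry out the complex reverse engineering work.
Claude Opus 4.6: made only a small amount of minor progress and was almost useless on the high-difficulty task.
His conclusion: for high-difficulty engineering work, the gap between the two is brutal. GPT 5.4 is clearly stronger, especially on tasks that require deep reasoning and long horizons.
The original post also includes comparison images.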
#### antirez
@antirez · 18h ago
During the last week I executed very long autonomous sessions of Claude Code Opus 4.6 and Codex GPT 5.4 (both at max thinking budget), in cloned directories (refreshed every time one was behind). I burned a lot of (flat rate, my OSS free account + my PRO account) of tokens...
Apr 7, 2026, 1:38 PM
14 Replies
16 Retweets
166 Likes
32.3K Views
One Sentence Summary
Redis creator antirez ran a reverse engineering experiment on an early-90s Unix disk image and reported that GPT 5.4 far outperformed Claude Opus 4.6 on this deep-reasoning, long-running task.
Summary
The tweet recounts a real-world experimental comparison by antirez, the author of Redis. The task was to reverse engineer a long-lost SCSI controller and its integrated ROM from an early-90s Unix disk image, work that demands deep engineering skill in hardware, assembly, and disassembly. According to antirez, GPT 5.4 (max thinking) made all the major progress during long autonomous sessions, whereas Claude Opus 4.6 made only minor progress and was almost useless on the hardest parts. He concludes that the gap between the two is brutal for high-difficulty engineering work that needs deep reasoning.
AI Score
88
Influence Score 51
Published At Today
Language
Chinese
Tags
GPT 5.4
Claude 4.6
Redis Author
Model Evaluation
Reverse Engineering