
Mamba-3 Released, Poised to Advance Hybrid AI Architectures

📅 2026-03-18 10:05 · Sebastian Raschka · Artificial Intelligence · 3 min read · 2,991 characters · Score: 85
Mamba-3 · SSM · Hybrid Architectures · Transformer · RoPE

Oh wow, Mamba-3 is here!

For me, the most interesting use case of Mamba and Mamba-likes are the recent transformer attention hybrid architectures (Qwen3.5, Kimi Linear, etc.)

Would be interesting to swap Gated DeltaNet with Mamba-3 (which now also has RoPE) in next gen hybrids.
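
For readers who have not seen these hybrids, the sketch below shows the general shape they take: a stack that interleaves linear-time sequence mixers with occasional full-attention layers, so "swapping Gated DeltaNet with Mamba-3" amounts to changing which module fills the linear slots. This is a minimal illustrative sketch in PyTorch, not the actual Qwen3.5, Kimi Linear, or Mamba-3 code; the class names, the gated-MLP stand-in for the linear mixer, and the 1-in-4 attention ratio are all assumptions.

```python
# Hypothetical sketch of a transformer / linear-model hybrid stack.
# None of this is the real Qwen3.5, Kimi Linear, or Mamba-3 code.
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """Standard multi-head self-attention block (the full-attention layers)."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out


class LinearMixerBlock(nn.Module):
    """Placeholder for a linear-time sequence mixer (Gated DeltaNet, Mamba-2, Mamba-3).
    A real SSM layer carries a recurrent state; a gated per-token MLP stands in here."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        u, gate = self.in_proj(h).chunk(2, dim=-1)
        return x + self.out_proj(u * torch.sigmoid(gate))


class HybridStack(nn.Module):
    """Interleaves linear mixers with occasional full-attention blocks.
    Swapping Gated DeltaNet for Mamba-3 would only change which class fills
    the linear slots; the surrounding architecture stays the same."""
    def __init__(self, d_model: int, n_layers: int = 8, attn_every: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(d_model) if (i + 1) % attn_every == 0 else LinearMixerBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    model = HybridStack(d_model=256)
    tokens = torch.randn(2, 128, 256)   # (batch, seq_len, d_model)
    print(model(tokens).shape)          # torch.Size([2, 128, 256])
```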

[Tweet image]


#### Albert Gu

@_albertgu · 13h ago

The newest model in the Mamba series is finally here 🐍 Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models.

We've introduced several SSM-centric ideas to significantly increase Mamba-2's modeling capabilities without compromising on speed. The resulting Mamba-3 model has noticeable performance gains over the most popular previous linear models (such as Mamba-2 and Gated DeltaNet) at all sizes.
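
For context on what "linear models" means here, the snippet below is a minimal, generic diagonal SSM recurrence: the layer carries a fixed-size state that is updated once per token, so cost grows linearly with sequence length rather than quadratically as in attention. It sketches the general selective-SSM idea, not Mamba-3's actual parameterization; the tensor shapes and the sigmoid decay are assumptions.

```python
# Minimal diagonal SSM recurrence sketch (not Mamba-3's real parameterization).
import torch


def ssm_scan(x: torch.Tensor, a: torch.Tensor, b: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
    """x: (seq_len, d); a, b, c: per-token (seq_len, d) diagonal parameters."""
    state = torch.zeros_like(x[0])
    outputs = []
    for t in range(x.shape[0]):
        state = a[t] * state + b[t] * x[t]   # linear state update, O(d) per token
        outputs.append(c[t] * state)         # readout from the current state
    return torch.stack(outputs)


if __name__ == "__main__":
    T, d = 8, 4
    x = torch.randn(T, d)
    a = torch.sigmoid(torch.randn(T, d))     # per-token decay in (0, 1)
    b, c = torch.randn(T, d), torch.randn(T, d)
    print(ssm_scan(x, a, b, c).shape)        # torch.Size([8, 4])
```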

This is the first Mamba that was student led: all credit to @aakash_lahoti @kevinyli_ @_berlinchen @caitWW9, and of course @tri_dao!

[Tweet image]

24 replies · 209 retweets · 1,042 likes · 206.8K views

7 replies · 27 retweets · 238 likes · 12.7K views

One Sentence Summary

Sebastian Raschka highlights the release of Mamba-3, emphasizing its potential to enhance transformer attention hybrid architectures like Qwen3.5 and Kimi Linear, especially with the inclusion of RoPE.

Summary

This tweet by Sebastian Raschka announces the release of Mamba-3, a significant update to the Mamba series of linear models. Raschka points to recent transformer attention hybrid architectures such as Qwen3.5 and Kimi Linear as the most interesting place to apply it, and suggests that Mamba-3, which now incorporates RoPE (Rotary Positional Embeddings), could be a strong candidate to replace Gated DeltaNet in next-generation hybrid models. The quoted tweet from @_albertgu, one of Mamba's creators, confirms Mamba-3's arrival, describing the SSM-centric ideas that extend Mamba-2's modeling capabilities and noting performance gains over previous popular linear models such as Mamba-2 and Gated DeltaNet at all sizes. The release underscores the growing importance of linear models in hybrid AI designs.
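
Since both tweets single out RoPE as a new ingredient in Mamba-3, here is a minimal sketch of what rotary positional embeddings do: each pair of feature channels is rotated by a position-dependent angle, so relative positions appear as phase differences. This illustrates plain RoPE as in the original rotary-embedding formulation, not how Mamba-3 actually integrates it; the channel-pairing convention and base frequency are assumptions.

```python
# Minimal RoPE sketch, independent of Mamba-3's actual integration.
import torch


def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate consecutive feature pairs of x (batch, seq_len, dim) by position-dependent angles."""
    batch, seq_len, dim = x.shape
    half = dim // 2
    # One frequency per feature pair, geometrically spaced.
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()        # (seq_len, half)
    x1, x2 = x[..., :half], x[..., half:]
    # Standard 2-D rotation applied to each (x1, x2) channel pair.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


if __name__ == "__main__":
    q = torch.randn(1, 16, 64)
    print(apply_rope(q).shape)   # torch.Size([1, 16, 64])
```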

AI Score: 85
Influence Score: 40
Published At: Today
Language: English
Tags: Mamba-3 · SSM · Hybrid Architectures · Transformer · RoPE

View original → Published: 2026-03-18 10:05:18 · Indexed: 2026-03-18 14:00:41
