Title: Running Massive MoE Models Locally via SSD Streaming | Be...
URL Source: https://www.bestblogs.dev/status/2036294026438254783
Published Time: 2026-03-24 04:08:23
Running Massive MoE Models Locally via SSD Streaming
### Simon Willison @simonw
Turns out you can run enormous Mixture-of-Experts models on Mac hardware without fitting the whole model in RAM, by streaming a subset of expert weights from SSD for each generated token - and people keep finding ways to run bigger models.
Kimi K2.5 is 1T parameters, but only 32B are active per token, so it fits in 96GB.
#### seikixtc
@seikixtc · 7h ago
I got a 1T-parameter model running locally on my MacBook Pro.
LLM: Kimi K2.5
1,026,408,232,448 params (~1.026T)
Hardware: M2 Max MacBook Pro (2023) w/ 96GB unified memory
Running on MLX with a flash-style SSD streaming path + local patching.
This is an experimental setup and I haven’t optimized speed yet, but it’s stable enough that I’ve started testing it in an autoresearch-style loop.
#LocalAI #MLX #MoE
Mar 24, 2026, 4:08 AM
30 Replies · 78 Retweets · 919 Likes · 60.6K Views
One Sentence Summary
Simon Willison highlights a breakthrough technique for running massive Mixture-of-Experts models on consumer Mac hardware by streaming weights from SSD.
Summary
This tweet discusses a significant optimization technique for local LLM inference. By streaming a subset of expert weights from SSD instead of loading the entire model into RAM, users can run massive models, such as the 1T-parameter Kimi 2.5, on devices with limited memory like a MacBook Pro. This approach significantly lowers the hardware barrier for running state-of-the-art models.
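The core idea behind the technique described above can be sketched in a few lines: keep all expert weights on disk in a memory-mapped file, and per token read in only the experts the router actually selects. The snippet below is a toy NumPy illustration of this pattern, not the actual MLX patch mentioned in the tweet; the function and variable names (`moe_forward`, `experts_mm`, the router setup) are illustrative assumptions.

```python
import os
import tempfile
import numpy as np

d_model, n_experts, top_k = 64, 16, 2

# Persist all expert weights to disk once; only a slice is read per token.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "experts.npy")
experts = np.random.default_rng(0).standard_normal(
    (n_experts, d_model, d_model)).astype(np.float32)
np.save(path, experts)

# Memory-map the file: no expert is resident in RAM until it is indexed.
experts_mm = np.load(path, mmap_mode="r")

def moe_forward(x, router_w):
    """Route a token to its top-k experts, loading only those weights."""
    scores = x @ router_w                 # (n_experts,) router logits
    chosen = np.argsort(scores)[-top_k:]  # indices of the active experts
    gates = np.exp(scores[chosen])
    gates /= gates.sum()                  # softmax over the chosen experts
    out = np.zeros_like(x)
    for g, idx in zip(gates, chosen):
        w = np.asarray(experts_mm[idx])   # SSD -> RAM for this expert only
        out += g * (x @ w)
    return out, chosen

rng = np.random.default_rng(1)
token = rng.standard_normal(d_model).astype(np.float32)
router_w = rng.standard_normal((d_model, n_experts)).astype(np.float32)
y, active = moe_forward(token, router_w)
```

Peak resident weight memory per token is roughly `top_k / n_experts` of the full model, which is why a 1T-parameter model with only 32B active parameters becomes feasible on 96GB of unified memory; the trade-off is that SSD read latency now sits on the per-token critical path.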
AI Score
88
Influence Score 159
Published At: Mar 24, 2026
Language
English
Tags
LocalLLM
MoE
MLX
MacBook
Kimi