← Back to overview

Running Massive MoE Models Locally via SSD Streaming

📅 2026-03-24 12:08 · Simon Willison · AI · 3-minute read · 3,600 characters · Score: 88
Local LLM · MoE · MLX · MacBook · Kimi


URL Source: https://www.bestblogs.dev/status/2036294026438254783

Published Time: 2026-03-24 04:08:23


Running Massive MoE Models Locally via SSD Streaming


### Simon Willison

@simonw

Turns out you can run enormous Mixture-of-Experts models on Mac hardware without fitting the whole model in RAM, by streaming a subset of expert weights from SSD for each generated token - and people keep finding ways to run bigger models

Kimi 2.5 is 1T parameters, but only 32B are active per token, so it fits in 96GB
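The arithmetic behind that claim, as a back-of-envelope sketch. The quantization level isn't stated in the tweet; ~4 bits per parameter (0.5 bytes) is an assumption here, chosen only to illustrate why the full model can't be RAM-resident while the active slice easily is:

```python
# Back-of-envelope memory math for a 1T-param MoE.
# BYTES_PER_PARAM = 0.5 assumes ~4-bit quantization (not stated in the tweet).
TOTAL_PARAMS = 1_026_408_232_448   # Kimi K2.5 total, from the quoted tweet
ACTIVE_PARAMS = 32_000_000_000     # ~32B parameters routed per token
BYTES_PER_PARAM = 0.5              # assumed ~4-bit quantization
GIB = 1024 ** 3

total_gib = TOTAL_PARAMS * BYTES_PER_PARAM / GIB    # must live on SSD
active_gib = ACTIVE_PARAMS * BYTES_PER_PARAM / GIB  # must live in RAM

print(f"full model on SSD: {total_gib:.0f} GiB")   # far exceeds 96 GB RAM
print(f"active experts:    {active_gib:.0f} GiB")  # fits in 96 GB with room
```

At these assumptions the full model is roughly 478 GiB on disk while the per-token active set is about 15 GiB, which is why streaming the routed experts from SSD makes a 96 GB machine sufficient.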


#### seikixtc

@seikixtc · 7h ago

I got a 1T-parameter model running locally on my MacBook Pro.

LLM: Kimi K2.5

1,026,408,232,448 params (~1.026T)

Hardware: M2 Max MacBook Pro (2023) w/ 96GB unified memory

Running on MLX with a flash-style SSD streaming path + local patching.

This is an experimental setup and I haven’t optimized speed yet, but it’s stable enough that I’ve started testing it in an autoresearch-style loop.

#LocalAI #MLX #MoE


Mar 24, 2026, 4:08 AM

30 Replies · 78 Retweets · 919 Likes · 60.6K Views

One Sentence Summary

Simon Willison highlights a breakthrough technique for running massive Mixture-of-Experts models on consumer Mac hardware by streaming weights from SSD.

Summary

This tweet discusses a significant optimization technique for local LLM inference. By streaming a subset of expert weights from SSD instead of loading the entire model into RAM, users can run massive models, such as the 1T-parameter Kimi 2.5, on devices with limited memory like a MacBook Pro. This approach significantly lowers the hardware barrier for running state-of-the-art models.
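To make the mechanism concrete, here is a minimal sketch of per-token expert streaming. This is not the MLX implementation from the tweet; it is an illustrative toy in NumPy, assuming expert weights live in a memory-mapped file on SSD, a router picks top-k experts per token, and only those experts are read into RAM, with a small LRU cache for reuse:

```python
# Toy sketch of MoE expert streaming (illustrative, not the MLX code):
# expert weights stay on disk in a memory-mapped file; each forward pass
# reads only the router-selected top-k experts into RAM.
import os
import tempfile
from collections import OrderedDict
import numpy as np

N_EXPERTS, D_IN, D_OUT, TOP_K = 8, 16, 16, 2

# Write dummy expert weights to disk to stand in for the model file on SSD.
path = os.path.join(tempfile.mkdtemp(), "experts.bin")
rng = np.random.default_rng(0)
rng.standard_normal((N_EXPERTS, D_IN, D_OUT)).astype(np.float32).tofile(path)

# Memory-map the file: no expert is resident until its slice is read.
experts = np.memmap(path, dtype=np.float32,
                    shape=(N_EXPERTS, D_IN, D_OUT), mode="r")

cache = OrderedDict()  # expert id -> weights, LRU order
CACHE_CAP = 4          # keep at most 4 experts resident in RAM

def load_expert(idx):
    """Fetch one expert's weights, streaming from SSD on a cache miss."""
    if idx in cache:
        cache.move_to_end(idx)
        return cache[idx]
    w = np.array(experts[idx])      # the actual disk read happens here
    cache[idx] = w
    if len(cache) > CACHE_CAP:
        cache.popitem(last=False)   # evict the least-recently-used expert
    return w

def moe_forward(x, router_w):
    """One MoE layer step: route, stream top-k experts, mix their outputs."""
    logits = x @ router_w                  # (N_EXPERTS,) routing scores
    top = np.argsort(logits)[-TOP_K:]      # ids of the chosen experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()
    return sum(g * (x @ load_expert(i)) for g, i in zip(gates, top))

router_w = rng.standard_normal((D_IN, N_EXPERTS)).astype(np.float32)
x = rng.standard_normal(D_IN).astype(np.float32)
y = moe_forward(x, router_w)  # only TOP_K of the 8 experts were read
```

The key property is that RAM usage scales with the cache capacity and the active expert count, not with the total number of experts, which is what lets a 1T-parameter model run on a 96 GB machine.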

AI Score

88

Influence Score 159

Published At 2026-03-24

Language

English

Tags

LocalLLM

MoE

MLX

MacBook

Kimi


View original → Published: 2026-03-24 12:08:23 · Indexed: 2026-03-24 16:01:01
