MLX-LM – v0.30.4

MLX LM v0.30.4 just dropped and it's a beast 🚀

  • AWQ/GPTQ weight transforms now live — convert quantized models in one line.
  • Nemotron Super 49B v1.5 and GLM4 MoE Lite added — big brains, bigger performance on Apple silicon.
  • Batch generation? Fixed. MambaCache, CacheList, IQuestLoopCoder — all smoothed out.
  • New continuous batching server benchmark — measure your throughput like a pro.
  • LongCat Flash now supports sharding + extended context — longer prompts, zero headaches.
  • GPT-OSS & Minimax tensor sharding — distributed inference just got way easier.
  • SwiGLU compiled, Falcon H1 embeddings fixed, and tokenizer errors now warn instead of crashing.
  • Huge shoutout to new contributors: Eric, Nikhil, Solarpunkin, Evanev7 & Andrew! 🎉

All powered by the latest MLX + smarter caching. Upgrade, benchmark, and go build something wild.

🔗 View Release