MLX-LM – v0.30.4
MLX LM v0.30.4 just dropped and it's a beast.
- AWQ/GPTQ weight transforms now live: convert quantized models in one line.
- Nemotron Super 49B v1.5 and GLM4 MoE Lite added: big brains, bigger performance on Apple silicon.
- Batch generation? Fixed. MambaCache, CacheList, IQuestLoopCoder: all smoothed out.
- New continuous batching server benchmark: measure your throughput like a pro.
- LongCat Flash now supports sharding + extended context: longer prompts, zero headaches.
- GPT-OSS & Minimax tensor sharding: distributed inference just got way easier.
- SwiGLU compiled, Falcon H1 embeddings fixed, and tokenizer errors now warn instead of crashing.
- Huge shoutout to new contributors: Eric, Nikhil, Solarpunkin, Evanev7 & Andrew!
All powered by the latest MLX + smarter caching. Upgrade, benchmark, and go build something wild.
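If you want to try the one-line quantized conversion mentioned above, a minimal sketch using the existing `mlx_lm.convert` CLI (the model path here is just an example, and the exact flags for the new AWQ/GPTQ weight transforms may differ; check `mlx_lm.convert --help` in v0.30.4):

```shell
# Download a Hugging Face model, convert it to MLX format, and quantize it.
# The -q flag applies quantization during conversion; the model path below is
# illustrative, not an endorsement of a specific checkpoint.
mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.3 -q

# Then generate with the converted model from the local output directory.
mlx_lm.generate --model mlx_model --prompt "Hello"
```

The converted weights land in `./mlx_model` by default, ready for generation or the new continuous batching server benchmark.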
