MLX-LM – v0.30.4

MLX LM v0.30.4 just dropped and it's a beast 🚀

  • AWQ/GPTQ weight transforms now live — convert quantized models in one line.
  • Nemotron Super 49B v1.5 and GLM4 MoE Lite added — big brains, bigger performance on Apple silicon.
  • Batch generation? Fixed. MambaCache, CacheList, IQuestLoopCoder — all smoothed out.
  • New continuous batching server benchmark — measure your throughput like a pro.
  • LongCat Flash now supports sharding + extended context — longer prompts, zero headaches.
  • GPT-OSS & Minimax tensor sharding — distributed inference just got way easier.
  • SwiGLU compiled, Falcon H1 embeddings fixed, and tokenizer errors now warn instead of crashing.
  • Huge shoutout to new contributors: Eric, Nikhil, Solarpunkin, Evanev7 & Andrew! 🎉

All powered by the latest MLX + smarter caching. Upgrade, benchmark, and go build something wild.

🔗 View Release