MLX-LM – v0.30.0

MLX-LM v0.30.0 is live 🚀, and Apple Silicon LLMs just got a serious power-up!

  • Server performance fixed: busy-waiting eliminated; idle polling is now lean, quiet, and efficient. 🛠️
  • Transformers v5 fully supported: the latest tokenizer tweaks, model updates, and Hugging Face changes are all covered. 🤖
  • MiMo V2 Flash enabled: optimized attention for faster inference and lower latency. ⚡
  • Better error messages: when batching fails, you'll now know why; no more cryptic crashes. 📢
  • Model parallel generation: split massive models across GPUs like a pro and scale your LLMs without rewriting code. 🧩
  • Chat template fixes: `apply_chat_template` now wraps messages correctly; no more dict chaos in your prompts. ✨
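For context on the chat-template fix: `apply_chat_template` flattens a list of role/content message dicts into a single model-ready prompt string. The real templates are model-specific Jinja shipped with each tokenizer; the snippet below is only a minimal, hypothetical sketch (assuming a ChatML-style layout, and a made-up function name) of the transformation involved:

```python
# Hypothetical sketch of what a chat template does: it flattens a list of
# {"role", "content"} message dicts into one prompt string. Real templates
# are model-specific Jinja; this mimics a ChatML-style layout for illustration.

def apply_chat_template_sketch(messages, add_generation_prompt=True):
    parts = []
    for msg in messages:
        # Each message becomes a delimited block tagged with its role.
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    if add_generation_prompt:
        # Open an assistant turn so the model knows to continue from here.
        parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = apply_chat_template_sketch(messages)
print(prompt)
```

The fix referenced above is about this wrapping step producing a clean prompt string instead of leaking raw dict structure into the model input.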

Thousands of Hugging Face models, quantized, fine-tuned, and served, all on your M-series chip. Time to upgrade and push your AI stack further. 🚀

🔗 View Release