MLX-LM – v0.30.0
MLX LM v0.30.0 is live 🚀 — Apple Silicon LLMs just got a serious power-up!
- Server performance fixed: No more busy-waiting — idle polling is now lean, quiet, and efficient. 🛠️
- Transformers v5 fully supported: works out of the box with the latest Hugging Face tokenizer and model API changes. 🤖
- MiMo V2 Flash enabled: support for the new model, with optimized attention for faster, lower-latency inference. ⚡
- Better error messages: Batching failed? Now you’ll know why — no more cryptic crashes. 📢
- Model parallel generation: shard massive models across multiple devices and scale your LLMs without rewriting code. 🧩
- Chat template fixes: `apply_chat_template` finally wraps correctly — no more dict chaos in your prompts. ✨
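For context, a minimal sketch of the input `apply_chat_template` expects: a plain list of role/content dicts. The model name below is just an example from the mlx-community hub, and the tokenizer calls are commented out because they require mlx-lm installed on Apple Silicon plus a model download.

```python
# The message structure apply_chat_template consumes: a list of dicts,
# each with a "role" and a "content" key.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What's new in MLX LM v0.30.0?"},
]

# With mlx-lm installed (assumed; Apple Silicon only):
#   from mlx_lm import load
#   model, tokenizer = load("mlx-community/Llama-3.2-3B-Instruct-4bit")
#   prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
```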
Thousands of Hugging Face models, quantized, fine-tuned, and served — all on your M-series chip. Time to upgrade and push your AI stack further. 🚀
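As a taste of the serving side, here is a hedged sketch of querying the OpenAI-compatible chat endpoint that `mlx_lm.server` exposes. The port (8080) is the server's default and the payload fields follow the OpenAI chat-completions shape; adjust both to your setup. The actual network call is commented out so the snippet stands alone.

```python
import json
import urllib.request

# Build an OpenAI-style chat-completions request for a local
# mlx_lm.server instance (default port 8080; assumption).
payload = {
    "messages": [{"role": "user", "content": "Hi from MLX!"}],
    "max_tokens": 64,
}
body = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running (mlx_lm.server --model ...):
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```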
