MLX-LM – v0.30.0
MLX LM v0.30.0 is live: Apple Silicon LLMs just got a serious power-up!
- Server performance fixed: no more busy-waiting; idle polling is now lean, quiet, and efficient.
- Transformers v5 fully supported: all the latest tokenizer tweaks, model updates, and Hugging Face magic? Covered.
- MIMO v2 Flash enabled: multi-input models now fly with optimized attention, giving faster inference and lower latency.
- Better error messages: batching failed? Now you'll know why; no more cryptic crashes.
- Model parallel generation: split massive models across GPUs like a pro. Scale your LLMs without rewriting code.
- Chat template fixes: `apply_chat_template` finally wraps correctly; no more dict chaos in your prompts.
Thousands of Hugging Face models, quantized, fine-tuned, and served, all on your M-series chip. Time to upgrade and push your AI stack further.
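A minimal sketch of trying the release on an M-series Mac. The `mlx_lm.generate` and `mlx_lm.server` entry points are the standard MLX-LM CLI; the model repo name below is only an example placeholder, so substitute any MLX-community checkpoint you actually want to run:

```shell
# Upgrade to the latest MLX-LM release
pip install --upgrade mlx-lm

# One-off generation from a quantized Hugging Face checkpoint
# (example model name; swap in your own)
mlx_lm.generate --model mlx-community/Mistral-7B-Instruct-v0.3-4bit \
    --prompt "Explain KV caching in one paragraph." \
    --max-tokens 200

# Or run a local HTTP server and point your existing client at it
mlx_lm.server --model mlx-community/Mistral-7B-Instruct-v0.3-4bit
```

These commands require Apple Silicon hardware, so they are shown as a usage sketch rather than a portable script.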
