MLX-LM – v0.28.4

🚀 mlx-lm v0.28.4 is live, and it's a beast!

New models? Oh yeah:

✅ Minimax-M2, Kimi Linear, Trinity/AfMoE, Ministral3

✅ DeepSeek V3.2 now in the fold

✅ Kimi K2 & OLMo3 fixed for seamless loading

Performance got a turbo boost:

🚀 Batching in server mode = faster multi-request handling (see the server sketch after this list)

💡 The prompt cache can now hold multiple prompts at once (chat apps, rejoice! see the prompt-cache sketch after this list)

🧠 DWQ (Dynamic Weight Quantization) β€” run massive models with less memory, same punch

Fixed the niggles:

🔧 Adapter loading typo? Gone.

🧩 parallel_residual now works on GPTNeoX

📦 SentencePiece dependency added: no more tokenizer failures!

Under the hood:

🔄 Switched to GitHub Actions for smoother CI

💬 Better type hints: mypy fans, you're welcome

🧪 Flaky tests squashed + LoRA fusion now plays nice with non-affine quantization

Big shoutout to new contributors: @jyork03, @spotbot2k, @sriting, @tnadav, @Deekshith-Dade. Welcome to the crew! 🎉

Upgrade. Tweak. Crush your next LLM project. 💪

🔗 View Release