MLX-LM – v0.28.4

🚀 mlx-lm v0.28.4 is live, and it's a beast!

New models? Oh yeah:

✅ Minimax-M2, Kimi Linear, Trinity/AfMoE, Ministral3

✅ DeepSeek V3.2 now in the fold

✅ Kimi K2 & OLMo3 fixed for seamless loading

Performance got a turbo boost:

🚀 Batching in server mode = faster multi-request handling (see the server sketch after this list)

💡 The prompt cache can now hold multiple prompts at once (chat apps, rejoice! see the prompt-cache sketch after this list)

🧠 DWQ (Dynamic Weight Quantization) β€” run massive models with less memory, same punch

Fixed the niggles:

🔧 Adapter loading typo? Gone.

🧩 parallel_residual now works on GPTNeoX

📦 SentencePiece dependency added: no more tokenizer failures!

Under the hood:

🔄 Switched to GitHub Actions for smoother CI

💬 Better type hints: mypy fans, you're welcome

🧪 Flaky tests squashed + LoRA fusion now plays nice with non-affine quantization

Big shoutout to new contributors: @jyork03, @spotbot2k, @sriting, @tnadav, @Deekshith-Dade. Welcome to the crew! 🎉

Upgrade. Tweak. Crush your next LLM project. 💪

🔗 View Release