MLX-LM – v0.28.4
mlx-lm v0.28.4 is live, and it's a beast!
New models? Oh yeah:
- Minimax-M2, Kimi Linear, Trinity/AfMoE, Ministral3
- DeepSeek V3.2, now in the fold
- Kimi K2 & OLMo3 fixed for seamless loading
Performance got a turbo boost:
- Batching in server mode = faster multi-request handling (client sketch after this list)
- The prompt cache can now hold multiple prompts at once (chat apps, rejoice! cache sketch below)
- DWQ (Dynamic Weight Quantization): run massive models with less memory, same punch (loading sketch below)
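
If you want to feel the batching win yourself, start the server (e.g. `mlx_lm.server --model <repo>`) and fire concurrent requests at its OpenAI-compatible endpoint. A minimal client sketch, assuming the server's default host/port and that requests without a "model" field fall back to the model the server was started with:

```python
# Send several chat requests concurrently to a running mlx_lm.server;
# with server-side batching they can be processed together, not serially.
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

URL = "http://127.0.0.1:8080/v1/chat/completions"  # server default host/port

def ask(question: str) -> str:
    # No "model" field: assumes the server falls back to its startup model.
    payload = json.dumps({
        "messages": [{"role": "user", "content": question}],
        "max_tokens": 128,
    }).encode()
    req = Request(URL, data=payload, headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

questions = ["What is MLX?", "Summarize KV caching.", "Name two MoE models."]
with ThreadPoolExecutor(max_workers=len(questions)) as pool:
    for answer in pool.map(ask, questions):
        print(answer)
```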
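
And here's the chat-app pattern the cache work speeds up: one prompt cache per session, so earlier turns aren't re-encoded on every request. A sketch using the existing `make_prompt_cache`/`generate` API; the model repo is an assumption:

```python
# Keep a separate prompt cache per conversation so each new turn
# reuses the cached KV state instead of re-processing the history.
from mlx_lm import load, generate
from mlx_lm.models.cache import make_prompt_cache

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")  # assumed repo

caches = {sid: make_prompt_cache(model) for sid in ("alice", "bob")}  # one per session

def chat_turn(session_id: str, user_msg: str) -> str:
    # Only the new message is templated; prior turns live in the cache.
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": user_msg}], add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=prompt,
                    prompt_cache=caches[session_id], max_tokens=100)

print(chat_turn("alice", "Hi! Please remember the number 42."))
print(chat_turn("alice", "Which number did I ask you to remember?"))
```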
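
DWQ checkpoints load through the same API as any other quantized model. A quick sketch, where the repo name is hypothetical (swap in a real DWQ checkpoint from the hub):

```python
# DWQ-quantized models load like any other checkpoint; the repo name
# below is hypothetical -- substitute a real DWQ model.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/SomeModel-4bit-DWQ")  # hypothetical repo
print(generate(model, tokenizer, prompt="The capital of France is", max_tokens=10))
```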
Fixed the niggles:
- Adapter loading typo? Gone. (Adapter sketch after this list.)
- parallel_residual now works on GPTNeoX
- SentencePiece dependency added, so no more tokenizer failures!
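
With the typo fixed, pointing `load` at a LoRA adapter directory works as expected. A minimal sketch; the base model repo and adapter path are assumptions:

```python
# Load a base model together with trained LoRA adapters;
# the repo name and adapter directory are placeholders.
from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/Mistral-7B-Instruct-v0.3-4bit",  # assumed base model
    adapter_path="adapters",  # e.g. the output directory from mlx_lm.lora
)
print(generate(model, tokenizer, prompt="Hello!", max_tokens=20))
```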
Under the hood:
- Switched to GitHub Actions for smoother CI
- Better type hints; mypy fans, you're welcome
- Flaky tests squashed, and LoRA fusion now plays nice with non-affine quantization
Big shoutout to new contributors @jyork03, @spotbot2k, @sriting, @tnadav, and @Deekshith-Dade. Welcome to the crew!
Upgrade. Tweak. Crush your next LLM project.
