MLX-LM – v0.30.5


πŸš€ MLX LM v0.30.5 is live β€” and it’s a game-changer for Apple Silicon LLM folks!

βœ… OpenAI-compatible `finish_reason` β€” Use MLX LM as a drop-in replacement for OpenAI’s API. No client-code changes needed.
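Because the server now reports OpenAI-style `finish_reason` values, clients can branch on them exactly as they would against OpenAI's API. A minimal sketch β€” the response dict below is illustrative, shaped like an OpenAI chat completion, not captured from a real server:

```python
def was_truncated(response: dict) -> bool:
    """Return True if generation hit the token limit rather than stopping naturally."""
    # OpenAI-compatible responses report "stop" for a natural end
    # and "length" when max_tokens was reached.
    return response["choices"][0]["finish_reason"] == "length"

# Illustrative response in the OpenAI chat-completion shape.
resp = {"choices": [{"message": {"content": "Hello!"}, "finish_reason": "stop"}]}
print(was_truncated(resp))  # False
```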

🧠 GLM4-MoE-Lite now caches KV latents β€” Speed up long convos by skipping redundant attention computations.
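KV caching works by storing the key/value projections of tokens already processed, so each decode step only computes projections for the newest token. A toy numpy sketch of the idea β€” not MLX's actual implementation:

```python
import numpy as np

class KVCache:
    """Append-only cache of key/value projections for previously seen tokens."""
    def __init__(self):
        self.keys, self.values = [], []

    def update(self, k, v):
        self.keys.append(k)
        self.values.append(v)
        return np.stack(self.keys), np.stack(self.values)

def attend(q, cache, k_new, v_new):
    # Only the newest token's k/v are computed; the rest come from the cache.
    K, V = cache.update(k_new, v_new)
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

rng = np.random.default_rng(0)
cache = KVCache()
for _ in range(3):  # three decode steps, each reusing the prior keys/values
    q, k, v = rng.normal(size=(3, 4))
    out = attend(q, cache, k, v)
print(out.shape)  # (4,)
```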

πŸ†• TeleChat3 added! β€” China Telecom’s latest powerhouse model, now fully supported.

πŸ› οΈ Kimai tool parser β€” Smoother plugin integrations for agents and tools.

πŸ”§ Activation quantization + QQ ops β€” Run smaller, faster models with minimal accuracy loss.
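Activation quantization maps activations to low-bit integers on the fly, trading a little precision for memory and bandwidth. A hedged sketch of symmetric per-tensor 8-bit quantization β€” illustrative only, not MLX's kernels:

```python
import numpy as np

def quantize(x: np.ndarray, bits: int = 8):
    """Symmetric per-tensor quantization: x is approximated by q * scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.array([0.1, -0.5, 0.9, -1.2], dtype=np.float32)
q, s = quantize(x)
err = np.abs(dequantize(q, s) - x).max()
print(err < 0.01)  # True: round-trip error stays small for this tensor
```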

🐞 Fixed logprobs in batch generation β€” Probabilities finally behave as expected.
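The fix matters because per-sequence log-probabilities in a batch must be normalized independently per row of the logits. A small numpy sketch of what correct batched logprobs look like:

```python
import numpy as np

def batch_logprobs(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax applied independently to each batch row."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

logits = np.array([[2.0, 1.0, 0.1], [0.5, 0.5, 0.5]])
lp = batch_logprobs(logits)
# Exponentiating each row recovers a valid probability distribution.
print(np.allclose(np.exp(lp).sum(axis=-1), 1.0))  # True
```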

🌐 Synced random seeds across distributed ranks β€” Consistent outputs on multi-device setups.
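With seeds synced, every rank draws from the same random stream, so sampling decisions agree across the cluster. A minimal simulation of the idea, modeling each rank as an independently seeded RNG:

```python
import numpy as np

def rank_samples(seed: int, n_ranks: int = 4, n_draws: int = 5):
    """Each simulated rank seeds its own RNG with the shared seed, then samples."""
    return [np.random.default_rng(seed).integers(0, 100, n_draws) for _ in range(n_ranks)]

draws = rank_samples(seed=42)
# All ranks produce identical draws because they share the seed.
print(all(np.array_equal(draws[0], d) for d in draws))  # True
```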

πŸ“¦ Transformers bump + ArraysCache fix β€” Under-the-hood polish for stability and padding.

Big thanks to first-time contributors: @Maanas-Vermas, @percontation, @LuqDaMan, and @lpalbou!

Upgrade now β€” smoother, faster, more reliable LLM serving on M-series chips. πŸπŸ’»

πŸ”— View Release