MLX-LM – v0.30.5


πŸš€ MLX LM v0.30.5 is live β€” and it’s a game-changer for Apple Silicon LLM folks!

βœ… OpenAI-compatible `finish_reason` β€” Use MLX LM as a drop-in replacement for OpenAI’s API. No client-code changes needed.
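Because the server now reports OpenAI-style `finish_reason` values, clients can branch on them exactly as they would against OpenAI's API. A minimal sketch β€” the response dict below is illustrative, shaped like an OpenAI chat completion, not captured from a real server:

```python
def was_truncated(response: dict) -> bool:
    """Return True if generation hit the token limit rather than stopping naturally."""
    # OpenAI-compatible responses report "stop" for a natural end
    # and "length" when max_tokens was reached.
    return response["choices"][0]["finish_reason"] == "length"

# Illustrative response in the OpenAI chat-completion shape.
resp = {"choices": [{"message": {"content": "Hello!"}, "finish_reason": "stop"}]}
print(was_truncated(resp))  # False
```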

🧠 GLM4-MoE-Lite now caches KV latents β€” Speed up long convos by skipping redundant attention computations.
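KV caching works by storing the key/value projections of tokens already processed, so each decode step only computes projections for the newest token. A toy numpy sketch of the idea β€” not MLX's actual implementation:

```python
import numpy as np

class KVCache:
    """Append-only cache of key/value projections for previously seen tokens."""
    def __init__(self):
        self.keys, self.values = [], []

    def update(self, k, v):
        self.keys.append(k)
        self.values.append(v)
        return np.stack(self.keys), np.stack(self.values)

def attend(q, cache, k_new, v_new):
    # Only the newest token's k/v are computed; the rest come from the cache.
    K, V = cache.update(k_new, v_new)
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

rng = np.random.default_rng(0)
cache = KVCache()
for _ in range(3):  # three decode steps, each reusing the prior keys/values
    q, k, v = rng.normal(size=(3, 4))
    out = attend(q, cache, k, v)
print(out.shape)  # (4,)
```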

πŸ†• TeleChat3 added! β€” China Telecom’s latest powerhouse model, now fully supported.

πŸ› οΈ Kimai tool parser β€” Smoother plugin integrations for agents and tools.

πŸ”§ Activation quantization + QQ ops β€” Run smaller, faster models with minimal accuracy loss.
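Activation quantization maps activations to low-bit integers on the fly, trading a little precision for memory and bandwidth. A hedged sketch of symmetric per-tensor 8-bit quantization β€” illustrative only, not MLX's kernels:

```python
import numpy as np

def quantize(x: np.ndarray, bits: int = 8):
    """Symmetric per-tensor quantization: x is approximated by q * scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.array([0.1, -0.5, 0.9, -1.2], dtype=np.float32)
q, s = quantize(x)
err = np.abs(dequantize(q, s) - x).max()
print(err < 0.01)  # True: round-trip error stays small for this tensor
```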

🐞 Fixed logprobs in batch generation β€” Probabilities finally behave as expected.
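The fix matters because per-sequence log-probabilities in a batch must be normalized independently per row of the logits. A small numpy sketch of what correct batched logprobs look like:

```python
import numpy as np

def batch_logprobs(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax applied independently to each batch row."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

logits = np.array([[2.0, 1.0, 0.1], [0.5, 0.5, 0.5]])
lp = batch_logprobs(logits)
# Exponentiating each row recovers a valid probability distribution.
print(np.allclose(np.exp(lp).sum(axis=-1), 1.0))  # True
```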

🌐 Synced random seeds across distributed ranks β€” Consistent outputs on multi-device setups.
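With seeds synced, every rank draws from the same random stream, so sampling decisions agree across the cluster. A minimal simulation of the idea, modeling each rank as an independently seeded RNG:

```python
import numpy as np

def rank_samples(seed: int, n_ranks: int = 4, n_draws: int = 5):
    """Each simulated rank seeds its own RNG with the shared seed, then samples."""
    return [np.random.default_rng(seed).integers(0, 100, n_draws) for _ in range(n_ranks)]

draws = rank_samples(seed=42)
# All ranks produce identical draws because they share the seed.
print(all(np.array_equal(draws[0], d) for d in draws))  # True
```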

πŸ“¦ Transformers bump + ArraysCache fix β€” Under-the-hood polish for stability and padding.

Big thanks to first-time contributors: @Maanas-Vermas, @percontation, @LuqDaMan, and @lpalbou!

Upgrade now β€” smoother, faster, more reliable LLM serving on M-series chips. πŸπŸ’»

πŸ”— View Release