MLX-LM – v0.30.5
MLX LM v0.30.5 is live – and it's a game-changer for Apple Silicon LLM folks!
- OpenAI-compatible `finish_reason` – use MLX LM as a drop-in replacement for OpenAI's API, with no client-side code changes needed.
- GLM4-MoE-Lite now caches KV latents – speeds up long conversations by skipping redundant attention computation.
- TeleChat3 added – China Telecom's latest model, now fully supported.
- Kimi tool parser – smoother tool-call parsing for agent and plugin integrations.
- Activation quantization + QQ ops – run smaller, faster models with less accuracy loss.
- Fixed logprobs in batch generation – probabilities finally behave as expected.
- Synced random seeds across distributed ranks – consistent outputs on multi-GPU setups.
- Transformers version bump + ArraysCache fix – under-the-hood fixes for stability and cache padding.
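To illustrate what the OpenAI-compatible `finish_reason` buys you: client code written against the standard chat-completions schema can branch on the same values when talking to an MLX LM server. A minimal sketch – the payload below is a hypothetical example of the standard response shape, not captured server output:

```python
# Sketch: interpreting OpenAI-style `finish_reason` values from a
# chat-completions response. "length" means the max_tokens budget was
# exhausted; "stop" means the model emitted a natural end-of-turn.

def is_truncated(choice: dict) -> bool:
    """Return True when a completion stopped because it hit max_tokens."""
    return choice.get("finish_reason") == "length"

# Hypothetical response in the standard chat-completions shape.
response = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Hi!"},
            "finish_reason": "stop",
        }
    ]
}

for choice in response["choices"]:
    print(choice["finish_reason"], "truncated:", is_truncated(choice))
```

Because the values match OpenAI's, logic like this works unchanged whether the response came from OpenAI's API or a local MLX LM server.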
Big thanks to first-time contributors: @Maanas-Vermas, @percontation, @LuqDaMan, and @lpalbou!
Upgrade now – smoother, faster, more reliable LLM serving on M-series chips.
