Ollama – v0.13.2-rc0: ggml update to b7108 (#12992)

Ollama v0.13.2-rc0 just dropped — and it’s a speed demon 🚀

The big win? ggml, the tensor library behind Ollama's inference, is updated to b7108, pulling in upstream performance work and bug fixes.

Here’s what’s new:

✅ TopK sampling optimized: faster token selection, especially on large-vocabulary models.
✅ Metal argsort fixed — M-series chips now run smoother than ever 🍏
✅ Bakllava image-to-text regression patched — multimodal models are back in business.
🚨 Projector metadata warning — if you’re using multimodal GGUF files, double-check your metadata.
⚠️ Vulkan fixes temporarily reverted — stability first, speed later.
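
Curious why TopK cost grows with vocabulary size? Here is a minimal Python sketch of top-k sampling in general. It is not the ggml or Ollama implementation; the function name `top_k_sample` and its parameters are made up for illustration. The idea: keep only the `k` highest-scoring tokens, then sample among them, so the work is dominated by scanning the whole vocabulary to find those `k` winners.

```python
import heapq
import math
import random


def top_k_sample(logits, k, rng=None):
    """Sample a token id from the k highest-scoring logits (illustrative sketch only)."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    rng = rng or random.Random()

    # Partial selection: pick the k largest logits without fully sorting the vocabulary.
    # heapq.nlargest returns (token_id, logit) pairs ordered from largest to smallest.
    top = heapq.nlargest(k, enumerate(logits), key=lambda pair: pair[1])

    # Softmax over the k survivors only; subtract the max logit for numerical stability.
    max_logit = top[0][1]
    weights = [math.exp(score - max_logit) for _, score in top]

    token_ids = [token_id for token_id, _ in top]
    return rng.choices(token_ids, weights=weights, k=1)[0]
```

With `k=1` this always returns the index of the largest logit; with larger `k` it samples among the top candidates in proportion to their softmax weight. A real vocabulary has tens or hundreds of thousands of entries, which is why the selection step is worth optimizing.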

This is a release candidate: a pre-release build meant for testing ahead of the final v0.13.2, so run it where an occasional rough edge is acceptable. If you're on Apple Silicon, the Metal argsort fix makes this one worth trying.

Update now and keep those models rolling. 🤖💻
