Ollama – v0.13.3-rc1: feat: llama.cpp bump (17f7f4) for SSM performance improvements (#13408)

πŸš€ Ollama v0.13.3-rc1 is live β€” and Apple Silicon users, this one’s for you!

llama.cpp just got a major bump to the latest master (17f7f4b), turbocharging SSM models like Granite-4, Jamba, Falcon-H, Nemotron-H, and Qwen3 Next on Metal.

πŸ’₯ What’s new?

  • Prefill sped up 2–4x on M1/M2/M3 β€” fewer waits, faster first tokens
  • Optimized `SSM_CONV` and `SSM_SCAN` ops β€” the secret sauce behind modern state-space models
  • Clean swap to `gemma3.cpp` (goodbye, -iswa!)
  • 30+ patches + vendored code sync for stability
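For anyone curious what an `SSM_SCAN` op actually computes, here's a minimal sketch of the core state-space recurrence β€” this is an illustrative toy, not llama.cpp's actual kernel, and the function name and scalar parameters are hypothetical; real implementations run this recurrence over large state tensors, which is why optimizing it matters:

```python
def ssm_scan(x, a=0.9, b=1.0, c=1.0):
    """Toy 1-D state-space scan: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    Illustrative only -- real SSM kernels vectorize this recurrence
    across channels and state dimensions on the GPU.
    """
    h = 0.0
    ys = []
    for x_t in x:
        h = a * h + b * x_t   # state update: the sequential scan step
        ys.append(c * h)      # output projection from the hidden state
    return ys

# An impulse input decays geometrically through the state:
print(ssm_scan([1.0, 0.0, 0.0]))
```

Because each step depends on the previous state, this scan is inherently sequential along the time axis β€” exactly the kind of op where a faster Metal kernel shows up directly in prefill latency.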

If you’re running SSMs on Mac β€” upgrade now. Your chat latency just got a serious caffeine boost. 🍏⚑

πŸ”— View Release