Ollama – v0.13.3-rc1: feat: llama.cpp bump (17f7f4) for SSM performance improvements (#13408)

πŸš€ Ollama v0.13.3-rc1 is live β€” and Apple Silicon users, this one’s for you!

llama.cpp just got a major bump to the latest master (17f7f4b), turbocharging SSM models like Granite-4, Jamba, Falcon-H, Nemotron-H, and Qwen3 Next on Metal.

πŸ’₯ What’s new?

  • Prefill sped up 2–4x on M1/M2/M3 β€” fewer waits, faster first tokens
  • Optimized `SSM_CONV` and `SSM_SCAN` ops β€” the secret sauce behind modern state-space models
  • Clean swap to `gemma3.cpp` (goodbye, -iswa!)
  • 30+ patches + vendored code sync for stability
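For anyone curious what an `SSM_SCAN` op actually computes, here's a minimal sketch of the core state-space recurrence β€” this is an illustrative toy, not llama.cpp's actual kernel, and the function name and scalar parameters are hypothetical; real implementations run this recurrence over large state tensors, which is why optimizing it matters:

```python
def ssm_scan(x, a=0.9, b=1.0, c=1.0):
    """Toy 1-D state-space scan: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    Illustrative only -- real SSM kernels vectorize this recurrence
    across channels and state dimensions on the GPU.
    """
    h = 0.0
    ys = []
    for x_t in x:
        h = a * h + b * x_t   # state update: the sequential scan step
        ys.append(c * h)      # output projection from the hidden state
    return ys

# An impulse input decays geometrically through the state:
print(ssm_scan([1.0, 0.0, 0.0]))
```

Because each step depends on the previous state, this scan is inherently sequential along the time axis β€” exactly the kind of op where a faster Metal kernel shows up directly in prefill latency.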

If you’re running SSMs on Mac β€” upgrade now. Your chat latency just got a serious caffeine boost. 🍏⚑

πŸ”— View Release