Ollama – v0.13.3-rc1: feat: llama.cpp bump (17f7f4) for SSM performance improvements (#13408)
Ollama v0.13.3-rc1 is live, and Apple Silicon users, this one's for you!
llama.cpp just got a major upgrade to the latest master (17f7f4b), turbocharging SSM models like Granite-4, Jamba, Falcon-H, Nemotron-H, and Qwen3 Next on Metal.
What's new?
- Prefill speeds up by 2–4x on M1/M2/M3: fewer waits, faster first tokens
- Optimized `SSM_CONV` and `SSM_SCAN` ops: the secret sauce behind modern state-space models
- Clean swap to `gemma3.cpp` (goodbye, `-iswa`!)
- 30+ patches and a vendored code sync for stability
If you're running SSMs on a Mac, upgrade now. Your chat latency just got a serious caffeine boost.
