Ollama – v0.13.2-rc2: ggml: handle all streams (#13350)

🚀 Ollama v0.13.2-rc2 just dropped — and it’s a quiet win for stability!

The big fix? ggml now handles all GPU/CPU streams properly, instead of leaving some streams' work unaccounted for. That means fewer leaked buffers and fewer memory errors. Think of it as finally tidying up your AI workshop so every tensor has its place.
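If you want a feel for the bug class a fix like this targets, here's a minimal CUDA sketch. To be clear: this is illustrative only, not ggml's actual code, and the stream count, buffer, and structure are assumptions. It shows the classic multi-stream hazard: freeing a buffer while some streams may still have work queued against it, versus draining every stream first.

```c
/* Minimal sketch (CUDA runtime API), NOT ggml source. The stream count,
   buffer size, and overall structure are illustrative assumptions. */
#include <cuda_runtime.h>
#include <stdio.h>

#define N_STREAMS 4

int main(void) {
    cudaStream_t streams[N_STREAMS];
    float *buf = NULL;

    cudaMalloc((void **)&buf, 1 << 20);   /* shared 1 MiB scratch buffer */
    for (int i = 0; i < N_STREAMS; ++i)
        cudaStreamCreate(&streams[i]);

    /* ... kernels launched on each stream would read/write buf here ... */

    /* Bug pattern: synchronizing only one stream (or none) before freeing
       lets work still queued on the other streams race with cudaFree. */

    /* Safe pattern: drain every stream that may have touched the buffer. */
    for (int i = 0; i < N_STREAMS; ++i)
        cudaStreamSynchronize(streams[i]);

    cudaFree(buf);                        /* now no stream can still use it */
    for (int i = 0; i < N_STREAMS; ++i)
        cudaStreamDestroy(streams[i]);

    printf("all streams drained, buffer freed cleanly\n");
    return 0;
}
```

Same idea, many variants: whenever buffers are shared across streams, cleanup has to account for all of them, which is exactly what "handle all streams" hints at.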

✨ Why you’ll care:

  • Smoother inference on multi-GPU setups
  • Fewer crashes during heavy async loads
  • Better memory cleanup = longer, happier sessions

If you’ve been battling weird memory hiccups with Llama 3 or DeepSeek-R1 on Linux/macOS/Windows — this is your upgrade. Quiet change, huge impact. 💨

Upgrade now and run like a champ.

🔗 View Release