Ollama – v0.13.2-rc2: ggml: handle all streams (#13350)
🚀 Ollama v0.13.2-rc2 just dropped — and it’s a quiet win for stability!
The big fix? ggml now handles all GPU/CPU streams properly, which should mean fewer leaked buffers and less memory weirdness. Think of it as finally tidying up your AI workshop so every tensor has its place.
✨ Why you’ll care:
- Smoother inference on multi-GPU setups
- Fewer crashes during heavy async loads
- Better memory cleanup = longer, happier sessions
If you’ve been battling weird memory hiccups with Llama 3 or DeepSeek-R1 on Linux/macOS/Windows — this is your upgrade. Quiet change, huge impact. 💨
Upgrade now and run like a champ.
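Not sure if you're already on the latest build? Here's a small sketch of a version check before upgrading. The `needs_upgrade` helper is hypothetical (not part of Ollama); `ollama -v` and the official install script are the documented ways to check and upgrade on Linux.

```shell
# Hypothetical helper: decide whether an installed version is older than a
# target version, using sort -V for proper version ordering.
needs_upgrade() {
  current="$1"   # e.g. parsed from `ollama -v`
  target="$2"    # e.g. "0.13.2"
  # If the two differ and `current` sorts first, an upgrade is available.
  [ "$current" != "$target" ] && \
    [ "$(printf '%s\n%s\n' "$current" "$target" | sort -V | head -n1)" = "$current" ]
}

if needs_upgrade "0.13.1" "0.13.2"; then
  echo "upgrade available"
  # On Linux, re-running the official install script upgrades in place:
  #   curl -fsSL https://ollama.com/install.sh | sh
  # On macOS/Windows, grab the new build from the releases page instead.
fi
```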
