Ollama – v0.12.9-rc0: ggml: Avoid cudaMemsetAsync during memory fitting

🚀 Ollama v0.12.9-rc0 just dropped, and it's a quiet hero for GPU warriors!

The secret sauce? During memory fitting, `ggml` does a dry-run sizing pass where buffers aren't actually allocated, so it now skips `cudaMemsetAsync` instead of calling it on invalid pointers.

💡 Why it rocks:

  • No more crashes when checking if your 70B model fits on a 24GB GPU
  • Smoother `op_offload` workflows β€” no more CUDA tantrums during sizing checks
  • Faster, more stable memory estimation under pressure

Think of it like silencing a false alarm before you pack your suitcase: no noise, just better packing.

Perfect for folks running Llama 3, DeepSeek-R1, or Mistral on edge GPUs. No reinstall needed: just update and let Ollama handle the heavy lifting. 🤖⚡

🔗 View Release