Ollama – v0.12.9-rc0: ggml: Avoid cudaMemsetAsync during memory fitting

🚀 Ollama v0.12.9-rc0 just dropped, and it's a quiet hero for GPU warriors!

The secret sauce? During memory fitting, `ggml` does a dry-run sizing pass where buffers aren't actually allocated, so it now skips `cudaMemsetAsync` instead of calling it on invalid pointers.

💡 Why it rocks:

  • No more crashes when checking if your 70B model fits on a 24GB GPU
  • Smoother `op_offload` workflows β€” no more CUDA tantrums during sizing checks
  • Faster, more stable memory estimation under pressure

Think of it like silencing a false alarm before you pack your suitcase: no noise, just better packing.

Perfect for folks running Llama 3, DeepSeek-R1, or Mistral on edge GPUs. No reinstall needed: just update and let Ollama handle the heavy lifting. 🤖⚡

🔗 View Release