Ollama – v0.12.9-rc0: ggml: Avoid cudaMemsetAsync during memory fitting
Ollama v0.12.9-rc0 just dropped, and it's a quiet hero for GPU warriors!
The secret sauce? `ggml` now skips `cudaMemsetAsync` during memory fitting when it hits invalid pointers.
Why it rocks:
- No more crashes when checking if your 70B model fits on a 24GB GPU
- Smoother `op_offload` workflows: no more CUDA tantrums during sizing checks
- Faster, more stable memory estimation under pressure
Think of it like silencing a false alarm before you pack your suitcase β no noise, just better packing.
Perfect for folks running Llama 3, DeepSeek-R1, or Mistral on edge GPUs. No reinstall needed; just update and let Ollama handle the heavy lifting.
