Ollama – v0.15.0-rc5: llama: fix fattn-tile shared memory overflow on sm_50/52 (#13872)
🚀 Ollama v0.15.0-rc5 just landed — and it’s a quiet hero for legacy GPU folks!
If you’re rocking a GTX 900 series or Titan X (Maxwell, sm_50/52), this update fixes a sneaky shared memory overflow in Flash Attention’s tile kernel. 🛠️
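Not sure which bucket your card falls in? Here's a minimal standalone sketch (plain CUDA runtime API, nothing Ollama-specific) that prints each GPU's compute capability and its per-block shared memory budget:

```cpp
// check_sm.cu -- print each GPU's compute capability and shared memory budget.
// Build: nvcc check_sm.cu -o check_sm
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // sm_50/52 = Maxwell (e.g. GTX 900 series, Titan X Maxwell) -- the cards this fix targets.
        printf("GPU %d: %s -> sm_%d%d, %zu KB shared memory per block\n",
               i, prop.name, prop.major, prop.minor,
               prop.sharedMemPerBlock / 1024);
    }
    return 0;
}
```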
What changed?
- Old: `nthreads=256` + `ncols=4` → the tile kernel asked for more shared memory than Maxwell's 48 KB per-block limit allows 💥
- New: `nthreads=128` → the kernel stays safely under the 48 KB cap ✅ (see the sketch below for why that cap matters)
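For the curious: shared memory per block is a hard hardware budget, and on sm_50/52 it tops out at 48 KB with no way to opt in to more. The real fattn-tile layout lives in the CUDA backend; the toy below (my own dummy kernel, not Ollama's) just shows how a launch that requests more than the budget fails on the spot:

```cpp
// smem_limit.cu -- demonstrate that requesting more dynamic shared memory
// than the per-block limit makes the launch fail (illustrative only).
// Build: nvcc smem_limit.cu -o smem_limit
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy_kernel(float *out) {
    extern __shared__ float tile[];   // dynamic shared memory
    tile[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    if (threadIdx.x == 0) out[blockIdx.x] = tile[0];
}

static void try_launch(size_t smem_bytes, float *out) {
    dummy_kernel<<<1, 128, smem_bytes>>>(out);
    cudaError_t err = cudaGetLastError();
    printf("request %3zu KB -> %s\n", smem_bytes / 1024,
           err == cudaSuccess ? "launch OK" : cudaGetErrorString(err));
    cudaDeviceSynchronize();
}

int main() {
    float *out = nullptr;
    cudaMalloc(&out, sizeof(float));
    try_launch(40 * 1024, out);   // under 48 KB: launches fine
    try_launch(64 * 1024, out);   // over 48 KB: fails (Maxwell hard-caps shared memory
                                  // per block at 48 KB; newer parts need an explicit opt-in)
    cudaFree(out);
    return 0;
}
```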
No flashy features, just pure, sweet stability: no more crashes from the attention kernel requesting more shared memory than these older NVIDIA cards can provide.
Perfect for tinkerers with budget rigs or vintage GPUs who refuse to give up local LLMs. Update, reload your model, and keep grinding! 🖥️🧠
