Ollama – v0.13.1: llm: Don’t always evict models on CPU-only systems
Big win for CPU folks! 🎉 Ollama v0.13.1 just dropped and fixes a major pain point: models no longer get constantly evicted from memory on CPU-only systems. 🐢💻
Before: Ollama thought “no VRAM = always evict,” causing annoying reloads even when RAM was plentiful.
Now: It only evicts when actually needed—like when you’re juggling multiple huge models and RAM is tight.
Result? Fewer reloads and snappier responses on laptops, older machines, and GPU-less cloud instances. Load your Llama 3 or Phi-4 once and let it stay loaded.
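
Want to check it on your own box? Here's a minimal sketch (Python, standard library only) that assumes a local Ollama server on the default port 11434 and a model you've already pulled, llama3 in this example. It sends two prompts back to back and queries /api/ps in between; on a CPU-only machine running v0.13.1 or later, the model should still show up as loaded, and the second call shouldn't pay the reload cost.

```python
import json
import time
import urllib.request

BASE = "http://localhost:11434"   # default Ollama address (assumes a local install)
MODEL = "llama3"                  # any model you've already pulled works here

def generate(prompt: str) -> float:
    """Send one non-streaming /api/generate request and return wall-clock seconds."""
    body = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{BASE}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    start = time.time()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.time() - start

def loaded_models() -> list[str]:
    """List models currently resident in memory via /api/ps (same data as `ollama ps`)."""
    with urllib.request.urlopen(f"{BASE}/api/ps") as resp:
        return [m["name"] for m in json.load(resp).get("models", [])]

first = generate("Say hi.")                 # cold call: includes the initial model load
print(f"first call:  {first:.1f}s, loaded: {loaded_models()}")

second = generate("Say hi again.")          # should reuse the already-loaded model
print(f"second call: {second:.1f}s, loaded: {loaded_models()}")
# With the v0.13.1 fix, the model stays in the /api/ps list between calls on a
# CPU-only system, and the second request no longer pays the reload penalty.
```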
Fixes #13227. CPU users, rejoice! 🙌
