Ollama – v0.15.1-rc1
🚀 Ollama v0.15.1-rc1 just dropped, and it's a quiet powerhouse!
GLM4-MoE-Lite now quantizes more of its tensors to Q8_0, which means a smaller memory footprint and faster inference with minimal quality loss. That's great news for laptops, Raspberry Pis, and other edge devices running low on RAM.
And goodbye, weird double BOS tokens! 🎉 The bug that prepended the BOS (beginning-of-sequence) token twice is fixed, so prompts are no longer double-prefixed and your outputs start cleaner and smoother.
This is a release candidate, so it's stable but still being polished. If you're running GLM4-MoE-Lite, or just want leaner, faster models, update now and feel the difference.
🧠 Pro tip: Q8_0 stores each weight in 8 bits plus a per-block scale, so you get a much smaller model with nearly identical output quality.
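To make that pro tip concrete, here's a rough, self-contained sketch of the Q8_0 idea as used in GGML-style quantization: weights are split into blocks of 32, and each block is stored as int8 values plus one float16 scale (roughly 8.5 bits per weight instead of 32). This is an illustration of the format's principle, not Ollama's actual implementation:

```python
import numpy as np

BLOCK = 32  # Q8_0 groups weights into blocks of 32 values

def q8_0_quantize(w):
    """Quantize a 1-D float array, Q8_0 style: int8 values + one scale per block."""
    w = w.reshape(-1, BLOCK)
    # Scale each block so its largest magnitude maps to 127.
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales.astype(np.float16)

def q8_0_dequantize(q, scales):
    return (q.astype(np.float32) * scales.astype(np.float32)).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = q8_0_quantize(w)

# Storage: 1 byte per weight + 2 bytes of scale per 32 weights,
# versus 4 bytes per weight for float32.
packed_bytes = q.nbytes + s.nbytes
err = np.abs(q8_0_dequantize(q, s) - w).max()
print(f"{packed_bytes} bytes vs {w.nbytes} bytes, max abs error {err:.4f}")
```

The round-trip error per weight is bounded by half a quantization step, which for typical weight magnitudes is tiny; that's why Q8_0 can cut memory ~4x while keeping the model's behavior essentially unchanged.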
