Ollama – v0.15.1-rc1
Ollama v0.15.1-rc1 just dropped, and it's a quiet powerhouse!
GLM4-MoE-Lite now quantizes more tensors to Q8_0: smaller footprint, faster inference, same brainpower. Perfect for laptops, Raspberry Pis, or any edge device running low on RAM.
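If you're curious what Q8_0 buys you: each block of 32 weights is stored as 32 signed 8-bit integers plus one shared scale, so a weight costs roughly 8.5 bits instead of 16 or 32. Here is a minimal NumPy sketch of that idea (illustrative only, not the actual ggml implementation Ollama uses):

```python
import numpy as np

def quantize_q8_0(x, block_size=32):
    # Split into blocks; each block keeps one scale plus int8 values,
    # mirroring the Q8_0 idea (illustrative, not the real ggml layout).
    x = x.reshape(-1, block_size)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero on all-zero blocks
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize_q8_0(q, scale):
    # Recover approximate float weights from int8 values and per-block scales.
    return q.astype(np.float32) * scale

weights = np.linspace(-1.0, 1.0, 64, dtype=np.float32)
q, s = quantize_q8_0(weights)
restored = dequantize_q8_0(q, s).reshape(-1)
print(np.max(np.abs(weights - restored)))  # small round-off error
```

The round-off error per weight is at most half a quantization step, which is why Q8_0 is generally considered near-lossless in practice.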
And goodbye, weird double BOS tokens! No more repetitive beginnings: your outputs are now cleaner and smoother.
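The gist of the double-BOS problem: if the prompt template already supplies a beginning-of-sequence token, the tokenizer shouldn't prepend a second one. A minimal sketch of deduplicating leading BOS tokens (the token id and helper name are hypothetical, not Ollama's actual code):

```python
BOS_ID = 1  # hypothetical BOS token id; the real value depends on the tokenizer

def strip_duplicate_bos(tokens, bos_id=BOS_ID):
    # Collapse a run of leading BOS tokens down to a single one.
    i = 0
    while i + 1 < len(tokens) and tokens[i] == bos_id and tokens[i + 1] == bos_id:
        i += 1
    return tokens[i:]

print(strip_duplicate_bos([1, 1, 42, 7]))  # → [1, 42, 7]
```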
This is a release candidate, so it's stable but still being polished. If you're running GLM4-MoE-Lite or just want leaner, faster models, update now and feel the difference.
Pro tip: Q8_0 = less memory, same genius.
