Ollama – v0.13.5-rc0: GGML update to ec98e2002 (#13451)
Ollama v0.13.5-rc0 just dropped, and it's all about speed under the hood!
The GGML inference engine got a major upgrade to commit `ec98e2002`, with smarter, leaner internals:
- MaskBatchPadding removed: less padding means less overhead. KQ masking is now cleaner and faster.
- NVIDIA Nemotron 3 Nano support paused: temporarily pulled for stability. Coming back stronger soon!
- Solar Pro tweaks: under-the-hood adjustments, still being verified. If you're using Solar, test your models!
No flashy UI changes, just a lighter, faster engine for local LLM inference. Think of it like swapping your car's engine for a turbocharged version that runs cooler.
Pro tip: Custom models? Run sanity checks: GGML changes can ripple through quantization and attention layers.
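One lightweight way to do that sanity check is through Ollama's REST API (`POST /api/generate` on the default `localhost:11434`). The sketch below only builds the request payloads; the model name and probe prompts are hypothetical placeholders for your own. Pinning `temperature` to 0 and fixing a `seed` in `options` makes before/after-upgrade outputs comparable:

```python
import json

# Hypothetical model name and probe prompts -- substitute your own.
MODEL = "my-custom-model"
PROBES = ["2 + 2 =", "The capital of France is"]

def build_generate_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate endpoint.

    temperature=0 and a fixed seed pin the sampling so you can diff
    pre-upgrade and post-upgrade outputs for the same prompt.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0, "seed": 42},
    }

for prompt in PROBES:
    payload = build_generate_request(MODEL, prompt)
    # Send it with e.g.:
    #   curl http://localhost:11434/api/generate -d '<payload JSON>'
    print(json.dumps(payload))
```

Run the same probes before and after upgrading, save the responses, and diff them; any drift on a deterministic config is worth investigating.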
Stay sharp, tinkerers. The local LLM revolution keeps accelerating.
