Ollama – v0.13.5-rc0: GGML update to ec98e2002 (#13451)

Ollama v0.13.5-rc0 just dropped – and it’s all about speed under the hood! 🚀

The GGML inference engine got a major upgrade to commit `ec98e2002`, with smarter, leaner internals:

  • ✅ MaskBatchPadding removed – Less padding = less overhead. KQ masking is now cleaner and faster.
  • 🚫 NVIDIA Nemotron 3 Nano support paused – Temporarily pulled for stability. Coming back stronger soon!
  • 🔧 Solar Pro tweaks – Under-the-hood adjustments, still being verified. If you’re using Solar, test your models!

No flashy UI – just a lighter, faster engine for local LLM inference. Think of it like swapping your car’s engine for a turbocharged version that runs cooler.

Pro tip: Custom models? Run sanity checks – GGML changes can ripple through quantization and attention layers.
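One simple way to sanity-check: replay a few fixed prompts against the local Ollama HTTP API (POST /api/generate on http://localhost:11434) and compare the answers with what you got before the upgrade. The sketch below is just that idea in plain Python standard library code; the model tag `my-custom-model` and the prompts are placeholders, so swap in whatever you actually run:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "my-custom-model"  # placeholder: replace with your own model tag

# Prompts whose answers are easy to eyeball after an engine update.
PROMPTS = [
    "Reply with exactly one word: ping",
    "What is 12 * 12? Answer with the number only.",
]

def generate(prompt: str) -> str:
    """Send a non-streaming generate request to the local Ollama server."""
    payload = json.dumps({
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    for prompt in PROMPTS:
        print(f"PROMPT:   {prompt}")
        print(f"RESPONSE: {generate(prompt).strip()}\n")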

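```

If the responses suddenly look off (or a model refuses to load), that’s your cue to re-check quantization settings and re-pull or rebuild the model before blaming the new engine.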
Stay sharp, tinkerers. The local LLM revolution keeps accelerating. 🛠️

🔗 View Release