Ollama – v0.15.5-rc4: ollamarunner: Fix off by one error with numPredict

Ollama — v0.15.5‑rc4 (ollamarunner) 🎉

What it does: ollamarunner is the lightweight engine that powers token generation and streaming for Ollama’s local LLMs.

What’s new

  • Fixed off‑by‑one bug in `numPredict`
      ◦ Previously, setting `numPredict` returned one fewer token than requested and showed the wrong limit in stats.
      ◦ The check now runs at the actual prediction step, so you get exactly the number of tokens you ask for, and the stats reflect it accurately.
  • Improved batch handling
      ◦ Tightened logic around batch termination when hitting token limits.
      ◦ Prevents premature batch stops when `numPredict` isn’t used, leading to smoother generation runs.
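The fix above can be sketched in miniature. This is an illustrative Go snippet, not Ollama’s actual code: the function and variable names (`generate`, `nextToken`) are hypothetical, but it shows the key idea — evaluating the `numPredict` bound at the prediction step itself, so exactly the requested number of tokens is produced.

```go
package main

import "fmt"

// generate emits tokens until the numPredict limit is reached.
// The limit check happens at the prediction step, so exactly
// numPredict tokens are returned (a negative limit means unlimited
// here; this mirrors the described fix, not Ollama's real API).
func generate(numPredict int, nextToken func() string) []string {
	var out []string
	for {
		// Checking here, before predicting the next token, means the
		// count of emitted tokens matches the requested limit. The
		// buggy ordering effectively stopped one token early.
		if numPredict >= 0 && len(out) >= numPredict {
			break
		}
		out = append(out, nextToken())
	}
	return out
}

func main() {
	i := 0
	next := func() string { i++; return fmt.Sprintf("tok%d", i) }
	fmt.Println(generate(3, next)) // exactly three tokens
}
```

With the corrected ordering, the stats counter and the emitted token count are the same value, which is why the reported limit is now accurate.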

Why it matters

  • Predictable output length → simplifies prompt engineering and downstream processing.
  • Accurate stats → better monitoring of usage quotas and performance metrics.

That’s the scoop on this RC—stay tuned for the next stable release! 🚀
