Ollama v0.15.5-rc4: ollamarunner: fix off-by-one error with numPredict 🎉
What it does: ollamarunner is the lightweight engine that powers token generation and streaming for Ollama’s local LLMs.
What’s new
- Fixed an off-by-one bug in `numPredict` (see the sketch after this list)
  - Previously, setting `numPredict` returned one fewer token than requested and reported the wrong limit in the stats.
  - The limit check now runs at the actual prediction step, so you get exactly the number of tokens you ask for, and the stats reflect it accurately.
- Improved batch handling
  - Tightened the logic around batch termination when token limits are hit.
  - Prevents premature batch stops when `numPredict` isn't set, for smoother generation runs.
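To make the fix concrete, here is a minimal sketch of the pattern in Go. This is illustrative only, not the actual ollamarunner code; `generate`, `sampleToken`, and `maxTokens` are hypothetical names standing in for the real prediction loop:

```go
package main

import "fmt"

// generate sketches a prediction loop. A typical off-by-one here is a
// comparison like `len(tokens)+1 >= maxTokens` checked away from the
// sampling step, which stops one token early (illustrative, not the
// actual bug). The fix is to count each token at the step where it is
// actually predicted and stop only once the requested number exists.
func generate(prompt string, maxTokens int) []string {
	var tokens []string
	for {
		// Stop check at the actual prediction step: only after
		// maxTokens tokens have been emitted, never one early.
		if maxTokens > 0 && len(tokens) >= maxTokens {
			break
		}
		tok := sampleToken(prompt, tokens) // hypothetical sampler
		tokens = append(tokens, tok)
		if tok == "<eos>" { // model signalled end of sequence
			break
		}
	}
	return tokens
}

// sampleToken stands in for the model's forward pass plus sampling.
func sampleToken(prompt string, prev []string) string {
	return fmt.Sprintf("tok%d", len(prev))
}

func main() {
	out := generate("hello", 4)
	fmt.Println(len(out), out) // prints exactly 4 tokens
}
```

The key point is that the stop condition counts tokens where they are actually produced, so a limit of N yields exactly N tokens unless the model ends the sequence early.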
Why it matters
- Predictable output length → simplifies prompt engineering and downstream processing.
- Accurate stats → better monitoring of usage quotas and performance metrics (you can verify this with the example below).
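If you want to check the behavior against a local server, one option is to set `num_predict` through Ollama's HTTP API and compare it with the `eval_count` field in the response. A minimal Go sketch, assuming Ollama is running on the default port and `llama3` stands in for any model you have pulled:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Request exactly 8 tokens; "llama3" is a placeholder model name.
	body, _ := json.Marshal(map[string]any{
		"model":   "llama3",
		"prompt":  "Why is the sky blue?",
		"stream":  false,
		"options": map[string]any{"num_predict": 8},
	})
	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Response  string `json:"response"`
		EvalCount int    `json:"eval_count"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	// With the fix, eval_count should match the requested num_predict
	// (unless the model emits an end-of-sequence token first).
	fmt.Printf("eval_count=%d response=%q\n", out.EvalCount, out.Response)
}
```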
That’s the scoop on this RC—stay tuned for the next stable release! 🚀
