Ollama v0.15.5-rc4: ollamarunner: fix off-by-one error with numPredict 🎉
What it does: ollamarunner is the lightweight engine that powers token generation and streaming for Ollama’s local LLMs.
What’s new
- Fixed an off-by-one bug in `numPredict` (see the sketch after this list)
  - Previously, setting `numPredict` returned one fewer token than requested and reported the wrong limit in the stats.
  - The limit check now runs at the actual prediction step, so you get exactly the number of tokens you ask for, and the stats reflect it accurately.
- Improved batch handling
  - Tightened the logic around batch termination when token limits are hit.
  - Prevents premature batch stops when `numPredict` isn't set, for smoother generation runs.
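To make the fix concrete, here is a minimal sketch of the pattern in Go. This is illustrative only, not the actual ollamarunner code; `generate`, `sampleToken`, and `maxTokens` are hypothetical names standing in for the real prediction loop:

```go
package main

import "fmt"

// generate sketches a prediction loop. A typical off-by-one here is a
// comparison like `len(tokens)+1 >= maxTokens` checked away from the
// sampling step, which stops one token early (illustrative, not the
// actual bug). The fix is to count each token at the step where it is
// actually predicted and stop only once the requested number exists.
func generate(prompt string, maxTokens int) []string {
	var tokens []string
	for {
		// Stop check at the actual prediction step: only after
		// maxTokens tokens have been emitted, never one early.
		if maxTokens > 0 && len(tokens) >= maxTokens {
			break
		}
		tok := sampleToken(prompt, tokens) // hypothetical sampler
		tokens = append(tokens, tok)
		if tok == "<eos>" { // model signalled end of sequence
			break
		}
	}
	return tokens
}

// sampleToken stands in for the model's forward pass plus sampling.
func sampleToken(prompt string, prev []string) string {
	return fmt.Sprintf("tok%d", len(prev))
}

func main() {
	out := generate("hello", 4)
	fmt.Println(len(out), out) // prints exactly 4 tokens
}
```

The key point is that the stop condition counts tokens where they are actually produced, so a limit of N yields exactly N tokens unless the model ends the sequence early.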
Why it matters
- Predictable output length → simplifies prompt engineering and downstream processing.
- Accurate stats → better monitoring of usage quotas and performance metrics (you can verify this with the example below).
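If you want to check the behavior against a local server, one option is to set `num_predict` through Ollama's HTTP API and compare it with the `eval_count` field in the response. A minimal Go sketch, assuming Ollama is running on the default port and `llama3` stands in for any model you have pulled:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Request exactly 8 tokens; "llama3" is a placeholder model name.
	body, _ := json.Marshal(map[string]any{
		"model":   "llama3",
		"prompt":  "Why is the sky blue?",
		"stream":  false,
		"options": map[string]any{"num_predict": 8},
	})
	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Response  string `json:"response"`
		EvalCount int    `json:"eval_count"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	// With the fix, eval_count should match the requested num_predict
	// (unless the model emits an end-of-sequence token first).
	fmt.Printf("eval_count=%d response=%q\n", out.EvalCount, out.Response)
}
```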
That’s the scoop on this RC—stay tuned for the next stable release! 🚀
