Ollama – v0.15.5

Ollama v0.15.5 just dropped! 🎉

What’s fresh:

  • Context‑limit flags for cloud models – New `--context-limit` (and related) CLI options let you cap token windows on hosted Ollama endpoints (OpenAI, Anthropic, etc.). Set it per model in `ollama.yaml` to avoid runaway memory use (see the config sketch after this list).
  • Sharper error handling – Cloud‑model failures now return a clear “context limit exceeded” message instead of vague timeouts, plus retry logic for transient network hiccups (see the client-side sketch below).
  • Performance tweaks – ~10% faster startup on popular cloud backends and a slimmer CPU/GPU memory footprint in “lite” mode.
  • Bug fixes & housekeeping – Fixed a race condition that could corrupt logs during parallel jobs, refreshed the OpenAPI schema with the new context params, and added docs with CLI/config examples.
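To make the per-model cap concrete, here’s a minimal sketch of what it might look like in `ollama.yaml`. The key names (`models`, `context_limit`) and model names are illustrative assumptions, not a confirmed schema; check the release docs for the real layout:

```yaml
# Hypothetical ollama.yaml sketch — key names are assumptions for
# illustration, not a confirmed schema.
models:
  gpt-4o:                  # placeholder for a hosted OpenAI-backed model
    context_limit: 4096    # cap the token window to keep memory predictable
  claude-3-5-sonnet:       # placeholder for a hosted Anthropic-backed model
    context_limit: 8192
```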
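And a rough client-side sketch of how the clearer errors might be consumed. The endpoint is Ollama’s standard `/api/generate` API; the error-string check and the retry policy are illustrative assumptions, not the server’s built-in behavior:

```python
import time
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # standard local API endpoint

def generate(model: str, prompt: str, retries: int = 3) -> str:
    """Call /api/generate, retrying only transient network failures."""
    for attempt in range(1, retries + 1):
        try:
            resp = requests.post(
                OLLAMA_URL,
                json={"model": model, "prompt": prompt, "stream": False},
                timeout=60,
            )
            if resp.status_code >= 400:
                msg = resp.json().get("error", resp.text)
                # "context limit exceeded" is the message these notes describe;
                # it's a hard failure, so surface it instead of retrying.
                if "context limit exceeded" in msg:
                    raise ValueError(f"{model}: {msg} (shrink the prompt or "
                                     f"raise the configured limit)")
                resp.raise_for_status()
            return resp.json()["response"]
        except (requests.ConnectionError, requests.Timeout):
            if attempt == retries:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff

if __name__ == "__main__":
    print(generate("llama3.2", "Summarize the v0.15.5 release notes."))
```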

🚀 Pro tip: Pin a sensible `--context-limit` (e.g., 4096) for large‑context LLMs in production to keep costs predictable and dodge OOM crashes.
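Assuming the flag works per run the way the notes describe, pinning it could look like this (hypothetical invocation; only the flag name comes from the notes above):

```sh
# Hypothetical — `--context-limit` is the flag from these notes; the model
# name is a placeholder for whatever cloud model you run in production.
ollama run my-cloud-model --context-limit 4096 "Summarize this incident report."
```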

Happy tinkering! 🎈
