Ollama – v0.15.5
Ollama v0.15.5 just dropped! 🎉
What’s fresh:
- Context‑limit flags for cloud models – A new `--context-limit` CLI option (and related settings) lets you cap the token window for models served from hosted/cloud backends (OpenAI, Anthropic, etc.). You can also pin it per model in `ollama.yaml` to avoid runaway memory use.
- Sharper error handling – Cloud‑model failures now return a clear “context limit exceeded” message instead of vague timeouts, plus retry logic for flaky network hiccups (see the client‑side sketch after this list).
- Performance tweaks – ~10 % faster startup on popular cloud backends and slimmer CPU/GPU memory footprints in “lite” mode.
- Bug fixes & housekeeping – Fixed a race condition that could corrupt logs during parallel jobs, refreshed the OpenAPI schema with the new context params, and added docs with CLI/config examples.
🚀 Pro tip: Pin a sensible `--context-limit` (e.g., 4096) for large‑context LLMs in production to keep costs predictable and dodge OOM crashes.
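If you drive Ollama over its REST API rather than the CLI, the long‑standing way to cap the window per request is the `num_ctx` option; whether the new `--context-limit` flag maps onto the same knob isn’t spelled out here, so treat that pairing as an assumption. A quick sketch mirroring the 4096 tip above:

```python
import requests

payload = {
    "model": "llama3.2",            # example model name; substitute your own
    "prompt": "Summarize our deployment checklist.",
    "stream": False,
    "options": {"num_ctx": 4096},   # per-request context-window cap
}

resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["response"])
```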
Happy tinkering! 🎈
