Ollama – v0.18.4-rc1

🚀 Ollama v0.18.4-rc1 is here, and it's packing a subtle but smart update!

πŸ” What’s new?

Ollama now warns you if your server context length is below 64k tokens when running local models. Why? Because newer LLMs (like Llama 3.1, Mistral Large, DeepSeek-R1) are built for long contexts, and running them with too little context can lead to truncated outputs or weird behavior. This warning helps you avoid those gotchas before they bite! 💡
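If the warning pops up, one fix is to raise the context window per request. Here's a minimal sketch using the official `ollama` Python client (the model tag and 64k value below are illustrative, not something this release mandates):

```python
# Minimal sketch: requesting a 64k-token context window per call.
# Assumes `pip install ollama` and a running Ollama server; the model
# tag and context size are illustrative.
import ollama

response = ollama.chat(
    model="llama3.1:8b-instruct-q4_K_M",
    messages=[{"role": "user", "content": "Summarize this long transcript..."}],
    options={"num_ctx": 65536},  # ask for a 64k-token context window
)
print(response["message"]["content"])
```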

πŸ› οΈ Bonus: While the full changelog is still loading on GitHub, this RC likely includes:

  • Stability tweaks for model loading
  • Improved error messages (especially around context handling)
  • Minor CLI/web UI polish

📌 Pro tip: If you're using large-context models (e.g., `llama3.1:8b-instruct-q4_K_M`), double-check your `OLLAMA_MAX_LOADED_MODELS` and context settings (the `OLLAMA_CONTEXT_LENGTH` server variable, or the per-request `num_ctx` option). This warning is here to help you optimize!
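Not sure what context length a model was trained for? One way to sanity-check is `ollama show`; here's a sketch with the Python client (assuming a recent client where `show()` exposes a `modelinfo` mapping, whose exact key names vary by model family):

```python
# Sketch: look up a model's trained context length via ollama.show().
# Key names in `modelinfo` differ by architecture (e.g. "llama.context_length"),
# so this scans for any "*.context_length" entry.
import ollama

info = ollama.show("llama3.1:8b-instruct-q4_K_M")
for key, value in info["modelinfo"].items():
    if key.endswith(".context_length"):
        print(f"{key} = {value}")  # e.g. llama.context_length = 131072
```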

🔗 Grab the RC: v0.18.4-rc1 on GitHub

💬 Join the convo: Ollama Discord

Let us know if you spot any quirks or love the warning; feedback helps shape the final release! 🙌

🔗 View Release