• Ollama – v0.20.4-rc1: gemma4: add missing file (#15394)

    Ollama v0.20.4-rc1 is here! 🚀

    If you’ve been trying to run Gemma 4 locally and hitting unexpected errors, this release candidate is exactly what you need to get back up and running smoothly. Ollama remains the premier tool for democratizing LLM access, allowing you to spin up models like Llama 3, DeepSeek-R1, and Mistral directly on your hardware without relying on the cloud.

    What’s new in this release:

    • Full Gemma 4 Support: This update resolves a critical issue by adding a missing file required for seamless Gemma 4 integration.
    • Essential Bug Fix: The patch corrects an accidental omission from a previous pull request (#15378), ensuring that the model files are correctly recognized by the framework.

    This is a quick but vital fix to ensure your local AI environment stays stable and capable of running the latest model architectures. Grab it and start tinkering! 🛠️

    🔗 View Release

  • Ollama – v0.20.4-rc0

    Ollama v0.20.4-rc0 is officially hitting the radar! 🚀

    If you’re looking to run powerful LLMs like Llama 3, DeepSeek-R1, or Mistral locally without the headache, Ollama is the ultimate toolkit. It handles everything from model downloading to providing a REST API for your own custom builds, making local AI experimentation incredibly smooth across macOS, Windows, and Linux.

    This latest Release Candidate (rc0) is all about tightening up the experience and ensuring stability before the full rollout. Here’s what’s under the hood:

    • Path Cleanup: Experimental paths have been scrubbed to provide a much more predictable environment for your local setups.
    • Enhanced Model Management: Fixed bugs within the “create from existing” functionality, making it easier to build and manage custom model variations.

    Since this is an rc0 release, it’s the perfect time for us tinkerers to jump in, test these refinements, and make sure everything plays nice with our local workflows! 🛠️

    🔗 View Release

  • Ollama – v0.20.3: model/parsers: add gemma4 tool call repair (#15374)

    Ollama v0.20.3 is officially live! 🚀

    If you’ve been running large language models locally on your machine, you know that Ollama is the gold standard for making LLMs like Llama 3, DeepSeek-R1, and Mistral accessible without needing a massive cloud setup. This latest update is a huge win for anyone building agentic workflows or using tool-calling capabilities.

    What’s new in this release:

    • Gemma 4 Tool Call Repair: We’ve all seen it: a model makes a tiny syntax mistake while trying to call a function, and the whole process grinds to a halt. This update introduces a specialized “repair” mechanism for Gemma 4. If the initial strict parse fails, Ollama will now attempt to fix common errors on the fly to keep your automation running smoothly.
    • Smart Error Correction: The new repair logic is specifically tuned to catch and fix:
      • Missing string delimiters.
      • Incorrectly used single-quoted values.
      • Raw terminal strings that need proper formatting according to the tool schema.
      • Missing object closing braces (applied after a successful concrete repair).
    • Improved Stability & Testing: To make sure these fixes don’t cause new headaches, this release includes expanded regression coverage and new unit tests specifically for malformed tool calls.
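    The "strict parse first, targeted fixes second" flow described above can be sketched in a few lines. This is purely an illustration of the idea, not Ollama's actual implementation (which lives in its Go parser); the function name and the exact fixes shown are assumptions for demonstration.

```python
import json

def repair_tool_call(raw: str) -> dict:
    """Sketch of strict-then-repair parsing for model tool calls.

    Illustrative only: real repair logic is more careful than these
    two naive fixes.
    """
    try:
        return json.loads(raw)  # strict parse: well-formed calls pass through
    except json.JSONDecodeError:
        pass

    candidate = raw.strip()
    # Fix 1: single-quoted keys/values -> double quotes (naive; would also
    # rewrite apostrophes inside values, hence "illustration only")
    candidate = candidate.replace("'", '"')
    # Fix 2: missing closing braces, applied after the concrete repair above
    missing = candidate.count("{") - candidate.count("}")
    candidate += "}" * max(missing, 0)
    return json.loads(candidate)

# A single-quoted call with a dropped closing brace now parses:
print(repair_tool_call("{'name': 'get_weather', 'arguments': {'city': 'Paris'"))
```

    The key design point mirrored here is ordering: repairs only ever run when the strict parse fails, so well-formed tool calls are never touched.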

    This is a massive step forward for reliability when working with cutting-edge models locally. Get updated and keep those agents running! 🛠️

    🔗 View Release

  • Ollama – v0.20.3-rc0: model/parsers: add gemma4 tool call repair (#15374)

    Ollama v0.20.3-rc0 is officially live! 🚀

    If you are running local LLMs, you know that “agentic” workflows depend entirely on how well a model can call tools and functions. Even a tiny syntax error from the model can crash your entire pipeline. This release is a massive quality-of-life update specifically designed to bridge that gap.

    What’s new in this release:

    • Gemma 4 Tool Call Repair: Instead of letting a malformed tool call break your code, Ollama now features a “repair” layer. It uses a candidate pipeline to catch and fix syntax mistakes on the fly.
    • Smart Error Correction: The repair logic is fine-tuned to handle common model hiccups, such as:
      • Missing Gemma string delimiters.
      • Single-quoted string values or dangling delimiters.
      • Raw terminal strings that need proper formatting per the tool schema.
      • Missing object closing braces.
    • Enhanced Stability: This update includes new regression coverage and unit tests to ensure these repair helpers work reliably across various scenarios, preventing old bugs from resurfacing.

    This is a huge win for anyone building autonomous agents or using Gemma 4 for function calling: it makes your local development much more robust and less prone to frustrating crashes! 🛠️

    🔗 View Release

  • Text Generation Webui – v4.4 – MCP server support!

    text-generation-webui (v4.4) 🚀

    This powerhouse Gradio web UI is essentially the “AUTOMATIC1111” for Large Language Models, providing a comprehensive local interface to run LLMs via backends like llama.cpp and Transformers. It’s the go-to tool for anyone wanting a private, offline, and highly customizable way to interact with models.

    The latest update is a massive one, focusing heavily on extensibility and UI polish! Here is what’s new:

    • Remote MCP Server Support: This is a game-changer! You can now connect to remote Model Context Protocol (MCP) servers directly from the Chat tab. The webui will automatically discover and use those tools alongside your local ones, massively expanding what your models can actually do.
    • Modernized UI: The interface has been sleeked up with better contrast, improved scrollbars, and tighter spacing to make your chat experience feel more professional and less cluttered.
    • Gemma 4 Support: Thanks to an updated `ik_llama.cpp` dependency, you can now jump straight into running Gemma 4!
    • Enhanced Image Metadata: For those using the API for image generation, PNG files now include embedded metadata (seed, model, steps, etc.) so your settings are always baked right into the file.
    • Expanded Platform Support: New portable builds are available for Windows users running AMD hardware via ROCm.
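    Embedded PNG metadata like this lives in the format's standard text chunks, so any tool can read it back without the webui. Below is a stdlib-only sketch of a chunk walker; the key names (`seed`, `steps`) and the synthetic file it parses are illustrative, and the actual keys the webui writes may differ.

```python
import struct
import zlib

def png_text_chunks(data: bytes) -> dict:
    """Walk PNG chunks and collect tEXt key/value pairs."""
    assert data[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG"
    out, pos = {}, 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":  # Latin-1 key, NUL separator, Latin-1 value
            key, _, value = body.partition(b"\x00")
            out[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return out

def text_chunk(key: str, value: str) -> bytes:
    """Build a spec-conformant tEXt chunk (CRC covers type + data)."""
    body = key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    crc = zlib.crc32(b"tEXt" + body)
    return struct.pack(">I", len(body)) + b"tEXt" + body + struct.pack(">I", crc)

# Synthetic example: signature plus two tEXt chunks (a real file would also
# carry IHDR/IDAT/IEND, which the walker simply skips over).
png = b"\x89PNG\r\n\x1a\n" + text_chunk("seed", "42") + text_chunk("steps", "30")
print(png_text_chunks(png))  # -> {'seed': '42', 'steps': '30'}
```

    Because the settings ride along inside the image file itself, they survive copying and sharing, which is exactly what makes this feature handy for reproducing generations.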

    Technical & Developer Notes:

    • API Refinements: Added `instruction_template` parameters to the model load endpoint and cleaned up deprecated settings.
    • Bug Fixes: Resolved critical issues including LaTeX rendering protection, crashes during prompt truncation, and server restart errors.

    ๐Ÿ› ๏ธ Pro-Tip for Tinkerers: If you use a portable installation, you can now move your `user_data` folder one level up (next to the install folder). This allows multiple versions of the webui to share the same models and settings, making updates a total breeze!

    🔗 View Release

  • Lemonade – v10.1.0

    The lemonade-sdk/lemonade library has just bumped up to version v10.1.0! 🍋

    If you’re looking to run Large Language Models (LLMs) locally with high performance, Lemonade is your go-to toolkit. It optimizes inference engines to leverage both GPUs and NPUs (like the AMD Ryzen AI series), making local LLM experiences faster and more responsive. Plus, it offers OpenAI API compatibility, so you can swap cloud services for your own hardware without breaking your workflow.

    What’s new in this release:

    • Version Bump: The project has officially transitioned to version 10.1.0.
    • Maintenance Update: This release focuses on updating the core project versioning to ensure compatibility and streamlined dependency management for all you tinkerers out there.

    Whether you are using the Python SDK or the CLI, this update helps keep your local environment stable and ready for heavy lifting. Keep those builds running fast! 🚀

    🔗 View Release

  • Ollama – v0.20.2

    Ollama v0.20.2 is officially live! 🚀

    If you’re looking to run powerful large language models like Llama 3, DeepSeek-R1, or Mistral locally on your own hardware, Ollama remains the gold standard for making that process seamless and easy. It handles all the heavy lifting of model management so you can focus on tinkering and building.

    This latest release focuses on smoothing out your user experience:

    • Improved App Flow: The default home view has been updated to direct you straight into a new chat session rather than just launching the application interface. This small change helps you jump right into the conversation without extra clicks! 💬

    Keep those local environments running!

    🔗 View Release

  • Ollama – v0.20.1: Revert “enable flash attention for gemma4 (#15296)” (#15311)

    Ollama v0.20.1 is officially live! 🚀

    If you aren’t using Ollama yet, you are missing out on one of the best ways to run powerful Large Language Models (LLMs) like Llama 3, DeepSeek-R1, and Mistral locally on your own hardware. It’s a total game-changer for privacy-conscious tinkerers and devs who want to experiment with AI without relying on cloud APIs.

    This latest release is a targeted maintenance update focused on stability:

    • Flash Attention Reversion: The team has reverted the “enable flash attention for gemma4” feature. 🔄

    Why does this matter?

    While Flash Attention is an awesome optimization for speed, it looks like the developers decided to pull it back for now, likely to iron out some unexpected behavior or stability issues specifically with Gemma 4 models.

    If you’ve been experiencing weirdness or crashes while running Gemma 4 with flash attention enabled, updating to v0.20.1 should get your local environment back into a much more predictable and stable state! 🛠️
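    If you'd rather control the setting explicitly instead of relying on the build's default, Ollama exposes a flash-attention toggle as an environment variable on the server process. A rough sketch (check your install's documentation for the exact setup on your platform):

```shell
# Keep flash attention disabled for the Ollama server, matching the
# reverted gemma4 default. Set before launching the server process.
export OLLAMA_FLASH_ATTENTION=0
# ollama serve   # uncomment to start the server with the setting applied
echo "OLLAMA_FLASH_ATTENTION=$OLLAMA_FLASH_ATTENTION"
```

    On systemd-managed Linux installs the variable typically goes into the service's environment rather than your shell profile.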

    🔗 View Release

  • Text Generation Webui – v4.3.3 – Gemma 4 support!

    text-generation-webui just dropped a massive update! If you’re looking for the “AUTOMATIC1111” experience for local LLMs, this Gradio-based powerhouse is now even more capable and snappy. 🚀

    Here is the breakdown of what’s new in this release:

    🧠 New Model & Backend Support

    • Gemma 4 Integration: Full support is officially live! You can now run Gemma 4 with full tool-calling capabilities via both the UI and the API.
    • ik_llama.cpp Backend: A brand new backend option has arrived, offering much more accurate KV cache quantization (via Hadamard rotation) and specialized optimizations for MoE models and CPU inference.

    ๐Ÿ› ๏ธ API & Transformer Enhancements

    • Enhanced Completions: The `/v1/completions` endpoint now supports `echo` and `logprobs`, giving you deep visibility into token-level probabilities.
    • Smarter Model Loading: The system now auto-detects `torch_dtype` from model configs, providing way more flexibility than the previous forced half-precision method.
    • Metadata-Driven Templates: Instruction templates are now intelligently detected via model metadata instead of relying on filename patterns.
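    The dtype auto-detection idea is easy to sketch: read `torch_dtype` from the model's `config.json` and only fall back to half precision when the field is absent. The function below is an illustration of that logic under assumed names, not text-generation-webui's actual code.

```python
import json

def detect_dtype(config_json: str, fallback: str = "float16") -> str:
    """Pick a load dtype from model metadata instead of forcing half precision.

    Illustrative sketch: reads the torch_dtype field a Hugging Face-style
    config.json typically carries.
    """
    config = json.loads(config_json)
    dtype = config.get("torch_dtype") or fallback
    # Some configs say "auto", meaning "defer to the weights"; keep the
    # fallback in that case rather than passing "auto" downstream.
    return fallback if dtype == "auto" else dtype

print(detect_dtype('{"model_type": "gemma", "torch_dtype": "bfloat16"}'))  # bfloat16
print(detect_dtype('{"model_type": "llama"}'))                             # float16
```

    The practical upside is exactly what the bullet describes: bfloat16-trained models load in the precision their authors intended, rather than being silently squeezed into float16.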

    ⚡ Performance & UI Polish

    • Snappier Interface: A custom Gradio fork has been tuned to save up to 50ms per UI event, making the whole experience feel much more responsive.
    • Critical Bug Fixes: Resolved several issues including dropdown crashes, API parsing errors for non-dict JSON tool calls, and `llama.cpp` template parsing bugs.

    ๐Ÿ›ก๏ธ Security & Stability

    • Hardened Protections: Implemented ACL/SSRF fixes for extensions, patched path-matching bypasses on Windows/macOS, and added filename sanitization to prevent manipulation during prompt file operations.

    📦 Portable Build Upgrades

    New self-contained packages are available for NVIDIA, AMD, Intel, Apple Silicon, and CPU users! Pro tip: You can now move your `user_data` folder one level up to easily share settings across multiple version installs. 🛠️

    🔗 View Release

  • Ollama – v0.20.1-rc2: model/parsers: rework gemma4 tool call handling (#15306)

    Ollama v0.20.1-rc2 is officially here, and it’s bringing some serious precision to how your local engine handles model interactions! 🛠️

    If you’ve been using Ollama to run LLMs like Llama 3, Mistral, or Gemma locally, you know it’s the backbone for building private AI applications. This latest release focuses heavily on refining the way specific models communicate with your system.

    What’s new in this release:

    • Gemma4 Tool Call Overhaul: The developers have completely reworked how Gemma4 handles tool calls. By replacing the old custom argument normalizer with a much stricter reference-style conversion, model interactions are now significantly more reliable.
    • Improved Data Integrity: This update is a win for stability! It ensures that quoted strings remain strings, bare keys get properly quoted, and unquoted values maintain their correct types during the JSON unmarshalling process.
    • Enhanced Error Handling: New test coverage has been added to catch malformed raw-quoted inputs. This ensures Ollama behaves exactly like the official reference implementation, reducing those pesky unexpected errors.
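    The normalization rules above can be sketched compactly. This is an illustrative Python approximation, not Ollama's actual Go implementation: bare keys get quoted so the payload becomes valid JSON, while quoted strings stay strings and unquoted values keep their native types through the unmarshalling step.

```python
import json
import re

def normalize_args(raw: str) -> dict:
    """Sketch of reference-style tool-call argument normalization.

    Only one targeted rewrite is applied: quoting bare object keys.
    Value types are then decided by the JSON parser itself, so quoted
    numerals remain strings and unquoted literals keep their types.
    """
    # {city: "Paris"} -> {"city": "Paris"}; already-quoted keys are untouched
    candidate = re.sub(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)\s*:', r'\1"\2":', raw)
    return json.loads(candidate)

args = normalize_args('{city: "10", population: 10, raining: true}')
print(args)  # {'city': '10', 'population': 10, 'raining': True}
```

    Note how `"10"` survives as a string while `10` and `true` come back as an int and a bool; that type preservation is the data-integrity point the release notes highlight.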

    If you are currently experimenting with Gemma4 for agentic workflows or complex tool use, this update is a must-have to make your model interactions more predictable and robust! 🚀

    🔗 View Release