• Ollama – v0.20.4: responses: add support for fn call output arrays (#15406)

    Ollama Update: v0.20.4 is here! 🛠️

    If you’re running LLMs locally, listen up! The latest release of Ollama focuses on making function calls much more reliable and expanding how the engine handles complex data structures. No more unexpected crashes when your model tries to return a list instead of a single string.

    What's new in v0.20.4:

    • Enhanced Function Call Support: The engine now supports arrays for function call outputs. Previously, it only handled strings, which caused errors when the output contained multiple pieces of data.
    • Multi-Content Array Support: Beyond just text, the update adds support for arrays containing both image and text content. This makes response handling much more robust for complex, multi-modal tasks. 🖼️
    • Critical Bug Fix: Specifically resolves the `json: cannot unmarshal array into Go struct field` error that was breaking workflows whenever models returned structured list data.

    A quick heads-up for the tinkerers: While this adds great support for text and image content, interleaving (mixing) images and text within these specific arrays is currently limited. Also, keep an eye out: file content support is slated for a future update! 🚀
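To make the change concrete, here is a minimal, hypothetical sketch of the payload shape this release accepts. It follows OpenAI Responses API conventions (`function_call_output`, `call_id`, content-part objects); Ollama's exact field names may differ, and the helper name is illustrative.

```python
# Hypothetical sketch: a function call output carrying an ARRAY of content
# parts (text + image) rather than a single string. Field names follow
# OpenAI Responses API conventions; Ollama's exact schema may differ.

def make_fn_call_output(call_id, parts):
    """Wrap a list of content parts as a function_call_output item."""
    return {
        "type": "function_call_output",
        "call_id": call_id,
        "output": parts,  # previously only a plain string was accepted here
    }

item = make_fn_call_output(
    "call_123",
    [
        {"type": "input_text", "text": "Forecast: sunny, 21°C"},
        {"type": "input_image", "image_url": "data:image/png;base64,..."},
    ],
)

assert isinstance(item["output"], list)  # array output, not a bare string
print(item["output"][0]["text"])
```

Before this release, a struct expecting a string here would fail to decode the array, which is exactly the `json: cannot unmarshal array into Go struct field` error described above.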

    🔗 View Release

  • Ollama – v0.20.4-rc2: gemma4: Disable FA on older GPUs where it doesn’t work (#15403)

    Ollama – v0.20.4-rc2 🚀

    Ollama continues to be the essential toolkit for anyone looking to run large language models locally, providing a seamless way to experiment with privacy and speed on your own hardware.

    This release focuses on improving stability for users running the gemma4 model:

    • Flash Attention (FA) Compatibility Fix: To prevent crashes, Flash Attention is now automatically disabled on older GPU hardware.
    • Hardware Awareness: Specifically, on GPUs with a CUDA compute capability below 7.5, the system will bypass FA, since that hardware lacks the support the gemma4 model needs.
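The gating logic amounts to a simple capability check. Here is a minimal sketch of the idea, assuming the 7.5 threshold from the note; the function and variable names are illustrative, not Ollama's actual Go code.

```python
# Illustrative sketch of the capability gate described above: Flash
# Attention is skipped when the GPU's CUDA compute capability is below
# 7.5 (Turing). Names are hypothetical, not Ollama's implementation.

def flash_attention_supported(compute_capability: float) -> bool:
    """Return True if this GPU generation can run Flash Attention."""
    return compute_capability >= 7.5

for cc in (6.1, 7.0, 7.5, 8.6):
    mode = "FA enabled" if flash_attention_supported(cc) else "FA disabled"
    print(f"compute capability {cc}: {mode}")
```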

    This is a great win for those of us working with slightly older gear: you can now deploy these cutting-edge models without worrying about unexpected errors or stability issues! 🛠️

    🔗 View Release

  • MLX-LM – v0.31.2

    MLX LM, the package for running LLMs with MLX, just dropped an update! 🚀

    If you’re rocking Apple silicon, this package is your best friend for generating text and fine-tuning large language models using the high-performance MLX framework. It's incredibly versatile, supporting quantization, distributed inference, and thousands of models directly from the Hugging Face Hub.

    What's new in mlx-lm v0.31.2:

    This minor release is all about internal stability and refining how the engine handles data during inference:

    • Align batch logits processor token contract: This technical tweak ensures more consistent behavior within the batch logits processor, keeping your token handling predictable and smooth. 🛠️

    It might be a small adjustment under the hood, but it’s exactly the kind of refinement that keeps local LLM workflows stable for devs and tinkerers alike!
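For context, a logits processor in mlx-lm is a callable that receives the tokens generated so far plus the current logits and returns (possibly modified) logits. Here is a minimal sketch of that contract; mlx-lm operates on mlx arrays, so plain Python lists stand in here purely for illustration, and the helper names are made up.

```python
# Minimal sketch of the logits-processor contract: callable(tokens, logits)
# -> logits. mlx-lm uses mlx arrays; plain lists stand in for illustration.

def block_token(banned_id):
    """Build a processor that forces one token's logit to -inf."""
    def processor(tokens, logits):
        logits = list(logits)            # don't mutate the caller's logits
        logits[banned_id] = float("-inf")  # token can never be sampled
        return logits
    return processor

processors = [block_token(2)]
tokens, logits = [0, 1], [0.1, 0.9, 2.5, 0.3]
for p in processors:                      # apply each processor in order
    logits = p(tokens, logits)

print(logits)
```

Keeping this call contract consistent across the batched and single-sequence paths is exactly the kind of "token contract" alignment this release refers to.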

    🔗 View Release

  • Ollama – v0.20.4-rc1: gemma4: add missing file (#15394)

    Ollama v0.20.4-rc1 is here! 🚀

    If you’ve been trying to run Gemma 4 locally and hitting unexpected errors, this release candidate is exactly what you need to get back up and running smoothly. Ollama remains the premier tool for democratizing LLM access, allowing you to spin up models like Llama 3, DeepSeek-R1, and Mistral directly on your hardware without relying on the cloud.

    What's new in this release:

    • Full Gemma 4 Support: This update resolves a critical issue by adding a missing file required for seamless Gemma 4 integration.
    • Essential Bug Fix: The patch corrects an accidental omission from a previous pull request (#15378), ensuring that the model files are correctly recognized by the framework.

    This is a quick but vital fix to ensure your local AI environment stays stable and capable of running the latest model architectures. Grab it and start tinkering! 🛠️

    🔗 View Release

  • Ollama – v0.20.4-rc0

    Ollama v0.20.4-rc0 is officially hitting the radar! 🚀

    If you’re looking to run powerful LLMs like Llama 3, DeepSeek-R1, or Mistral locally without the headache, Ollama is the ultimate toolkit. It handles everything from model downloading to providing a REST API for your own custom builds, making local AI experimentation incredibly smooth across macOS, Windows, and Linux.

    This latest Release Candidate (rc0) is all about tightening up the experience and ensuring stability before the full rollout. Here's what's under the hood:

    • Path Cleanup: Experimental paths have been scrubbed to provide a much more predictable environment for your local setups.
    • Enhanced Model Management: Fixed bugs within the “create from existing” functionality, making it easier to build and manage custom model variations.

    Since this is an rc0 release, it’s the perfect time for us tinkerers to jump in, test these refinements, and make sure everything plays nice with our local workflows! 🛠️

    🔗 View Release

  • Ollama – v0.20.3: model/parsers: add gemma4 tool call repair (#15374)

    Ollama v0.20.3 is officially live! 🚀

    If you've been running large language models locally on your machine, you know that Ollama is the gold standard for making LLMs like Llama 3, DeepSeek-R1, and Mistral accessible without needing a massive cloud setup. This latest update is a huge win for anyone building agentic workflows or using tool-calling capabilities.

    What's new in this release:

    • Gemma 4 Tool Call Repair: We’ve all seen it: a model makes a tiny syntax mistake while trying to call a function, and the whole process grinds to a halt. This update introduces a specialized “repair” mechanism for Gemma 4. If the initial strict parse fails, Ollama will now attempt to fix common errors on the fly to keep your automation running smoothly.
    • Smart Error Correction: The new repair logic is specifically tuned to catch and fix:
        • Missing string delimiters.
        • Incorrectly used single-quoted values.
        • Raw terminal strings that need proper formatting according to the tool schema.
        • Missing object closing braces (applied after a successful concrete repair).
    • Improved Stability & Testing: To make sure these fixes don’t cause new headaches, this release includes expanded regression coverage and new unit tests specifically for malformed tool calls.
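To illustrate the shape of this approach, here is a hedged Python sketch of a parse-then-repair loop: attempt a strict JSON parse first, and only on failure apply cheap syntactic fixes before re-parsing. Ollama's actual repair pipeline is written in Go and handles more cases with more care (the naive quote swap below would mangle apostrophes inside strings); the function name and fixes are illustrative only.

```python
import json

# Hedged sketch of the repair idea: strict parse first, then cheap fixes
# (single-quoted values, missing closing braces) and a re-parse. This is
# NOT Ollama's implementation, just the shape of the technique.

def repair_tool_call(raw: str):
    try:
        return json.loads(raw)          # happy path: the call was valid JSON
    except json.JSONDecodeError:
        fixed = raw.replace("'", '"')   # single quotes -> JSON string quotes
        # Append any closing braces the model forgot to emit.
        fixed += "}" * (fixed.count("{") - fixed.count("}"))
        return json.loads(fixed)        # re-parse; raises if still broken

# A truncated, single-quoted call that a strict parser would reject:
print(repair_tool_call("{'name': 'get_weather', 'arguments': {'city': 'Paris'"))
```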

    This is a massive step forward for reliability when working with cutting-edge models locally. Get updated and keep those agents running! 🛠️

    🔗 View Release

  • Ollama – v0.20.3-rc0: model/parsers: add gemma4 tool call repair (#15374)

    Ollama v0.20.3-rc0 is officially live! 🚀

    If you are running local LLMs, you know that “agentic” workflows depend entirely on how well a model can call tools and functions. Even a tiny syntax error from the model can crash your entire pipeline. This release is a massive quality-of-life update specifically designed to bridge that gap.

    What's new in this release:

    • Gemma 4 Tool Call Repair: Instead of letting a malformed tool call break your code, Ollama now features a “repair” layer. It uses a candidate pipeline to catch and fix syntax mistakes on the fly.
    • Smart Error Correction: The repair logic is fine-tuned to handle common model hiccups, such as:
        • Missing Gemma string delimiters.
        • Single-quoted string values or dangling delimiters.
        • Raw terminal strings that need proper formatting per the tool schema.
        • Missing object closing braces.
    • Enhanced Stability: This update includes new regression coverage and unit tests to ensure these repair helpers work reliably across various scenarios, preventing old bugs from resurfacing.

    This is a huge win for anyone building autonomous agents or using Gemma 4 for function calling: it makes your local development much more robust and less prone to frustrating crashes! 🛠️

    🔗 View Release

  • Text Generation Webui – v4.4 – MCP server support!

    text-generation-webui (v4.4) 🚀

    This powerhouse Gradio web UI is essentially the “AUTOMATIC1111” for Large Language Models, providing a comprehensive local interface to run LLMs via backends like llama.cpp and Transformers. It's the go-to tool for anyone wanting a private, offline, and highly customizable way to interact with models.

    The latest update is a massive one, focusing heavily on extensibility and UI polish! Here is what’s new:

    • Remote MCP Server Support: This is a game-changer! You can now connect to remote Model Context Protocol (MCP) servers directly from the Chat tab. The webui will automatically discover and use those tools alongside your local ones, massively expanding what your models can actually do.
    • Modernized UI: The interface has been sleeked up with better contrast, improved scrollbars, and tighter spacing to make your chat experience feel more professional and less cluttered.
    • Gemma 4 Support: Thanks to an updated `ik_llama.cpp` dependency, you can now jump straight into running Gemma 4!
    • Enhanced Image Metadata: For those using the API for image generation, PNG files now include embedded metadata (seed, model, steps, etc.) so your settings are always baked right into the file.
    • Expanded Platform Support: New portable builds are available for Windows users running AMD hardware via ROCm.

    Technical & Developer Notes:

    • API Refinements: Added `instruction_template` parameters to the model load endpoint and cleaned up deprecated settings.
    • Bug Fixes: Resolved critical issues including LaTeX rendering protection, crashes during prompt truncation, and server restart errors.

    ๐Ÿ› ๏ธ Pro-Tip for Tinkerers: If you use a portable installation, you can now move your `user_data` folder one level up (next to the install folder). This allows multiple versions of the webui to share the same models and settings, making updates a total breeze!

    🔗 View Release

  • Lemonade – v10.1.0

    The lemonade-sdk/lemonade library has just bumped up to version v10.1.0! 🍋

    If you’re looking to run Large Language Models (LLMs) locally with high performance, Lemonade is your go-to toolkit. It optimizes inference engines to leverage both GPUs and NPUs (like the AMD Ryzen AI series), making local LLM experiences faster and more responsive. Plus, it offers OpenAI API compatibility, so you can swap cloud services for your own hardware without breaking your workflow.

    What's new in this release:

    • Version Bump: The project has officially transitioned to version 10.1.0.
    • Maintenance Update: This release focuses on updating the core project versioning to ensure compatibility and streamlined dependency management for all you tinkerers out there.

    Whether you are using the Python SDK or the CLI, this update helps keep your local environment stable and ready for heavy lifting. Keep those builds running fast! 🚀

    🔗 View Release

  • Ollama – v0.20.2

    Ollama v0.20.2 is officially live! 🚀

    If you’re looking to run powerful large language models like Llama 3, DeepSeek-R1, or Mistral locally on your own hardware, Ollama remains the gold standard for making that process seamless and easy. It handles all the heavy lifting of model management so you can focus on tinkering and building.

    This latest release focuses on smoothing out your user experience:

    • Improved App Flow: The default home view has been updated to direct you straight into a new chat session rather than just launching the application interface. This small change helps you jump right into the conversation without extra clicks! 💬

    Keep those local environments running!

    🔗 View Release