• Text Generation Webui – v4.5.1

    Big news for all the local LLM enthusiasts! The project formerly known as text-generation-webui has officially undergone a massive rebranding to TextGen! 🚀

    This latest update (v4.5.1) is all about stability, UI polish, and critical optimizations for the Gemma 4 model family. Whether you are running heavy quantizations or experimenting with complex tool calling, this release brings essential tweaks under the hood to keep your local inference smooth.

    What’s New in This Release:

    • Identity Shift: The project is now officially TextGen! You can find the updated repository at `github.com/oobabooga/textgen`.
    • Gemma 4 Optimization: Significant fixes for Gemma 4 tool calling, including much better handling of quotes and newlines, plus improved rendering for consecutive “thinking” blocks.
    • VRAM Efficiency: A huge win for GPU users! There is a much-needed reduction in VRAM peak usage during the prompt logprobs forward pass—perfect for squeezing more performance out of your hardware. 🧠
    • UI Enhancements: Added a fresh sky-blue color for quoted text in light mode and improved logits display to make debugging easier.
    • Bug Squashing:
    • Fixed chat scroll issues when interacting with “thinking” blocks.
    • Resolved tool icon SVG shrinking during long tool calls.
    • Fixed various BOS/EOS token issues for models lacking specific chat templates.
    • Dependency Updates: Includes fresh updates for both `llama.cpp` and the `ik_llama.cpp` fork, bringing those awesome new quantization types to your workflow.

    Pro-Tip for Tinkerers: 🛠️

    If you use the portable builds, updating is a breeze! Just download the latest version, extract it, and swap your existing `user_data` folder into the new directory. Even better, since version 4.0, you can place `user_data` one level up next to your install folder so multiple versions can share the same models and settings!
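    The folder shuffle above can be sketched in a few lines. This is an illustrative sketch only — the folder names are stand-ins, not the project's actual layout:

```python
# Sketch of the portable-build update flow described above. The folder
# names are illustrative assumptions, not the project's actual layout.
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())  # stand-in for your downloads folder
old_install = root / "textgen-portable-old"
new_install = root / "textgen-portable-new"  # freshly extracted build
(old_install / "user_data" / "models").mkdir(parents=True)
new_install.mkdir()

# Option 1: carry your existing user_data into the new version's directory.
shutil.copytree(old_install / "user_data", new_install / "user_data")

# Option 2 (since v4.0): keep one shared user_data one level up, next to
# the install folders, so multiple versions share models and settings.
shared = root / "user_data"
shutil.move(str(old_install / "user_data"), str(shared))
```

    Either way, your models and settings survive the version swap untouched.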

    🔗 View Release

  • Tater – Tater v71

    🚀 Tater v71 — “Voice, Evolved” is here! 🎤

    Calling all local-LLM tinkerers and automation wizards! The latest update for Tater—your privacy-first, Hydra-powered AI assistant—has officially landed. This release is a massive leap forward in making voice interactions feel less like a command line and more like a natural conversation. If you’ve been running heavy pipelines for STT/TTS, get ready for a much smoother, unified experience.

    What’s new in v71:

    • Unified Voice Core: Say goodbye to fragmented services! The Voice Core is now fully merged into the main Tater engine. This means significantly less overhead and a much more stable pipeline for your AI workflows. 🧠
    • Lightning-Fast Latency: We’ve optimized the STT (Speech-to-Text) and TTS (Text-to-Speech) loops. With improved end-of-speech detection and early-start TTS experiments, Tater responds much faster than ever before. ⚡
    • Deep Visibility & Debugging: No more mystery lags! New comprehensive voice metrics and enhanced logging allow you to see exactly what is happening under the hood during STT/TTS transitions. 📊
    • Smart Audio Routing: Take control of your soundscape! Tater is now “location aware,” allowing you to tie specific speakers to different assistants or ensure responses play back precisely on the device you used to speak. 🔊
    • System-Wide Speech Integration: Configuration is now unified via the new `Models` tab. Whether it’s a system announcement, an ESPHome voice device, or a Home Assistant media player, everything uses the same shared logic. One config to rule them all! 🛠️
    • Enhanced ESPHome Power: We’ve deepened the integration further, offering improved entity handling and direct control from the UI. Your ESPHome voice devices are now true system inputs. 🧩

    Everything feels tighter, faster, and more intelligent. If you’re looking to level up your local AI stack, it’s time to give v71 a spin! 🚀

    🔗 View Release

  • Text Generation Webui – v4.5

    Big news for the local LLM crowd! The legendary text-generation-webui has officially undergone a rebrand and is now known as TextGen! 🚀 This update brings some much-needed stability and performance tweaks to your local inference workflows.

    Here is what’s new in this release:

    • VRAM & Performance Optimization: There is a reduction in peak VRAM usage during prompt logprobs forward passes. If you are running tight hardware setups or trying to squeeze maximum context into your GPU, this is a massive win! 🧠
    • Improved UI/UX:
    • Reading long conversations just got easier with a new sky-blue color for quoted text in light mode.
    • Significant bug fixes prevent chat scrolling from getting stuck on “thinking” blocks and stop tool icons from shrinking during long calls.
    • Critical Bug Fixes:
    • Gemma-4 Tool Calling: Fixed issues with handling double quotes and newline characters in arguments, ensuring much more reliable agentic behavior. 🛠️
    • Token Management: Resolved issues where BOS/EOS tokens weren’t being set correctly for models lacking chat templates, and fixed duplicate BOS token prepending in ExLlamav3.
    • Under-the-Hood Updates:
    • The project has moved! Find the new home at `github.com/oobabooga/textgen`.
    • Includes the latest versions of `llama.cpp` and `ik_llama.cpp` for better backend support.
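    The duplicate-BOS fix mentioned above boils down to a simple guard. This sketch is illustrative — the token ID and function name are assumptions, not the project's actual code:

```python
# Illustrative sketch of a duplicate-BOS guard like the fix described
# above. The token ID and function name are assumptions, not the
# project's actual implementation.
BOS_TOKEN_ID = 2  # example value; the real ID varies by tokenizer

def prepend_bos(token_ids, bos_id=BOS_TOKEN_ID):
    """Prepend BOS only if the prompt does not already start with it,
    avoiding the doubled-BOS bug that can skew model outputs."""
    if token_ids and token_ids[0] == bos_id:
        return token_ids
    return [bos_id] + token_ids

assert prepend_bos([5, 6, 7]) == [2, 5, 6, 7]     # BOS added once
assert prepend_bos([2, 5, 6, 7]) == [2, 5, 6, 7]  # already present: unchanged
```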

    If you’ve been tinkering with tool-calling models or struggling with VRAM spikes, this is a must-have update for your local stack! 💻✨

    🔗 View Release

  • ComfyUI – v0.19.1

    New update alert for the node-based wizards! 🛠️

    ComfyUI v0.19.1 is officially out. For those of you building complex pipelines, ComfyUI remains the powerhouse node-based GUI designed for advanced Stable Diffusion workflows and highly customized generative AI image generation.

    What’s new in this release:

    • Version Bump: The engine has been updated to v0.19.1.
    • Maintenance & Stability: This incremental update focuses on critical bug fixes and performance optimizations. These types of updates are essential for keeping your custom, heavy-duty workflows running smoothly without crashing mid-render.

    Keep those workflows experimental and your nodes organized! 🚀

    🔗 View Release

  • Ollama – v0.20.8-rc0: Gemma4 on MLX (#15244)

    Ollama just dropped an update, v0.20.8-rc0! 🚀

    If you’re running local LLMs, especially on Apple Silicon, this release is packed with optimizations to make your models run even smoother.

    What’s new in this release:

    • Gemma 4 Support via MLX: You can now run the Gemma 4 model using the MLX framework (text-only runtime). This is a massive win for Mac users looking to leverage highly optimized performance on Apple hardware! 🍎
    • Enhanced Prefill Speed: The team implemented two clever fixes to accelerate the “prefill” stage (how the model processes your initial prompt) for Gemma 4’s specific architectures:
    • Mask Memoization: The sliding-window prefill mask is now memoized across layers, cutting out redundant calculations.
    • Efficient Softmax: The Router forward pass has been streamlined to perform Softmax only over the specifically selected experts, making the routing process much leaner and faster.
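    The two prefill fixes above can be sketched in plain Python. This is a hedged illustration of the ideas only — the function names and shapes are assumptions, not Ollama's MLX code:

```python
# Sketch of the two prefill fixes described above, in plain Python.
# Function names and shapes are illustrative, not Ollama's MLX code.
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def sliding_window_mask(seq_len, window):
    """Mask memoization: build the sliding-window prefill mask once and
    reuse it across layers instead of recomputing it per layer."""
    return tuple(
        tuple(1 if 0 <= i - j < window else 0 for j in range(seq_len))
        for i in range(seq_len)
    )

def route(logits, k):
    """Efficient softmax: pick the top-k experts first, then normalize
    with softmax over just those k logits rather than over all experts."""
    topk = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in topk)  # shift for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in topk}
    total = sum(exps.values())
    return {i: exps[i] / total for i in topk}  # expert index -> weight

mask = sliding_window_mask(8, 4)
assert sliding_window_mask(8, 4) is mask  # cached, not rebuilt per layer
weights = route([0.1, 2.0, -1.0, 1.5], k=2)  # weights over experts 1 and 3 only
```

    The payoff in both cases is the same: less redundant work per prefill token.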

    If you’re tinkering with local AI on a Mac, grab this update to get that extra bit of snappiness in your workflow! 🛠️

    🔗 View Release

  • Ollama – v0.20.7-rc1: Merge pull request #15561 from ollama/drifkin/backport

    Ollama – v0.20.7-rc1 Update 🚀

    Attention all local LLM tinkerers! A new release candidate for Ollama has just landed, specifically optimized for those of you running Google’s latest open models.

    What’s new in this release:

    • Gemma4 Renderer Enhancements: This update includes critical backported changes specifically for the Gemma4 renderer. 🛠️
    • Improved Stability & Performance: These adjustments optimize how model assets and rendering processes are handled, ensuring much smoother performance when interacting with these specific weights.

    If you’ve been experimenting with the Gemma family of models on your local machine, this targeted update is a must-have to ensure maximum stability and efficiency! 💻✨

    🔗 View Release

  • Ollama – v0.20.7-rc0: gemma4: add nothink renderer tests (#15554)

    Ollama Update: v0.20.7-rc0 🚀

    If you’re running local LLMs, you know Ollama is the go-to for getting models up and running with zero friction. This latest release candidate focuses on stability and testing for the Gemma 4 integration.

    What’s new in this release:

    • Gemma 4 Enhancements: The update specifically includes new tests for the “nothink” renderer. For those of you building interfaces around Gemma 4, this is a big deal—it helps manage how reasoning processes or thought traces are displayed (or hidden) during model interaction.
    • Improved Reliability: By adding these specific renderer tests, the team is ensuring that model outputs remain consistent and visually correct across different interface implementations.
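    A "nothink" rendering path of the kind being tested here can be sketched as a filter over the model's output. The `<think>` tag convention below is an assumption for illustration, not Ollama's exact renderer:

```python
# Illustrative sketch of what a "nothink" rendering path does: hiding
# reasoning traces from the final output. The <think> tag convention
# is an assumption for illustration, not Ollama's actual renderer.
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def render_nothink(model_output: str) -> str:
    """Strip thought traces so only the final answer is displayed."""
    return THINK_BLOCK.sub("", model_output).strip()

out = render_nothink("<think>Let me reason this out...</think>The answer is 4.")
assert out == "The answer is 4."
```

    Renderer tests like the ones added in this release pin down exactly this behavior, so thought traces never leak into interfaces that expect clean answers.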

    Keep tinkering! 🛠️

    🔗 View Release

  • ComfyUI – v0.19.0

    New update alert for ComfyUI! 🚀

    If you’re building complex, node-based pipelines for Stable Diffusion and media generation, it’s time to check out the latest release. This powerhouse engine is getting even more refined for your creative workflows.

    What’s new in v0.19.0:

    • Node Optimization: Refined execution logic designed to make those massive, multi-layered workflows run much smoother and more efficiently.
    • Backend Stability: Key improvements to how the server handles heavy model loading and memory management—perfect for when you’re pushing your hardware to the limit.
    • Compatibility Updates: Essential syncs to ensure everything stays compatible with the latest underlying AI libraries and dependencies.

    Time to pull those updates and keep those custom nodes running strong! 🛠️

    🔗 View Release

  • Tater – Tater v70

    🚀 Tater v70 — “Direct Line” is officially live! 🎤

    Get ready, tinkerers! The latest update for Tater—your local-native AI assistant powered by the Hydra planning engine—is a massive leap forward for voice interaction and hardware integration. This release moves away from complex middleman pipelines, allowing Tater to communicate directly with your hardware for a much more seamless experience.

    What’s new in this release:

    • 🔌 Direct ESPHome Connection: You can now connect ESPHome voice devices straight to Tater. The best part? No Home Assistant is required! Just connect and start talking; Tater handles the heavy lifting without needing extra setup hoops or pipeline handoffs.
    • 🔊 Flexible Speaker Routing: Tailor your soundscape by assigning specific speakers to different Tater assistants. This allows you to manage separate rooms with separate voices and outputs.
    • 🗣️ A Fully Customizable Voice Stack: You have total control over how Tater listens and speaks. Mix and match your favorite tools:

    • Listening: Choose between the high-accuracy Faster-Whisper or the lightweight, responsive Vosk.

    • Detection: Use Silero VAD for cleaner voice activity detection.

    • Talking: Pick from Piper (reliable), Kokoro (smooth and natural), or Pocket TTS (lightweight).

    • External Integration: Plug into the Wyoming ecosystem to run local or remote setups.

    • 💬 Natural, Flowing Conversations: Moving beyond simple “command and response,” Tater can now ask follow-up questions. The conversation stays active without needing a constant re-wake, making interactions feel much more alive.
    • 🧩 Expanded Device Awareness: When Tater connects to a device, he sees more than just audio. He can now monitor device states, sensors, and controls, laying the groundwork for advanced automation.

    🛠️ For the Devs & Hardware Hackers:

    This release is a game-changer for anyone running Voice PE, satellite, or ESPHome devices. Because you no longer need Home Assistant as a middleman, even the most lightweight, standalone setups are now fair game for Tater’s intelligence.

    Go ahead… say something! 🥔🚀

    🔗 View Release

  • Ollama – v0.20.6

    Ollama just dropped a quick patch, v0.20.6! 🛠️

    If you’re running local LLMs, this tiny update focuses on refining how the model handles specific formatting. Specifically, it makes the Gemma implementation a bit more relaxed regarding whitespace before bare keys. This should help prevent unexpected parsing errors when your prompts or configurations involve tricky spacing.

    What’s new in v0.20.6:

    • Gemma Refinement: Reduced strictness for whitespace preceding bare keys to ensure smoother model performance and configuration loading. 📉✨
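    The kind of leniency described above can be illustrated with a tolerant line parser. The key/value grammar here is a simplified stand-in for illustration, not Gemma's actual format:

```python
# Sketch of the leniency described above: accepting optional whitespace
# before a bare key. The key/value grammar is a simplified stand-in,
# not Gemma's actual format.
import re

# Strict parsers require the key at column 0; this pattern also
# tolerates leading spaces or tabs before the bare key.
LENIENT_KEY = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*:\s*(.*)$")

def parse_line(line):
    m = LENIENT_KEY.match(line)
    if not m:
        raise ValueError(f"unparseable line: {line!r}")
    return m.group(1), m.group(2)

assert parse_line("temperature: 0.7") == ("temperature", "0.7")
assert parse_line("   top_p: 0.9") == ("top_p", "0.9")  # leading space OK now
```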

    🔗 View Release