Text Generation Webui – v4.5

Big news for the local LLM crowd! The legendary text-generation-webui has officially been rebranded as TextGen! πŸš€ This release brings some much-needed stability and performance improvements to your local inference workflows.

Here is what’s new in this release:

  • VRAM & Performance Optimization: Peak VRAM usage during prompt logprobs forward passes has been reduced. If you are running a tight hardware setup or trying to squeeze maximum context into your GPU, this is a massive win! 🧠
  • Improved UI/UX:
      • Reading long conversations just got easier with a new sky-blue color for quoted text in light mode.
      • Bug fixes prevent chat scrolling from getting stuck on “thinking” blocks and stop tool icons from shrinking during long calls.
  • Critical Bug Fixes:
      • Gemma-4 Tool Calling: Fixed the handling of double quotes and newline characters in arguments, ensuring much more reliable agentic behavior. πŸ› οΈ
      • Token Management: Resolved BOS/EOS tokens not being set correctly for models lacking chat templates, and fixed duplicate BOS token prepending in ExLlamav3.
  • Under-the-Hood Updates:
      • The project has moved! Find its new home at `github.com/oobabooga/textgen`.
      • Includes the latest versions of `llama.cpp` and `ik_llama.cpp` for better backend support.
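The tool-calling fix is easier to appreciate with a concrete picture of the failure mode: double quotes and raw newlines inside an argument string break any serializer that builds JSON by string interpolation instead of a proper encoder. A minimal, hypothetical sketch (the function names are illustrative, not TextGen's actual code):

```python
import json

def naive_args(text: str) -> str:
    # Naive interpolation: a double quote or raw newline in `text`
    # lands unescaped inside the JSON string and makes it invalid.
    return '{"query": "%s"}' % text

def safe_args(text: str) -> str:
    # json.dumps escapes " as \" and newlines as \n automatically.
    return json.dumps({"query": text})

arg = 'say "hello"\nthen stop'

try:
    json.loads(naive_args(arg))
except json.JSONDecodeError:
    print("naive serialization breaks on quotes/newlines")

parsed = json.loads(safe_args(arg))
print(parsed["query"] == arg)  # round-trips intact: True
```

Any model emitting arguments with quotes or multi-line content exercises exactly this path, which is why agentic workloads surfaced the bug first.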
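The duplicate-BOS issue is also easy to picture: when both the chat template and the backend prepend a BOS token, the prompt starts with it twice, which can skew generation. A hypothetical guard illustrating the idea (illustrative only, not the actual ExLlamav3 patch):

```python
def ensure_single_bos(token_ids: list[int], bos_id: int) -> list[int]:
    """Return token_ids beginning with exactly one BOS token."""
    # Strip any leading run of BOS tokens, then add exactly one back.
    i = 0
    while i < len(token_ids) and token_ids[i] == bos_id:
        i += 1
    return [bos_id] + token_ids[i:]

print(ensure_single_bos([1, 1, 42, 7], bos_id=1))  # [1, 42, 7]
print(ensure_single_bos([42, 7], bos_id=1))        # [1, 42, 7]
```

Normalizing at one choke point like this, rather than trusting every template and backend to agree, is the usual way to keep BOS handling consistent across models with and without chat templates.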

If you’ve been tinkering with tool-calling models or struggling with VRAM spikes, this is a must-have update for your local stack! πŸ’»βœ¨

πŸ”— View Release