Text Generation Webui – v3.18
text-generation-webui v3.18 is live, and llama.cpp just leveled up!
- New `--cpu-moe` flag: offload MoE experts to the CPU and run massive models on low-end GPUs. VRAM? Who needs it.
- ROCm support is HERE! AMD GPU users on Linux, rejoice. No CUDA? No problem.
- macOS 13 wheels retired. Time to update if you're still on macOS 13 or earlier.
- Backend upgrades:
  - llama.cpp updated to the latest commit (10e9780): smoother, faster, more stable
  - ExLlamaV3 v0.0.15: better quantization, faster attention
  - peft 0.18.*: new LoRA magic for fine-tuning lovers
  - triton-windows 3.5.1.post21: Windows inference just got a turbo boost
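A back-of-envelope sketch of why keeping MoE experts in system RAM slashes VRAM. The parameter counts and quantization width below are illustrative assumptions (roughly the shape of an 8-expert MoE), not measured figures from this release:

```python
# Why offloading MoE experts to CPU saves VRAM: in a mixture-of-experts
# model, the expert FFNs hold most of the weights, while the shared layers
# (attention, embeddings) that must stay on the GPU are comparatively small.
GB = 1024 ** 3

n_experts = 8
expert_params = 5.5e9       # params per expert (assumption)
shared_params = 2.0e9       # attention + embeddings kept on GPU (assumption)
bytes_per_param = 0.5       # ~4-bit quantization

total_params = shared_params + n_experts * expert_params
vram_all_gpu = total_params * bytes_per_param / GB
vram_cpu_moe = shared_params * bytes_per_param / GB  # experts live in system RAM

print(f"everything on GPU: {vram_all_gpu:.1f} GiB")
print(f"experts on CPU:    {vram_cpu_moe:.1f} GiB")
```

With these made-up but plausible numbers, the GPU-resident footprint drops from roughly 21 GiB to under 1 GiB, which is why a low-end GPU can suddenly host a very large MoE model.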
Portable builds? Still the best part.
Download, unzip, run. No pip, no install.
- NVIDIA? `cuda12.4`
- AMD/Intel? Use `vulkan`
- CPU-only? `cpubuilds` is your hero
- Mac M1/M2? `macos-arm64`, all set
Upgrading? Just swap the binary. Your `user_data/` folder stays untouched: models, configs, themes… all safe.
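The swap-and-keep-`user_data/` idea can be simulated in a few lines of shell. Directory and file names here are placeholders, not the real portable-build layout:

```shell
# Simulate an upgrade: a fresh build folder sits next to the old one,
# and user_data/ is carried over unchanged. All names are illustrative.
mkdir -p textgen-old/user_data/models textgen-new
echo "my-settings" > textgen-old/user_data/settings.yaml
cp -r textgen-old/user_data textgen-new/
cat textgen-new/user_data/settings.yaml
```

Nothing under `user_data/` is rewritten by the new build, which is why models, configs, and themes survive the swap.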
Go run a massive MoE model on your old laptop. The future isn't just local; it's portable.
