Text Generation Webui – v3.18
text-generation-webui v3.18 is live, and llama.cpp just leveled up!
- New `--cpu-moe` flag: offload MoE experts to the CPU and run massive models on low-end GPUs. VRAM? Who needs it.
- ROCm support is HERE! AMD GPU users on Linux, rejoice. No CUDA? No problem.
- macOS 13 wheels retired. Time to update if you're still on macOS 13 or earlier.
- Backend upgrades:
  - llama.cpp updated to the latest commit (10e9780): smoother, faster, more stable
  - ExLlamaV3 v0.0.15: better quantization, faster attention
  - peft 0.18.*: new LoRA magic for fine-tuning lovers
  - triton-windows 3.5.1.post21: Windows inference just got a turbo boost
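A back-of-envelope sketch of why keeping MoE experts in system RAM slashes VRAM. The parameter counts and quantization width below are illustrative assumptions (roughly the shape of an 8-expert MoE), not measured figures from this release:

```python
# Why offloading MoE experts to CPU saves VRAM: in a mixture-of-experts
# model, the expert FFNs hold most of the weights, while the shared layers
# (attention, embeddings) that must stay on the GPU are comparatively small.
GB = 1024 ** 3

n_experts = 8
expert_params = 5.5e9       # params per expert (assumption)
shared_params = 2.0e9       # attention + embeddings kept on GPU (assumption)
bytes_per_param = 0.5       # ~4-bit quantization

total_params = shared_params + n_experts * expert_params
vram_all_gpu = total_params * bytes_per_param / GB
vram_cpu_moe = shared_params * bytes_per_param / GB  # experts live in system RAM

print(f"everything on GPU: {vram_all_gpu:.1f} GiB")
print(f"experts on CPU:    {vram_cpu_moe:.1f} GiB")
```

With these made-up but plausible numbers, the GPU-resident footprint drops from roughly 21 GiB to under 1 GiB, which is why a low-end GPU can suddenly host a very large MoE model.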
Portable builds? Still the best part.
Download, unzip, run. No pip, no install.
- NVIDIA? `cuda12.4`
- AMD/Intel? Use `vulkan`
- CPU-only? `cpubuilds` is your hero
- Mac M1/M2? `macos-arm64`, all set
Upgrading? Just swap the binary. Your `user_data/` folder stays untouched: models, configs, themes… all safe.
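The swap-and-keep-`user_data/` idea can be simulated in a few lines of shell. Directory and file names here are placeholders, not the real portable-build layout:

```shell
# Simulate an upgrade: a fresh build folder sits next to the old one,
# and user_data/ is carried over unchanged. All names are illustrative.
mkdir -p textgen-old/user_data/models textgen-new
echo "my-settings" > textgen-old/user_data/settings.yaml
cp -r textgen-old/user_data textgen-new/
cat textgen-new/user_data/settings.yaml
```

Nothing under `user_data/` is rewritten by the new build, which is why models, configs, and themes survive the swap.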
Go run a massive MoE model on your old laptop. The future isn't just local; it's portable.
