Author: Tater Totterson

  • Lemonade – v9.0.8

    🚀 Lemonade v9.0.8 just dropped – and it's a game-changer for local LLM folks!

    • FLM server hostname? Now configurable. No more fighting hardcoded defaults – deploy how you want. 🎯
    • Override `llama-server` path via env vars – perfect for custom builds, containers, or weird dev setups. 🛠️
    • CPU backend is LIVE! Run LLMs on CPU without a GPU – ideal for dev, testing, or low-power machines. 🖥️
    • Debate Arena v2 is here! Smarter, smoother multi-model debates with better eval – test personalities like a pro. 💬🧠
    • Huge props to @bitgamm for their first contribution – welcome to the crew! 👏
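    The release notes don't name the exact variable, so the override pattern below is a sketch using a hypothetical `LEMONADE_LLAMA_SERVER` env var and a made-up default path – check the Lemonade docs for the real names:

```python
import os
from pathlib import Path

def resolve_llama_server(env_var="LEMONADE_LLAMA_SERVER",
                         default="/opt/lemonade/bin/llama-server"):
    """Prefer an env-var override for the llama-server binary, else a default."""
    override = os.environ.get(env_var)
    return Path(override) if override else Path(default)
```

    Handy in containers: bake a default into the image, then point the env var at a custom build when you need one.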

    GGUF + ONNX? Check. OpenAI API compat? Check. Windows & Linux? Double check.

    Time to spin up your next local LLM experiment – faster, freer, and more flexible than ever. 🚀

    🔗 View Release

  • Ollama – v0.13.2

    🚀 Ollama v0.13.2 just dropped – and it's a quiet hero update!

    ✅ Multi-GPU CUDA setups? Finally detected properly. No more leaving GPUs on the bench.

    🧠 DeepSeek-V3.1's “thinking” mode? Fixed – it won't randomly activate when disabled (goodbye, phantom pondering).
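    Ollama's chat API exposes a `think` toggle for reasoning models; here's a minimal sketch of the request body the fix affects (nothing is sent over the network here, and the model tag is illustrative):

```python
import json

def chat_payload(model, prompt, think=False):
    # Request body for Ollama's POST /api/chat endpoint; with the
    # v0.13.2 fix, think=False reliably suppresses the reasoning trace.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": think,
        "stream": False,
    }

body = json.dumps(chat_payload("deepseek-v3.1", "Summarize this repo."))
```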

    Huge props to our new contributors: 👏 @chengcheng84 & @nathan-hook – welcome to the crew! First PRs = nailed it.

    Smooth sailing ahead. Update now and run your models faster, cleaner, and with zero GPU drama.

    🔗 Full details: [v0.13.1…v0.13.2-rc0]

    🔗 View Release

  • Lemonade – v9.0.7

    🔥 Lemonade v9.0.7 just dropped – and it's chaos in the best way.

    Introducing Debate Arena: run 8 LLMs at once in your browser and watch them argue like AI philosophers on caffeine. Ministral-3 vs SmolLM3? Phi4 roasting LFM2? Pure digital TED Talk madness.

    ✨ What's new:

    • 🎤 `llm-debate.html` – drop it in your browser, hit play, and enjoy the AI showdown.
    • 🚀 Load up to 8 GGUF models simultaneously with `lemonade-server serve --max-loaded-models 8`.
    • 🛠️ Fixed web publishing, updated deps to GitHub's latest, and unveiled the Lemonade Manager (Phase 1) – sleeker, faster, smarter.

    💻 Grab the `.msi` (Windows) or `.deb` (Linux), fire it up, and let your GPU do the talking.

    No cloud. No limits. Just pure local LLM mayhem. 🤖💥

    Check it out: https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/llm-debate.html

    🔗 View Release

  • Ollama – v0.13.2-rc0: ggml update to b7108 (#12992)

    Ollama v0.13.2-rc0 just dropped – and it's a speed demon 🚀

    The big win? ggml updated to b7108, powering faster, leaner LLM inference across the board.

    Here's what's new:

    • ✅ TopK sampling optimized – smarter token selection, especially on big-vocab models.
    • ✅ Metal argsort fixed – M-series chips now run smoother than ever 🍎
    • ✅ Bakllava image-to-text regression patched – multimodal models are back in business.
    • 🚨 Projector metadata warning – if you're using multimodal GGUF files, double-check your metadata.
    • ⚠️ Vulkan fixes temporarily reverted – stability first, speed later.

    This is a release candidate – stable enough for daily use, fresh enough to feel the gains. If you're on Apple Silicon? This is your upgrade.

    Update now and keep those models rolling. 🤖💻

    🔗 View Release

  • MLX-LM – v0.28.4

    🚀 mlx-lm v0.28.4 is live – and it's a beast!

    New models? Oh yeah:

    ✅ Minimax-M2, Kimi Linear, Trinity/AfMoE, Ministral3

    ✅ DeepSeek V3.2 – now in the fold

    ✅ Kimi K2 & OLMo3 fixed for seamless loading

    Performance got a turbo boost:

    🚀 Batching in server mode = faster multi-request handling

    💡 Multi-prompt cache now holds multiple prompts at once (chat apps, rejoice!)

    🧠 DWQ (Distilled Weight Quantization) – run massive models with less memory, same punch
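    As a rough mental model, a multi-prompt cache is an LRU map from prompt to saved prefill state; the class below is an illustrative sketch of that idea, not mlx-lm's actual API:

```python
from collections import OrderedDict

class PromptCache:
    """Tiny LRU cache keyed by prompt text (sketch, not mlx-lm's API)."""

    def __init__(self, max_entries=4):
        self.max_entries = max_entries
        self._cache = OrderedDict()

    def get(self, prompt):
        if prompt in self._cache:
            self._cache.move_to_end(prompt)   # mark as recently used
            return self._cache[prompt]
        return None

    def put(self, prompt, kv_state):
        self._cache[prompt] = kv_state
        self._cache.move_to_end(prompt)
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)   # evict least recently used
```

    Chat servers hit the same prompt prefixes repeatedly, so even a small cache avoids re-prefilling long histories.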

    Fixed the niggles:

    🔧 Adapter loading typo? Gone.

    🧩 parallel_residual now works on GPTNeoX

    📦 SentencePiece dependency added – no more tokenizer fails!

    Under the hood:

    🔄 Switched to GitHub Actions for smoother CI

    💬 Better type hints – mypy fans, you're welcome

    🧪 Flaky tests squashed + LoRA fusion now plays nice with non-affine quantization

    Big shoutout to new contributors: @jyork03, @spotbot2k, @sriting, @tnadav, @Deekshith-Dade – welcome to the crew! 🎉

    Upgrade. Tweak. Crush your next LLM project. 💪

    🔗 View Release

  • Lemonade – v9.0.6

    🚀 Lemonade v9.0.6 just dropped – and it's a game-changer for local LLM folks!

    Now you can load multiple models at once – LLMs, embeddings, and rerankers – all running in parallel. No more restarting to switch contexts. 🤖🧠

    ✨ New goodies:

    • Run concurrent requests across models → smoother, faster workflows
    • Linux logs? Less spam. More chill. 🐧
    • `run` command now works even if the server's already up – no more “port in use” headaches
    • Selective tray unloading keeps RAM sane (bye-bye, memory bloat!)
    • Better docs + venv testing + more robust system info
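    Fanning prompts out to several loaded models is a thread pool away. The sketch below stubs the network call – in practice `ask` would POST to the server's OpenAI-compatible chat endpoint, and the model names here are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def ask(model, prompt):
    # Stub standing in for an HTTP request to the local server;
    # replace with a real call to /v1/chat/completions.
    return f"{model}: echo {prompt}"

def fan_out(jobs):
    """Send one prompt to each (model, prompt) pair concurrently."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(ask, model, prompt) for model, prompt in jobs]
        return [f.result() for f in futures]
```

    With an LLM, an embedding model, and a reranker all resident, one RAG step can hit all three without waiting for model swaps.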

    Try the live demo: open `examples/demos/multi-model-tester.html` in your browser and juggle 3 models like a pro.

    Perfect for devs running RAG pipelines, local agents, or just tinkering with multiple models side-by-side.

    Full changelog: [v9.0.5…v9.0.6](link)

    🔗 View Release

  • ComfyUI – v0.3.77

    ComfyUI v0.3.77 is live – quiet release, huge quality-of-life wins! 🛠️

    • Fixed critical crashes when loading workflows with missing or corrupted custom nodes – no more sudden dead ends.
    • Smarter memory management for big image batches, especially on low-GPU setups – less OOM, more generating.
    • Crisp node labels on high-DPI displays (finally, no more blurry text!).
    • Updated deps to patch security gaps and keep the backend rock-solid.

    If custom nodes or memory hiccups have been ruining your flow – update now. No flashy features, just smoother, more stable AI tinkering. 💡

    Keep those workflows alive!

    🔗 View Release

  • Wyoming Openai – Groq & Mistral AI Voxtral release (0.3.10)

    🚀 Wyoming OpenAI v0.3.10 just dropped – and it's a game-changer for self-hosted voice AI!

    • Groq backend is LIVE 🎉 – Now plug in Groq's ultra-fast Whisper STT + PlayAI TTS with `docker-compose.groq.yml`. Free tier? Yes. Zero API keys needed. Low-latency speech, all on your hardware.
    • Mistral's Voxtral STT just landed! 🤖 Use `voxtral-mini-latest` with a ready-made `docker-compose.voxtral.yml`. Free, local, and ridiculously accurate – perfect for quiet home assistants.
    • OpenAI client got a polish ✨ – Switched to `omit` for cleaner SDK calls. Fewer bugs, smoother streaming across providers.

    Docker setups? Still there. PyPI install? Yep. Home Assistant integration? Absolutely.

    No more juggling 5 services – just one proxy to rule them all.

    Full changelog: v0.3.9…v0.3.10

    Go build your AI voice hub today 🎧

    🔗 View Release

  • Ollama – v0.13.1: llm: Don’t always evict models on CPU-only systems

    Big win for CPU folks! 🎉 Ollama v0.13.1 just dropped and fixes a major pain point: models no longer get constantly evicted from memory on CPU-only systems. 🐢💻

    Before: Ollama thought “no VRAM = always evict,” causing annoying reloads even when RAM was plentiful.

    Now: It only evicts when actually needed – like when you're juggling multiple huge models and RAM is tight.

    Result? Smoother, faster inference on laptops, old machines, or cloud instances without GPUs. Load your Llama 3 or Phi-4 once – and let it stay loaded.
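    Ollama's real memory accounting is more involved, but the before/after boils down to one comparison; a sketch of the new CPU-only decision (function name and numbers are illustrative, not Ollama's code):

```python
GiB = 1024 ** 3

def should_evict_cpu(loaded_bytes, incoming_bytes, free_ram_bytes):
    """On a CPU-only host, evict a resident model only if the incoming
    one would not fit alongside it - not merely because VRAM is zero."""
    return loaded_bytes + incoming_bytes > free_ram_bytes

# Plenty of RAM: both models stay resident, no reload churn.
keep_both = not should_evict_cpu(8 * GiB, 5 * GiB, 32 * GiB)
```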

    Fixes #13227. CPU users, rejoice! 🙌

    🔗 View Release

  • Ollama – v0.13.1-rc2

    🚀 Ollama v0.13.1-rc2 just dropped – and it's a quiet hero for GPU folks!

    No flashy UI changes, but if you've ever been crushed by “CUDA error: invalid device function” on older or weird GPUs? This is your win.

    🔧 What's new:

    • ✅ CUDA Compute Capability validation – Ollama now checks your GPU's architecture before loading models. No more cryptic crashes on pre-Kepler or niche cards.
    • 🛡️ Smoother setup for devs on mixed or legacy hardware.
    • 💡 Under-the-hood polish that saves hours of debugging.
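    The check amounts to comparing a GPU's compute capability against a build-time minimum before any kernels are loaded; the cutoff below is a placeholder, not Ollama's actual threshold:

```python
def is_supported(compute_capability, minimum=(5, 0)):
    """Reject GPUs below a minimum CUDA compute capability up front,
    instead of failing later with 'invalid device function'."""
    return tuple(compute_capability) >= tuple(minimum)
```

    Tuple comparison handles the major.minor pairs naturally, so e.g. (8, 6) clears a (5, 0) floor while (3, 5) does not.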

    Perfect if you're tinkering with Llama 3, DeepSeek-R1, or GGUF models on non-Tesla rigs. Keep those GPUs humming – no more “why won't it load?” 😎💻

    🔗 View Release