Author: Tater Totterson

  • Lemonade – v9.0.8

    🚀 Lemonade v9.0.8 just dropped – and it's a game-changer for local LLM folks!

    • FLM server hostname? Now configurable. No more fighting hardcoded defaults – deploy how you want. 🎯
    • Override `llama-server` path via env vars – perfect for custom builds, containers, or weird dev setups. 🛠️
    • CPU backend is LIVE! Run LLMs on CPU without a GPU – ideal for dev, testing, or low-power machines. 🖥️
    • Debate Arena v2 is here! Smarter, smoother multi-model debates with better eval – test personalities like a pro. 💬🧠
    • Huge props to @bitgamm for their first contribution – welcome to the crew! 👏
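    The release notes don't name the exact variable, so the override pattern below is a sketch using a hypothetical `LEMONADE_LLAMA_SERVER` env var and a made-up default path – check the Lemonade docs for the real names:

```python
import os
from pathlib import Path

def resolve_llama_server(env_var="LEMONADE_LLAMA_SERVER",
                         default="/opt/lemonade/bin/llama-server"):
    """Prefer an env-var override for the llama-server binary, else a default."""
    override = os.environ.get(env_var)
    return Path(override) if override else Path(default)
```

    Handy in containers: bake a default into the image, then point the env var at a custom build when you need one.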

    GGUF + ONNX? Check. OpenAI API compat? Check. Windows & Linux? Double check.

    Time to spin up your next local LLM experiment – faster, freer, and more flexible than ever. 🚀

    🔗 View Release

  • Ollama – v0.13.2

    🚀 Ollama v0.13.2 just dropped – and it's a quiet hero update!

    ✅ Multi-GPU CUDA setups? Finally detected properly. No more leaving GPUs on the bench.

    🧠 DeepSeek-V3.1's “thinking” mode? Fixed – it won't randomly activate when disabled (goodbye, phantom pondering).
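    Ollama's chat API exposes a `think` toggle for reasoning models; here's a minimal sketch of the request body the fix affects (nothing is sent over the network here, and the model tag is illustrative):

```python
import json

def chat_payload(model, prompt, think=False):
    # Request body for Ollama's POST /api/chat endpoint; with the
    # v0.13.2 fix, think=False reliably suppresses the reasoning trace.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": think,
        "stream": False,
    }

body = json.dumps(chat_payload("deepseek-v3.1", "Summarize this repo."))
```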

    Huge props to our new contributors: 👏 @chengcheng84 & @nathan-hook – welcome to the crew! First PRs = nailed it.

    Smooth sailing ahead. Update now and run your models faster, cleaner, and with zero GPU drama.

    🔗 Full details: [v0.13.1…v0.13.2-rc0]

    🔗 View Release

  • Lemonade – v9.0.7

    🔥 Lemonade v9.0.7 just dropped – and it's chaos in the best way.

    Introducing Debate Arena: run 8 LLMs at once in your browser and watch them argue like AI philosophers on caffeine. Ministral-3 vs SmolLM3? Phi4 roasting LFM2? Pure digital TED Talk madness.

    ✨ What's new:

    • 🎤 `llm-debate.html` – drop it in your browser, hit play, and enjoy the AI showdown.
    • 🚀 Load up to 8 GGUF models simultaneously with `lemonade-server serve --max-loaded-models 8`.
    • 🛠️ Fixed web publishing, updated deps to GitHub's latest, and unveiled the Lemonade Manager (Phase 1) – sleeker, faster, smarter.

    💻 Grab the `.msi` (Windows) or `.deb` (Linux), fire it up, and let your GPU do the talking.

    No cloud. No limits. Just pure local LLM mayhem. 🤖💥

    Check it out: https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/llm-debate.html

    🔗 View Release

  • Ollama – v0.13.2-rc0: ggml update to b7108 (#12992)

    Ollama v0.13.2-rc0 just dropped – and it's a speed demon 🚀

    The big win? ggml updated to b7108, powering faster, leaner LLM inference across the board.

    Here's what's new:

    • ✅ TopK sampling optimized – smarter token selection, especially on big-vocab models.
    • ✅ Metal argsort fixed – M-series chips now run smoother than ever 🍎
    • ✅ Bakllava image-to-text regression patched – multimodal models are back in business.
    • 🚨 Projector metadata warning – if you're using multimodal GGUF files, double-check your metadata.
    • ⚠️ Vulkan fixes temporarily reverted – stability first, speed later.

    This is a release candidate – stable enough for daily use, fresh enough to feel the gains. If you're on Apple Silicon? This is your upgrade.

    Update now and keep those models rolling. 🤖💻

    🔗 View Release

  • MLX-LM – v0.28.4

    🚀 mlx-lm v0.28.4 is live – and it's a beast!

    New models? Oh yeah:

    ✅ Minimax-M2, Kimi Linear, Trinity/AfMoE, Ministral3

    ✅ DeepSeek V3.2 – now in the fold

    ✅ Kimi K2 & OLMo3 fixed for seamless loading

    Performance got a turbo boost:

    🚀 Batching in server mode = faster multi-request handling

    💡 Multi-prompt cache now holds multiple prompts at once (chat apps, rejoice!)

    🧠 DWQ (Distilled Weight Quantization) – run massive models with less memory, same punch
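    As a rough mental model, a multi-prompt cache is an LRU map from prompt to saved prefill state; the class below is an illustrative sketch of that idea, not mlx-lm's actual API:

```python
from collections import OrderedDict

class PromptCache:
    """Tiny LRU cache keyed by prompt text (sketch, not mlx-lm's API)."""

    def __init__(self, max_entries=4):
        self.max_entries = max_entries
        self._cache = OrderedDict()

    def get(self, prompt):
        if prompt in self._cache:
            self._cache.move_to_end(prompt)   # mark as recently used
            return self._cache[prompt]
        return None

    def put(self, prompt, kv_state):
        self._cache[prompt] = kv_state
        self._cache.move_to_end(prompt)
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)   # evict least recently used
```

    Chat servers hit the same prompt prefixes repeatedly, so even a small cache avoids re-prefilling long histories.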

    Fixed the niggles:

    🔧 Adapter loading typo? Gone.

    🧩 parallel_residual now works on GPTNeoX

    📦 SentencePiece dependency added – no more tokenizer fails!

    Under the hood:

    🔄 Switched to GitHub Actions for smoother CI

    💬 Better type hints – mypy fans, you're welcome

    🧪 Flaky tests squashed + LoRA fusion now plays nice with non-affine quantization

    Big shoutout to new contributors: @jyork03, @spotbot2k, @sriting, @tnadav, @Deekshith-Dade – welcome to the crew! 🎉

    Upgrade. Tweak. Crush your next LLM project. 💪

    🔗 View Release

  • Lemonade – v9.0.6

    🚀 Lemonade v9.0.6 just dropped – and it's a game-changer for local LLM folks!

    Now you can load multiple models at once – LLMs, embeddings, and rerankers – all running in parallel. No more restarting to switch contexts. 🤖🧠

    ✨ New goodies:

    • Run concurrent requests across models → smoother, faster workflows
    • Linux logs? Less spam. More chill. 🐧
    • `run` command now works even if the server's already up – no more “port in use” headaches
    • Selective tray unloading keeps RAM sane (bye-bye, memory bloat!)
    • Better docs + venv testing + more robust system info
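    Fanning prompts out to several loaded models is a thread pool away. The sketch below stubs the network call – in practice `ask` would POST to the server's OpenAI-compatible chat endpoint, and the model names here are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def ask(model, prompt):
    # Stub standing in for an HTTP request to the local server;
    # replace with a real call to /v1/chat/completions.
    return f"{model}: echo {prompt}"

def fan_out(jobs):
    """Send one prompt to each (model, prompt) pair concurrently."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(ask, model, prompt) for model, prompt in jobs]
        return [f.result() for f in futures]
```

    With an LLM, an embedding model, and a reranker all resident, one RAG step can hit all three without waiting for model swaps.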

    Try the live demo: open `examples/demos/multi-model-tester.html` in your browser and juggle 3 models like a pro.

    Perfect for devs running RAG pipelines, local agents, or just tinkering with multiple models side-by-side.

    Full changelog: [v9.0.5…v9.0.6](link)

    🔗 View Release

  • ComfyUI – v0.3.77

    ComfyUI v0.3.77 is live – quiet release, huge quality-of-life wins! 🛠️

    • Fixed critical crashes when loading workflows with missing or corrupted custom nodes – no more sudden dead ends.
    • Smarter memory management for big image batches, especially on low-GPU setups – less OOM, more generating.
    • Crisp node labels on high-DPI displays (finally, no more blurry text!).
    • Updated deps to patch security gaps and keep the backend rock-solid.

    If custom nodes or memory hiccups have been ruining your flow – update now. No flashy features, just smoother, more stable AI tinkering. 💡

    Keep those workflows alive!

    🔗 View Release

  • Wyoming Openai – Groq & Mistral AI Voxtral release (0.3.10)

    🚀 Wyoming OpenAI v0.3.10 just dropped – and it's a game-changer for self-hosted voice AI!

    • Groq backend is LIVE 🎉 – Now plug in Groq's ultra-fast Whisper STT + PlayAI TTS with `docker-compose.groq.yml`. Free tier? Yes. Zero API keys needed. Low-latency speech, all on your hardware.
    • Mistral's Voxtral STT just landed! 🤖 Use `voxtral-mini-latest` with a ready-made `docker-compose.voxtral.yml`. Free, local, and ridiculously accurate – perfect for quiet home assistants.
    • OpenAI client got a polish ✨ – Switched to `omit` for cleaner SDK calls. Fewer bugs, smoother streaming across providers.

    Docker setups? Still there. PyPI install? Yep. Home Assistant integration? Absolutely.

    No more juggling 5 services – just one proxy to rule them all.

    Full changelog: v0.3.9…v0.3.10

    Go build your AI voice hub today 🎧

    🔗 View Release

  • Ollama – v0.13.1: llm: Don’t always evict models on CPU-only systems

    Big win for CPU folks! 🎉 Ollama v0.13.1 just dropped and fixes a major pain point: models no longer get constantly evicted from memory on CPU-only systems. 🐢💻

    Before: Ollama thought “no VRAM = always evict,” causing annoying reloads even when RAM was plentiful.

    Now: It only evicts when actually needed – like when you're juggling multiple huge models and RAM is tight.

    Result? Smoother, faster inference on laptops, old machines, or cloud instances without GPUs. Load your Llama 3 or Phi-4 once – and let it stay loaded.
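    Ollama's real memory accounting is more involved, but the before/after boils down to one comparison; a sketch of the new CPU-only decision (function name and numbers are illustrative, not Ollama's code):

```python
GiB = 1024 ** 3

def should_evict_cpu(loaded_bytes, incoming_bytes, free_ram_bytes):
    """On a CPU-only host, evict a resident model only if the incoming
    one would not fit alongside it - not merely because VRAM is zero."""
    return loaded_bytes + incoming_bytes > free_ram_bytes

# Plenty of RAM: both models stay resident, no reload churn.
keep_both = not should_evict_cpu(8 * GiB, 5 * GiB, 32 * GiB)
```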

    Fixes #13227. CPU users, rejoice! 🙌

    🔗 View Release

  • Ollama – v0.13.1-rc2

    🚀 Ollama v0.13.1-rc2 just dropped – and it's a quiet hero for GPU folks!

    No flashy UI changes, but if you've ever been crushed by “CUDA error: invalid device function” on older or weird GPUs? This is your win.

    🔧 What's new:

    • ✅ CUDA Compute Capability validation – Ollama now checks your GPU's architecture before loading models. No more cryptic crashes on pre-Kepler or niche cards.
    • 🛡️ Smoother setup for devs on mixed or legacy hardware.
    • 💡 Under-the-hood polish that saves hours of debugging.
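    The check amounts to comparing a GPU's compute capability against a build-time minimum before any kernels are loaded; the cutoff below is a placeholder, not Ollama's actual threshold:

```python
def is_supported(compute_capability, minimum=(5, 0)):
    """Reject GPUs below a minimum CUDA compute capability up front,
    instead of failing later with 'invalid device function'."""
    return tuple(compute_capability) >= tuple(minimum)
```

    Tuple comparison handles the major.minor pairs naturally, so e.g. (8, 6) clears a (5, 0) floor while (3, 5) does not.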

    Perfect if you're tinkering with Llama 3, DeepSeek-R1, or GGUF models on non-Tesla rigs. Keep those GPUs humming – no more “why won't it load?” 😎💻

    🔗 View Release