Category: AI

AI Releases

  • Ollama – v0.13.4-rc2

    🚀 Ollama v0.13.4-rc2 just dropped – and it's all about speed and stability!

    If you've ever sat there watching “Loading model…” like it's a slow-loading game, this update is your cheat code.

    ✨ What's new:

    • ⚡ Faster model init – memory mapping + CUDA context tweaks slash startup time.
    • 🤖 Multi-GPU love – better resource splitting so your 2x or 4x GPUs actually work together, not fight.
    • 🛠️ Smarter cache – fewer crashes from corrupted downloads or interrupted pulls.

    💡 Pro tip: Running Llama 3 or Mistral on a multi-GPU rig? You could see 20%+ faster load times.
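    Want to put a number on that? Here's a minimal load-time check against Ollama's HTTP API – a sketch assuming the default localhost:11434 endpoint and a pulled `llama3` model (swap in your own tag):

    ```python
    # Time a cold model load through Ollama's HTTP API. A generate request
    # with an empty prompt loads the model without producing any text.
    import time

    import requests

    start = time.time()
    requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": "", "stream": False},  # model tag is an assumption
        timeout=600,
    )
    print(f"Model load took {time.time() - start:.1f}s")
    ```

    Run it before and after upgrading, while the model isn't already resident in memory, and compare.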

    Still a release candidate – but if you're ready to cut the wait time and boost reliability, upgrade now. Your GPU's doing backflips in gratitude. 🤖💻

    🔗 View Release

  • Deep-Live-Cam – 2.4

    Deep-Live-Cam 2.4 just dropped – and it's a game-changer for real-time face swaps 🎭💻

    ✅ Dropdowns fixed – no more menu stutters. Smooth sailing through all your deepfake edits.

    ⚡ Forced GPU mode on laptops – finally, your iGPU won't hold you back. Full CUDA/CoreML/DirectML power unlocked.

    🎨 Poisson blending upgraded – translucent artifacts? Vanished. Messy ear edges? Gone. Swaps now look planted, not pasted (see the sketch below).

    👄 Mouth mask optimized for Inswapper – lips sync perfectly with speech. No more robot smiles. Realistic expressions, zero effort.

    Plus tiny-but-mighty polish tweaks – because your deepfakes shouldn't glitch.
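    Curious what Poisson blending actually does? Here's a toy sketch using OpenCV's `seamlessClone` – the classic implementation of the technique, not Deep-Live-Cam's internal code; `face.png` and `frame.png` are placeholder inputs:

    ```python
    # Poisson blending matches gradients across the seam, which is why a
    # blended patch looks planted rather than pasted.
    import cv2
    import numpy as np

    src = cv2.imread("face.png")    # swapped-face patch (placeholder)
    dst = cv2.imread("frame.png")   # target frame (placeholder)
    mask = np.full(src.shape[:2], 255, dtype=np.uint8)  # blend the whole patch

    # Where in the destination frame to plant the patch (here: the center)
    center = (dst.shape[1] // 2, dst.shape[0] // 2)

    blended = cv2.seamlessClone(src, dst, mask, center, cv2.NORMAL_CLONE)
    cv2.imwrite("blended.png", blended)
    ```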

    All updates are live only on the official site. Grab it before your GPU starts sending you love letters 💬🔥

    🔗 View Release

  • Text Generation Webui – v3.21

    🚀 Text Generation WebUI v3.21 just dropped – and it's lighter, faster, smarter!

    The portable builds are now leaner: no more bloated llama.cpp symlinks (Python .whl quirks, we see you 😅). The symlinks are recreated automatically on first launch – clean, efficient, zero hassle.

    🔥 Backend upgrades galore:

    • llama.cpp → updated to the latest ggml-org commit (5c8a717) – smoother inference, fewer crashes
    • ExLlamaV3 v0.0.18 – better quantization + smarter memory use
    • safetensors v0.7 – faster load times, tighter security
    • triton-windows 3.5.1.post22 – CUDA ops on Windows? Smoother than ever

    📦 Portable builds now come in 4 flavors:

    • 🖥️ `cuda12.4` (NVIDIA)
    • 💻 `vulkan` (AMD/Intel GPUs)
    • 🧠 `cpu` (no GPU? no problem)
    • 🍎 `macos-arm64` (Apple Silicon optimized)

    🔄 Updating? Just unzip the new build, then carry over only your `user_data/` folder. All your models, settings, themes – untouched. No reconfiguring. No stress.
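    If you script your updates, the whole flow is a few lines – a minimal sketch with illustrative archive and folder names (yours will differ):

    ```python
    # Unpack the new portable build, then carry over user_data/ from the
    # old install so models, settings, and themes survive the upgrade.
    import shutil
    import zipfile

    zipfile.ZipFile("textgen-webui-v3.21-cuda12.4.zip").extractall("webui-new")
    shutil.rmtree("webui-new/user_data", ignore_errors=True)        # drop the fresh defaults
    shutil.copytree("webui-old/user_data", "webui-new/user_data")   # keep yours
    ```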

    Perfect for tinkerers who want power without the install drama. Grab it, unzip, and start generating 🚀

    🔗 View Release

  • Ollama – v0.13.4-rc1

    🚀 Ollama v0.13.4-rc1 just dropped – and Gemma 3 just got a serious upgrade!

    Gemma 3's RoPE scaling is now set to 1.0, meaning:

    ✅ Better long-context handling

    ✅ Fewer hallucinations

    ✅ More stable reasoning on laptops and low-resource rigs

    Perfect for RAG apps, chatbots, or code assistants that need to stay sharp on long prompts.
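    To actually lean on that long-context behavior, you can widen the context window per request – a hedged example via Ollama's chat API, where the `gemma3` tag and the 16k `num_ctx` value are assumptions to match to your setup:

    ```python
    # Push a long document through Gemma 3 with an enlarged context window.
    import requests

    long_document = open("notes.txt").read()  # placeholder long input

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "gemma3",
            "messages": [
                {"role": "user", "content": long_document + "\n\nSummarize the key points."}
            ],
            "options": {"num_ctx": 16384},  # more room for long prompts
            "stream": False,
        },
    )
    print(resp.json()["message"]["content"])
    ```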

    Under the hood:

    🔧 Smoother model loading

    🧠 Minor memory optimizations

    ⚡ Improved compatibility with newer GPU drivers

    This is a release candidate – stable, tested, and ready for early adopters who want to ride the wave before it hits mainline.

    Grab it, tweak your prompts, and start gemma-ing harder 🤖✨

    🔗 View Release

  • Ollama – v0.13.4-rc0

    🚀 Ollama v0.13.4-rc0 is live – and it's a game-changer for Gemma 3 users!

    Fixed: global RoPE scale values are now properly applied, so longer prompts get noticeably more accurate reasoning and smoother context handling. No more weird token-scaling glitches – your Gemma 3 finally behaves like it should.

    ✅ Verified release (GitHub-signed for trust & security)

    ✅ Smoother, more consistent inference across macOS, Windows, and Linux

    Running Gemma 3 locally? Update now. This isn't just a patch – it's your model finally unlocking its full potential. 🤖✨

    Grab it before the final drops – and keep those LLMs running clean!

    🔗 View Release

  • Mantella – v0.14 Preview 1

    Hey AI tinkerers! 🚀

    Mantella v0.14_preview_1 just dropped – and it's turning Skyrim and Fallout 4 NPCs into living, breathing conversationalists.

    • 🗣️ Auto greetings in group chats – NPCs now start conversations when someone new walks in. No more awkward silences.
    • 💬 The first sentence is always spoken – even if you interrupt mid-reply. Dialogue flows like real life.
    • 🚫 Fast responses disabled in radiant dialogue – less spam, more soul. Perfect for immersive storytelling.
    • 🎮 New advanced actions – trigger NPC behavior based on proximity, state, or events. Build smarter worlds.
    • ⚙️ Standalone LLM framework for function calling – cleaner, more powerful logic behind the scenes.
    • Italics now parse correctly – emphasis and lore-heavy lines? Done right.
    • 📅 Day tracking in prompts – NPCs remember how long you've been gone. The world feels alive.
    • 🔧 All tool parameters exposed – no black boxes. Tinker to your heart's content.

    This isn't just an update – it's the future of in-game dialogue. Go make NPCs that feel real. 🌟

    🔗 View Release

  • Ollama – v0.13.3: Update README.md (#13373)

    Ollama v0.13.3 just dropped – tiny update, big win for docs lovers! 📚✨

    The team fixed a few broken links in the README:

    • Swollama (community tool) → now points to the right place
    • DocC documentation → no more 404s, just clean access

    No new models. No API tweaks. Just flawless documentation so you can dive in without hitting dead ends.

    Perfect for tinkerers who appreciate polish – because clean docs = faster experimentation. 🛠️

    Keep running LLMs like a pro!

    🔗 View Release

  • Lemonade – v9.1.0

    🚨 Lemonade v9.1.0 is LIVE – your local LLM powerhouse just got a major upgrade! 🍋

    The brand-new Lemonade App (Windows .msi / Linux .deb) replaces the old browser UI with a sleek native experience. Here's what's fresh:

    • 🎙️ ASR via whisper.cpp – transcribe audio right inside the app (toggle with env vars!)
    • 📥 Built-in Model Downloader – no more terminal commands to grab GGUF/ONNX models
    • 🔄 Switch LLMs & FLMs on the fly – test FLM2-1.2B-FLM with improved reasoning logic
    • 📏 Smart Model Manager – filters by RAM usage so you don't crash your system
    • 🌐 Server now defaults to IPv4 – goodbye, localhost confusion

    Under the hood:

    • 🐳 Official Docker/CMake dev setup
    • 💬 Chat UI now shows FLM “thinking” in collapsible boxes
    • 📚 Debate Arena docs live + Jan model fix shipped
    • 💥 All-in-one installer: app + server bundled. A minimal server is still available for headless use

    Pro tip: Click the model dropdown – it actually loads models now. And yes, you can collapse thinking by default.

    Shoutout to @danielholanda, @jeremyfowers, and @Geramy – you've built something wild.

    Download. Play. Break it. Fix it. Repeat. 😎

    🔗 View Release

  • Ollama – v0.13.3-rc1: feat: llama.cpp bump (17f7f4) for SSM performance improvements (#13408)

    🚀 Ollama v0.13.3-rc1 is live – and Apple Silicon users, this one's for you!

    llama.cpp just got a massive upgrade to the latest master (17f7f4b), turbocharging SSM models like Granite-4, Jamba, Falcon-H, Nemotron-H, and Qwen3 Next on Metal.

    💥 What's new?

    • Prefill sped up 2–4x on M1/M2/M3 – less waiting, faster first tokens
    • Optimized `SSM_CONV` and `SSM_SCAN` ops – the secret sauce behind modern state-space models
    • Clean swap to `gemma3.cpp` (goodbye, -iswa!)
    • 30+ patches + vendored code sync for stability

    If you're running SSMs on a Mac – upgrade now. Your chat latency just got a serious caffeine boost. 🍎⚡
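    To see the prefill gain yourself, measure time-to-first-token on a long prompt – a sketch assuming a local Ollama server and a pulled SSM model (the `granite4` tag is an assumption; use whichever you run):

    ```python
    # Time-to-first-token on a long prompt: prefill dominates, so this is a
    # decent proxy for the SSM_CONV/SSM_SCAN speedups.
    import json
    import time

    import requests

    prompt = "lorem ipsum " * 1500  # long prompt so prefill dominates
    start = time.time()
    with requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "granite4", "prompt": prompt, "stream": True},
        stream=True,
    ) as resp:
        for line in resp.iter_lines():
            if line and json.loads(line).get("response"):
                print(f"First token after {time.time() - start:.2f}s")
                break
    ```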

    🔗 View Release

  • Heretic – v1.1.0

    Heretic v1.1.0 just dropped – and it's a game-changer 🚀

    Apple Silicon support? ✅ Run decensoring natively on your M-series Mac.

    IBM Granite MoE now supported? ✅ Unlock massive efficiency with MoE models.

    Multi-GPU? ✅ Split the load like a pro – no more GPU bottlenecks.

    MXFP4 + Triton tensors? ✅ Leaner, faster inference on compatible hardware.

    Local datasets and `trust_remote_code`? Smooth sailing now.

    Colab/Kaggle compatibility? ✅ Drop it in your notebook and go.

    Float32 bugs fixed? ✅ No more precision surprises.

    Refusal detection got smarter – better at spotting those “I can't help” vibes (see the toy sketch below).

    Early stopping, thinking models, padding fixes? All there for the tinkerers.
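    For intuition, the simplest form of refusal detection is plain phrase matching – a toy sketch of the idea only, not Heretic's actual classifier:

    ```python
    # Flag completions that contain a stock refusal phrase. Real detectors
    # (including Heretic's) are more robust than this substring check.
    REFUSAL_MARKERS = (
        "i can't help",
        "i cannot assist",
        "i'm unable to",
        "as an ai",
    )

    def looks_like_refusal(completion: str) -> bool:
        text = completion.lower()
        return any(marker in text for marker in REFUSAL_MARKERS)

    print(looks_like_refusal("I can't help with that request."))  # True
    print(looks_like_refusal("Sure, here's an overview."))        # False
    ```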

    And yes – CI now lints PR titles, because clean commits = happy devs 😎

    8 new contributors joined the crew. Welcome!

    Upgrade. Decensor. Unleash. 💥

    AGPL-3.0 | PyTorch 2.2+ | RTX 3090? ~45 min for 8B models.

    🔗 View Release