TaterBytes – Page 18

Skip to content

Ollama – v0.17.8-rc2: mlx: perf improvements (#14768)

Ollama – v0.17.8-rc2: mlx: perf improvements (#14768)

🚀 Ollama v0.17.8-rc2 is here — and it’s bringing major mlx performance boosts for Apple Silicon users!

This release is all about speed and efficiency on M1/M2/M3 chips, thanks to smarter use of Apple’s MLX framework. Here’s what’s new:

🔹 Layer Norm Got a Power-Up

→ Ditched the 6-step manual layer norm (mean → subtract → variance → rsqrt → multiply → add)

→ Now uses `mlx_fast_layer_norm` — a native, optimized kernel. Way faster and cleaner!

🔹 GQA Just Got Smarter

→ Removed custom `RepeatKV` tiling logic for Grouped-Query Attention (GQA)

→ Now leverages `scaled_dot_product_attention`, which natively supports GQA — as long as `n_q_heads % n_kv_heads == 0`.

✅ Result?

⚡ Faster inference

🧠 Lower memory usage

✨ Cleaner, more maintainable code

Perfect for devs and tinkerers pushing their Macs to the limit! 🍏💻

Let us know if you’d like a deep dive into how GQA + native attention works under the hood! 🧠⚡

🔗 View Release

March 12, 2026
Deep-Live-Cam – 2.7 beta

Deep-Live-Cam – 2.7 beta

🚨 AI Enthusiasts—deepfake magic just got a serious upgrade! 🚨

🔥 DeepLiveCam v2.7 Beta is here—celebrating 80K GitHub stars with the biggest update yet! 🎉

✨ What’s Fresh:

✅ Realtime Face Enhancer — now running up to 27 FPS with enhancement on! 🚀

✅ Inswapper Optimizer — faster, smoother swaps with less lag

✅ 2 NEW enhancer models: GPEN 512 & GPEN 256 (for sharper, more natural results)

✅ Face Enhancer Scaler — dial in enhancement intensity like a pro 🎛️

✅ Masking upgrades: Quick Lip, Lip, Chin, Eyes masks — precision control at your fingertips 🎭

✅ Interpolation support for buttery-smooth output frames

✅ GPU Changer — full multi-GPU support (switch GPUs on the fly!) 🖥️➡️🖥️

✅ LUTs — apply color grading presets in seconds 🎨

✅ Window Projection — watch live output in a dedicated (even full-screen!) window 📺

✅ Camera Refresh & Resolution Changer — tweak input/output like a streamer

✅ Cleaner, smarter UI — because UX matters as much as power 💡

🎥 Check out the demo: `2026-03-12.02.34.57.mp4`

📚 Quickstart guide: [link in bio/pinned]

This isn’t just an update—it’s a deepfake renaissance. Go play! 🧪✨

🔗 View Release

March 11, 2026
Voxtral Wyoming – v2.1.0

Voxtral Wyoming – v2.1.0

🚨 Voxtral Wyoming v2.1.0 is live! 🚨

Offline STT just got smarter and more flexible—and you’re going to love these updates:

🔹 ✨ Word Replacement Feature

Swap words or phrases on the fly in your prompts—ideal for A/B testing, localization, brand tone adjustments, or just having fun with custom phrasing. Think of it as prompt surgery 🩺: precise, fast, and fully in your control.

🔹 🚀 Early Request Acceptance

Want a feature before it drops publicly? Submit your request early and lock in priority access—get ahead of the curve and shape what’s next.

💡 Bonus: Still supports CPU/CUDA/MPS, Dockerized deployment, and auto-converts MP3/OGG/FLAC/WAV → PCM16.

🔧 Configurable via env vars for host, port, language, model ID—and now word replacement rules too.

Ready to give it a spin? Drop your favorite use case below—we’re listening! 🎙️🤖

🔗 View Release

March 11, 2026
Lemonade – v10.0.0

Lemonade – v10.0.0

🚀 Lemonade v10.0.0 is live!

Big news for local LLM tinkerers — the latest Lemonade release upgrades its core runtime to FLM v0.9.35 🧠⚡ (see PR #1233). While full release notes are still loading, this major bump likely brings performance tweaks, bug fixes, or new acceleration pathways under the hood — especially for NPU/GPU inference on Ryzen AI and Vulkan-enabled hardware.

✅ Still supports GGUF & ONNX models

✅ OpenAI-compatible local API

✅ Python SDK + CLI for deep customization

✅ Windows & Linux love ❤️

If you’re running Lemonade locally, this is a great time to upgrade — especially if you’ve been waiting for smoother NPU offload or faster token generation. 🛠️

Curious what changed under the hood? Let’s dig into PR #1233 together! 🕵️‍♂️

🔗 View Release

March 11, 2026
Voxtral Wyoming – v2.0.0

Voxtral Wyoming – v2.0.0

🚨 Voxtral Wyoming v2.0.0 is live! 🚨

Hey AI tinkerers & home automation wizards—big news: Voxtral Wyoming just dropped v2.0.0 with full support for Mistral’s brand-new Gen2 models! 🎤⚡

🔹 Voxtral Realtime — optimized for low-latency, real-time STT with improved voice clarity and speed.

🔹 Voxtral-Mini-4B-Realtime-2602 — the ultra-lightweight powerhouse for edge devices (Raspberry Pi, Jetson, etc.) without sacrificing accuracy.

✨ Why you’ll love it:

✅ Up to 30% faster inference on supported hardware (CUDA/MPS/CPU)

✅ Smoother Home Assistant Assist integration

✅ All the usual perks: Docker-ready, auto audio conversion (MP3/OGG/FLAC/WAV → PCM16), and env-based config

Ready to bring offline, private voice control to your setup? 🛠️

👉 [Check the docs & deploy now!]

🔗 View Release

March 11, 2026
Ollama – v0.17.8-rc1: ci: Fix windows build (#14754)
Ollama – v0.17.8-rc1: ci: Fix windows build (#14754)

🚨 Ollama v0.17.8-rc1 is live! 🚨

This one’s a micro-update with big implications for Windows devs and CI workflows — no flashy new features, but a solid behind-the-scenes upgrade!

🔧 What’s new?
- ✅ Fixed Windows build issues (#14754) by swapping out shell-based wildcard expansion (`sh`) for a native Go implementation.
- 🌐 Why it matters: `sh` (e.g., bash) behaves differently—or not at all—on Windows. This change boosts cross-platform consistency, especially in automated environments like GitHub Actions or Docker builds.
- 🛠️ Think of it as Ollama finally nailing its Windows boot sequence — fewer “works on my machine” headaches!
📦 No user-facing changes yet, but this paves the way for smoother releases ahead. Keep an eye out for the stable `v0.17.8` drop! 🎯

#Ollama #LLMDev #AITools

🔗 View Release
March 10, 2026
Ollama – v0.17.8-rc0: MLX: add header vendoring and remove go build tag (#14642)
Ollama – v0.17.8-rc0: MLX: add header vendoring and remove go build tag (#14642)

🚀 Ollama v0.17.8-rc0 is here — and it’s packing some serious dev-friendly upgrades!

🔥 MLX Integration Gets a Major Boost
- ✅ Header vendoring for `mlx-c` — Go can now build without CMake first 🎉
- ✅ Removed `go build` tag restriction — MLX support is now on by default
- 🔄 Headers auto-refresh on CMake runs → easier upgrades & less friction
- ✅ Basic Windows + Linux support confirmed (finally!)
🔧 ROCm on Windows
- 🚫 Temporarily sticking with ROCm v6 (v7 needs more love — stay tuned)
🛠️ CI/CD Hardening
- 🛡️ CI now handles flaky Chocolatey repos gracefully
- ⚠️ Builds proceed even if cache fails (since caching is optional for speed)
💡 Why it matters: Smoother builds, better Apple Silicon (MLX) support, and more resilient CI = faster iteration for devs experimenting locally. 🧪💻

Check it out if you’re running Ollama on macOS/Windows/Linux — especially with Apple Silicon or ROCm setups! 🧠⚡

#Ollama #LLMs #AIDev #LocalLLM

🔗 View Release
March 10, 2026
Text Generation Webui – v4.0

Text Generation Webui – v4.0

🚀 Text-Generation-WebUI v4.0 is here! 🚀

The big v4.0 update brings major improvements across the board — think of it as the “big bang” moment for the UI’s modernization! Here’s what’s new:

🔹 Brand-new Gradio-based UI

A complete redesign with a sleek, responsive interface — faster, cleaner, and way more intuitive. Think Stable Diffusion WebUI vibes, but for LLMs 🎨✨

🔹 Backend Agnostic Architecture

Now even easier to switch between backends (llama.cpp, Transformers, ExLlamaV3/V2, TensorRT-LLM) — with better hot-swapping and config management. 🔄

🔹 Built-in Document Loader (PDF, DOCX, TXT)

Upload and chat with your files directly — no more copy-paste or external tools! 📄➡️🧠

🔹 Web Search Integration (Optional)

Add real-time context via DuckDuckGo or SERP API — perfect for RAG without the hassle. 🔍🌐

🔹 New Extensions API

Smoother, more powerful extension support — TTS, translation, custom UI hooks, and more. 🧩

🔹 OpenAI-Compatible API Improvements

Better parity with the real deal — including streaming, chat completions, and tool calling. 📡

🔹 One-Click Portable Builds & Installer

Still as easy as ever to get up and running — Windows, Linux, macOS. 🧪⚡

🔥 Bonus: Better memory management, improved prompt templates, and a much more stable chat mode.

Grab the update — your local LLM playground just got a serious upgrade! 🛠️💻

👉 github.com/oobabooga/text-generation-webui

🔗 View Release

March 7, 2026
Ollama – v0.17.7
Ollama – v0.17.7

🚨 Ollama v0.17.7 is out! 🚨

This patch brings a subtle but important fix under the hood:

🔹 Stale context window entries now get properly overridden — meaning outdated prompt/chat history data won’t linger and mess with your inference accuracy. 💡

🧠 Why you’ll care:
- Cleaner, more reliable multi-turn conversations
- Better token efficiency (no hidden bloat from old context!)
- Smoother long-context handling — especially helpful if you’re pushing model limits
📦 No flashy new models or API changes this time, but it’s a solid reliability bump for everyday use.

🔗 Full details: v0.17.7 Release

Happy local LLM tinkering! 🛠️🤖

🔗 View Release
March 6, 2026
Ollama – v0.17.7-rc2
Ollama – v0.17.7-rc2

🚀 Ollama v0.17.7-rc2 is out!

This release candidate brings a handy fix for context window management — specifically, overriding stale entries in the context tracking logic. 🧠✨

🔹 What’s fixed?
- Stale context data (e.g., outdated conversation history) no longer lingers and messes with model responses.
- Improves reliability in multi-turn chats, especially for longer sessions or when switching between conversations.
💡 Why it matters: Cleaner context = more accurate, consistent responses — and fewer “wait, why did it say that?!” moments. 😅

Since this is an rc2, it’s a pre-release focused on polish and stability ahead of the final `v0.17.7`. No flashy new features yet, but solid under-the-hood improvements!

👉 Grab it and test: v0.17.7-rc2 on GitHub

Let us know how it behaves in the wild! 🛠️

🔗 View Release
March 6, 2026

←Previous Page

1 … 16 17 18 19 20 … 44