• Ollama – v0.30.0-rc3

    Ollama just dropped v0.30.0-rc3, and it looks like the team is hard at work smoothing out the edges for Windows users! 🛠️

    If you haven’t tried Ollama yet, it’s the ultimate framework for running powerful LLMs like Llama 3, DeepSeek-R1, and Mistral locally on your own machine. It’s a total game-changer for privacy-focused devs and anyone wanting to experiment with AI without worrying about API costs or limits.

    What’s new in this release candidate:

    • Windows ROCm Fix: The big highlight here is a specific fix for the Windows ROCm build. This is huge news for anyone trying to leverage AMD GPUs on Windows to accelerate their local model inference! 🚀
    • CI Improvements: The update includes much-needed continuous integration (CI) fixes to ensure more stable, reliable builds moving forward.

    This is a targeted release focused on stability and hardware compatibility, making sure your local AI setup stays buttery smooth!

    🔗 View Release

  • Ollama – v0.30.0-rc1

    Ollama v0.30.0-rc1 🦬

    If you haven’t jumped on the Ollama train yet, now is the time! It’s the ultimate go-to tool for running powerful large language models like Llama 3, DeepSeek-R1, and Mistral locally on your machine with zero friction. It handles all the heavy lifting of model management and serving so you can focus on building cool stuff.

    This latest release candidate (rc1) is a focused stability update:

    • Windows MLX Build Fix: The team has pushed a fix specifically for the Windows MLX build process. If you’ve been experimenting with MLX-related workflows on Windows, this should smooth out those compilation hiccups! 🛠️

    It looks like a targeted patch to keep your local LLM engine running buttery smooth across different environments. Keep an eye out for the heavier feature work landing in the upcoming full release!

    🔗 View Release

  • Ollama – v0.30.0-rc0

    Ollama v0.30.0-rc0 is here! 🚀

    If you’ve been looking for a way to run heavy-hitting models like Llama 3, DeepSeek-R1, or Mistral locally without the headache of complex configurations, Ollama is your best friend. It handles all the heavy lifting of downloading and setting up LLMs right on your machine.

    This latest release candidate brings some exciting refinements to the ecosystem:

    • Enhanced Model Management: Improvements to how models are pulled and managed via the CLI, making your local library more stable. 🧠
    • Performance Optimizations: Under-the-hood tweaks aimed at smoother inference speeds when running quantized models on macOS, Windows, and Linux. ⚡
    • API Reliability: Refinements to the REST API to ensure smoother integration when you’re building your own AI-powered apps or agents (see the quick sketch below). 🛠️
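
    For a feel of what that integration looks like from your own code, here’s a minimal sketch of calling Ollama’s local REST API from Python. It assumes Ollama is serving on its default port (11434) and that the model named below has already been pulled; swap in whatever model you actually run.

    ```python
    # Minimal sketch: calling Ollama's local REST API.
    # Assumes Ollama is serving on its default port 11434 and that a model named
    # "llama3" has already been pulled -- swap in whatever model you actually run.
    import json
    import urllib.request

    payload = {
        "model": "llama3",
        "prompt": "Explain speculative decoding in one sentence.",
        "stream": False,  # ask for a single JSON object instead of a token stream
    }

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])
    ```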

    Keep an eye on this release candidate as it paves the way for even more robust local LLM deployment! Happy tinkering! 🥔✨

    🔗 View Release

  • Ollama – v0.23.1: mlx: Gemma4 MTP speculative decoding (#15980)

    Ollama v0.23.1 is officially live, and it’s bringing some serious speed boosts for Apple Silicon fans! 🚀 If you’ve been looking to squeeze more tokens per second out of your local LLMs, this update is a massive win for performance.

    The star of the show is support for MTP (Multi-Token Prediction) speculative decoding specifically for the Gemma 4 model family using MLX. This means much faster inference speeds on Mac hardware!

    Here’s the breakdown of what’s new:

    • Gemma 4 Optimization: Full support for MTP speculative decoding is now active, significantly boosting generation speed.
    • New `DRAFT` Command: You can now use a new `DRAFT` instruction in your `Modelfile` to specify exactly which draft model to use for speculation (see the sketch after this list).
    • Streamlined Model Creation: It’s now easier than ever to import `safetensors`-based Gemma 4 draft models directly via the `ollama create` command.
    • New Quantization Flag: The `ollama create` command now includes a `--quantize-draft` flag, making it simple to manage lightweight draft models.
    • Under-the-Hood Upgrades: Includes updated rotating cache support to handle MTP correctly and enhanced sampling support for better draft model token prediction.
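
    To make this concrete, here’s a hypothetical Python sketch that writes a `Modelfile` pairing a main model with a draft model and then builds it with `ollama create`. The exact `DRAFT` syntax, the model tags, and the quantization value are assumptions for illustration only (they aren’t spelled out in the release notes), so check the official docs for the real format before copying this.

    ```python
    # Hypothetical sketch: pairing a main model with a lightweight draft model
    # for MTP speculative decoding. The DRAFT syntax, the model tags, and the
    # quantization value below are assumptions for illustration only.
    import pathlib
    import subprocess

    modelfile = """\
    FROM gemma4:27b
    DRAFT gemma4-draft:0.5b
    """

    pathlib.Path("Modelfile").write_text(modelfile)

    # --quantize-draft comes from the release notes; q4_K_M is an assumed example value
    subprocess.run(
        ["ollama", "create", "gemma4-spec", "-f", "Modelfile", "--quantize-draft", "q4_K_M"],
        check=True,
    )
    ```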

    If you’re running on a Mac, definitely grab this update and start experimenting with those lightning-fast generations! 🛠️✨

    🔗 View Release

  • Ollama – v0.23.1-rc0

    Ollama v0.23.1-rc0 🛠️

    If you’re running local LLMs, you know Ollama is the gold standard for getting models like Llama 3 and DeepSeek-R1 up and running with zero friction. This latest release candidate is a targeted stability update to keep your local environment running smoothly!

    What’s new:

    • CI Pipeline Fixes: The main focus of this release is addressing issues within the Continuous Integration (CI) pipeline, specifically regarding MLXAssets2.
    • Improved Mac Reliability: This patch ensures that the build process for Apple Silicon (MLX) assets remains stable. If you’re running optimized models on Mac hardware, this keeps those gears turning without a hitch! ⚙️

    It’s a small but important maintenance patch to ensure the ecosystem stays robust and reliable for all us local-first tinkerers.

    🔗 View Release

  • Heretic – v1.3.0

    Heretic v1.3.0 is live! 🛠️

    If you’ve been looking for a way to strip “safety alignment” from your favorite LLMs without the headache of manual fine-tuning, this is the tool you need. Heretic uses directional ablation (abliteration) to identify and neutralize refusal mechanisms by analyzing residual activations. The result? A decensored model that keeps its original intelligence intact without needing a PhD or massive labeled datasets.
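
    If you’re curious what directional ablation actually does under the hood, here’s a tiny conceptual sketch (not Heretic’s code): estimate a “refusal direction” from residual activations and project it out. The arrays below are random stand-ins for activations you’d capture from harmful vs. harmless prompts.

    ```python
    # Conceptual sketch of directional ablation ("abliteration"): estimate a
    # refusal direction from residual-stream activations, then remove the
    # component of every activation along that direction.
    import numpy as np

    def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
        """Normalized difference of mean activations (harmful minus harmless)."""
        diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
        return diff / np.linalg.norm(diff)

    def ablate(activations: np.ndarray, direction: np.ndarray) -> np.ndarray:
        """Subtract each activation's projection onto the refusal direction."""
        return activations - np.outer(activations @ direction, direction)

    # Toy example with a 16-dimensional residual stream
    rng = np.random.default_rng(0)
    harmful = rng.normal(size=(32, 16)) + 0.5   # stand-in for refused prompts
    harmless = rng.normal(size=(32, 16))
    d = refusal_direction(harmful, harmless)
    cleaned = ablate(harmful, d)
    print(np.allclose(cleaned @ d, 0))  # True: nothing left along the refusal direction
    ```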

    What’s new in v1.3.0:

    Expanded Model Support & Features

    • New Models: You can now run ablation on the latest Qwen 3.5 and Gemma 4 models! 🤖
    • Integrated Benchmarking: A brand-new system is now built-in to help you measure refusal rates and model fidelity directly.
    • Auto Model Cards: If your local models have an existing README, Heretic can now automatically generate model cards for you.
    • Smarter Responses: Improved automatic response prefix determination via a new, fully configurable two-step process.

    Performance & Optimization

    • VRAM Efficiency: Significant reductions in peak VRAM usage, plus more accurate usage reporting on multi-GPU setups—perfect for squeezing more out of your hardware! 🧠
    • Reproducibility: Much more robust reproducible runs, making it a breeze to debug or compare different ablation results.
    • Faster Startup: Improved startup speed when using the `--help` flag.

    Bug Fixes & Infrastructure

    • Fixed a division-by-zero error in the evaluator.
    • Resolved issues with displaying all abliterable components across layers.
    • Corrected `max_memory` setting examples and various minor infrastructure improvements.

    Whether you’re running an 8B model on an RTX 3090 (which takes about 45 minutes!) or experimenting with massive MoE architectures, this update makes the workflow smoother and more precise than ever. Happy tinkering! 🚀

    🔗 View Release

  • ComfyUI – v0.20.2

    ComfyUI v0.20.2 is officially live! 🚀

    If you’re a fan of node-based wizardry, you know ComfyUI is the ultimate playground for building complex Stable Diffusion pipelines without touching a line of code. It’s incredibly modular, making it a go-to for anyone wanting to orchestrate everything from SDXL generation to intricate ControlNet workflows.

    This latest minor update brings some sweet new compatibility to your node graphs:

    • OneTrainer ERNIE LoRA Support: The big news here is the integration of OneTrainer ERNIE LoRAs! This makes it much smoother to plug these specific fine-tuned models directly into your existing workflows (a sketch of how a LoRA node slots into a workflow graph follows below). 🛠️
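
    For context, here’s a small illustrative Python fragment of what a LoRA loader node looks like in ComfyUI’s API-format workflow JSON. The node id, the upstream checkpoint-loader id, and the LoRA filename are placeholders, not something shipped in this release.

    ```python
    # Illustrative fragment of ComfyUI's API-format workflow JSON showing how a
    # LoRA loader node wires into a graph. The node id ("10"), the upstream
    # checkpoint-loader id ("4"), and the filename are placeholders.
    lora_node = {
        "10": {
            "class_type": "LoraLoader",
            "inputs": {
                "model": ["4", 0],   # MODEL output of a CheckpointLoaderSimple node
                "clip": ["4", 1],    # CLIP output of the same loader
                "lora_name": "my_ernie_lora.safetensors",  # placeholder filename
                "strength_model": 0.8,
                "strength_clip": 0.8,
            },
        }
    }
    print(lora_node)
    ```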

    Whether you’re upscaling, inpainting, or experimenting with new LCM models, this update keeps your toolkit expanding. Happy tinkering!

    🔗 View Release

  • Ollama – v0.23.0

    Ollama v0.23.0 is officially live! 🚀

    If you aren’t running Ollama yet, you are missing out on the gold standard for local LLM orchestration. It’s the ultimate toolkit for pulling and running heavy hitters like Llama 3, DeepSeek-R1, and Mistral directly on your hardware—no cloud subscriptions or API keys needed.

    The team is moving at lightning speed, and this latest update brings some great refinements to your local workflow:

    • Claude-style Integration: This release introduces significant backend work to support Claude-style application structures, making it even easier to integrate sophisticated prompting patterns into your local setups.
    • Enhanced Stability: A major focus of this version is refining the launch processes for new model types, ensuring that when you pull a fresh architecture, it runs smoothly without a hitch.

    Whether you’re building a private RAG pipeline or just experimenting with the latest open-source weights, this update keeps your local inference engine rock solid. 🛠️

    🔗 View Release

  • Ollama – v0.23.0-rc0

    Ollama just dropped a fresh release candidate, v0.23.0-rc0, and it’s looking like a major milestone for anyone running local LLMs! 🚀

    If you aren’t using Ollama yet, it is the ultimate framework for getting models like Llama 3, DeepSeek-R1, and Mistral up and running on your own hardware without needing a massive cloud budget. It handles all the heavy lifting of downloading and configuring models so you can focus on building.

    What’s new in this release:

    • Claude App Integration: This update includes significant groundwork for the launch of Claude app support! The team is clearly focused on expanding how different model architectures and interfaces interact within the Ollama ecosystem. 🤖
    • Release Candidate Status: Since this is an `rc0` build, it’s the perfect playground for us tinkerers to test out the new plumbing and catch any bugs before the stable version hits the mainstream.

    This is a great time to pull the latest build and see how these architectural updates affect your local workflows! 🛠️

    🔗 View Release

  • Ollama – v0.22.1

    Ollama just dropped v0.22.1, and it’s a quick but tasty update for anyone running local LLMs! 🥔

    If you haven’t tried Ollama yet, it is the ultimate toolkit for running powerful models like Llama 3, DeepSeek-R1, and Mistral directly on your own hardware without needing a cloud subscription. It handles all the heavy lifting of downloading and configuring models so you can focus on building.

    Here is what’s new in this release:

    • Gemma 4 Support: The star of this update is an updated renderer specifically optimized for Gemma 4. This ensures that when you’re pulling Google’s latest lightweight powerhouse, its prompt format and output are rendered correctly by the Ollama backend.

    If you’ve been waiting to experiment with the newest Gemma weights, now is the time to pull that update and get tinkering! 🛠️

    🔗 View Release