Ollama – v0.20.8-rc0: Gemma4 on MLX (#15244)

Ollama v0.20.8-rc0 just dropped! 🚀

If you’re running local LLMs, especially on Apple Silicon, this release is packed with optimizations to make your models run even smoother.

What’s new in this release:

  • Gemma 4 Support via MLX: You can now run the Gemma 4 model using the MLX framework (text-only runtime). This is a massive win for Mac users looking to leverage highly optimized performance on Apple hardware! 🍎
  • Enhanced Prefill Speed: Two fixes accelerate the “prefill” stage (how the model processes your initial prompt) for Gemma 4’s architecture:
      • Mask Memoization: The sliding-window prefill mask is now memoized across layers, cutting out redundant recalculation.
      • Efficient Softmax: The router’s forward pass now performs softmax only over the selected experts, making the routing step leaner and faster.
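The release notes don’t include code, so here’s a minimal NumPy sketch of the mask-memoization idea: build the sliding-window causal mask once for a given (sequence length, window) pair and hand every layer the same cached array instead of rebuilding it per layer. All names here (`MaskCache`, `sliding_window_mask`) are illustrative, not Ollama’s actual internals.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    # Causal sliding-window mask: token i may attend to tokens j
    # with i - window < j <= i.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

class MaskCache:
    """Memoize the prefill mask so every layer reuses one array
    instead of recomputing it (the redundant work the fix removes)."""
    def __init__(self) -> None:
        self._cache: dict[tuple[int, int], np.ndarray] = {}

    def get(self, seq_len: int, window: int) -> np.ndarray:
        key = (seq_len, window)
        if key not in self._cache:
            self._cache[key] = sliding_window_mask(seq_len, window)
        return self._cache[key]

cache = MaskCache()
m1 = cache.get(8, 4)   # first layer: mask is computed
m2 = cache.get(8, 4)   # later layers: same array, no recomputation
assert m1 is m2
```

With dozens of layers all prefilling the same prompt, this turns N identical mask constructions into one.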
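And a rough sketch of the router change, assuming a standard top-k mixture-of-experts router: instead of softmaxing over every expert’s logit and then picking the top k, select the top-k logits first and normalize only those. The function name and shapes are assumptions for illustration.

```python
import numpy as np

def route_topk(logits: np.ndarray, k: int):
    """Pick the top-k experts per token, then softmax over just
    those k logits rather than the full expert set."""
    # Indices of the k largest logits per token (order unspecified)
    topk_idx = np.argpartition(logits, -k, axis=-1)[..., -k:]
    topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)
    # Numerically stable softmax over only the selected experts
    z = topk_logits - topk_logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    weights = e / e.sum(axis=-1, keepdims=True)
    return topk_idx, weights

# One token, four experts, route to the top 2
idx, w = route_topk(np.array([[1.0, 3.0, 2.0, 0.5]]), k=2)
```

Exponentiating k values instead of all expert logits is a small per-token saving that adds up across every routed layer during prefill.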

If you’re tinkering with local AI on a Mac, grab this update to get that extra bit of snappiness in your workflow! πŸ› οΈ

πŸ”— View Release