Ollama – v0.20.1-rc1: ggml: fix ROCm build for cublasGemmBatchedEx reserve wrapper
Ollama v0.20.1-rc1 is officially live, bringing some much-needed stability for the AMD crowd! πŸš€

If you’ve been trying to leverage your AMD GPU to run local LLMs like Llama 3 or DeepSeek-R1, this release is a critical one. It focuses heavily on refining the ROCm build, ensuring that hardware acceleration is smoother and more reliable for those of us not using NVIDIA.

What’s new in this release:

  • Fixed ROCm Build: Resolved a build failure in the `ggml` backend so the ROCm (AMD GPU) build compiles cleanly again.
  • Improved Type Mapping: Added missing mappings between `cublasGemmAlgo_t` and `hipblasGemmAlgo_t`, so CUDA-oriented code paths translate cleanly to their hipBLAS equivalents.
  • Wrapper Fix: Corrected how const qualifiers are handled in the `cublasGemmBatchedEx` reserve wrapper, making its signature compatible with `hipblasGemmBatchedEx`.

This is a great update for anyone building a local AI workstation around AMD hardware. Grab the update and get those models running! πŸ› οΈ

πŸ”— View Release