Ollama – v0.22.1-rc0: New models (#15861)
Ollama just dropped a fresh release candidate (v0.22.1-rc0), and it's packed with some heavy-hitting model updates and precision improvements! If you're running local LLMs, this one is definitely worth a look for better quantization and smarter logprob handling.
Here's the lowdown on what's new:
- New Model Support: The team has added support for the Laguna models (via both `mlx` and `ggml`) and implemented support for Nemotron 3 Nano Omni.
- FP8 Precision Upgrades: A big win for efficiency! Ollama can now import FP8 safetensors. It intelligently handles decoding HF F8_E4M3 weights and uses source-precision metadata to decide the best quantization path (like defaulting FP8-sourced GGUFs to Q8_0). This means better quality when compressing models.
- Improved Logprobs: The server now preserves `logprobs` during generation, even when using built-in parsers. Previously, logprob-only chunks could get dropped if the parser was buffering content; now, that data stays intact for much more accurate probability tracking.
- Poolside Integration: Added integration and updated documentation for Poolside, expanding your local ecosystem options.
- Performance & Fixes: Includes various performance improvements (addressing review feedback), updates to the cache setup, and several bug fixes to keep things running smoothly.
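To make the FP8 bullet concrete: F8_E4M3 packs a float into one byte as 1 sign bit, 4 exponent bits (bias 7), and 3 mantissa bits. Here's a rough illustrative decoder for a single byte (a sketch of the format, not Ollama's actual import code):

```python
def decode_f8_e4m3(byte: int) -> float:
    """Decode one OCP FP8 E4M3 byte: 1 sign, 4 exponent (bias 7), 3 mantissa bits."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        # E4M3 has no infinities; only this bit pattern encodes NaN.
        return float("nan")
    if exp == 0:
        # Subnormal: no implicit leading 1, fixed exponent of -6.
        return sign * (man / 8.0) * 2.0 ** -6
    # Normal: implicit leading 1, exponent biased by 7.
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)
```

For example, `0x38` decodes to 1.0 and `0x7E` to 448.0, the largest finite E4M3 value, which is why importers upcast or requantize (e.g. to Q8_0) rather than keep such a narrow range for all tensors.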
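The logprobs fix is easiest to picture with a tiny sketch (hypothetical shapes, not Ollama's server code): while a built-in parser buffers content, a streamed chunk can carry logprobs but no text, and the aggregator has to keep those logprobs instead of dropping the whole chunk:

```python
def merge_chunks(chunks: list[dict]) -> tuple[str, list[float]]:
    """Collect text and logprobs from a stream, keeping logprob-only chunks."""
    text_parts: list[str] = []
    logprobs: list[float] = []
    for chunk in chunks:
        if chunk.get("content"):
            text_parts.append(chunk["content"])
        # Preserve logprobs even when this chunk produced no visible text
        # (e.g. the parser is still buffering a partial tag).
        logprobs.extend(chunk.get("logprobs", []))
    return "".join(text_parts), logprobs
```

With the old behavior, a chunk like `{"logprobs": [-0.5]}` (no `content`) would have been skipped entirely, leaving gaps in the per-token probabilities.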
Time to pull that new image and test out those FP8 weights!
