llama: use f16 mask for FA to save VRAM by am17an · Pull Request #23764 · ggml-org/llama.cpp

r/LocalLLaMA

Now you can download VRAM (by downloading the new llama.cpp version). Submitted by /u/jacek2023
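
For a rough sense of where the savings would come from: the PR title says the flash attention (FA) mask moves to f16, and an f16 element takes 2 bytes where an f32 element takes 4. The sketch below is back-of-the-envelope arithmetic only. The mask shape (`n_kv` by `n_batch`) and the example sizes are assumptions for illustration, not numbers taken from the PR, and the real buffer may be padded or laid out differently.

```python
# Back-of-the-envelope estimate of attention mask memory.
# ASSUMPTION: the mask is a dense (n_kv x n_batch) buffer; the actual
# llama.cpp layout and padding are not described in this post.

def mask_bytes(n_kv: int, n_batch: int, bytes_per_elem: int) -> int:
    """Size in bytes of a dense mask with n_kv * n_batch elements."""
    return n_kv * n_batch * bytes_per_elem

# Hypothetical sizes, chosen only to make the arithmetic easy to follow.
n_kv = 131072    # KV cache length (context size), assumed
n_batch = 2048   # tokens processed per batch, assumed

f32_size = mask_bytes(n_kv, n_batch, 4)  # f32: 4 bytes per element
f16_size = mask_bytes(n_kv, n_batch, 2)  # f16: 2 bytes per element

saved_mib = (f32_size - f16_size) / (1024 ** 2)
print(f"f32 mask: {f32_size / 1024**2:.0f} MiB")
print(f"f16 mask: {f16_size / 1024**2:.0f} MiB")
print(f"saved:    {saved_mib:.0f} MiB")
```

Whatever the real shape is, halving the element size halves the mask buffer, so the saving grows with context length and batch size.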