llama: use f16 mask for FA to save VRAM by am17an · Pull Request #23764 · ggml-org/llama.cpp
r/LocalLLaMA
Now you can download VRAM (by downloading the new llama.cpp version).
Submitted by /u/jacek2023