Did a 30 runs of llama-bench to find optimal settings for my use case (Frigate and HomeAssistant) on my MI60 32gb VRAM GPU - two models tested Gemma4 and Qwen3.6 - Figured I'd share in case it helps anyone else

I'm running llama.cpp using this docker container: (it's just a lot easier than building from source, which I was doing previously). The MI60 (or MI50) are just a real pain in the behind to get working with Ubuntu 24.04. That container has it up in minutes, real timesaver. Anyway, my personal use case for LLM's is primarily for Frigate to review camera footage and cut down on "notification noise" (it's like having a human review footage to determine what I need to know about and what I don't). The other use is for HomeAssistant.