Did 30 runs of llama-bench to find optimal settings for my use case (Frigate and HomeAssistant) on my MI60 32GB VRAM GPU - two models tested, Gemma4 and Qwen3.6 - figured I'd share in case it helps anyone else
Signal: 72
Hype: 15
In three lines
User ran 30 llama.cpp benchmarks on an MI60 32GB GPU to optimize Gemma 4 26B Q4_1 and Qwen3 35B Q4_0 for Frigate and HomeAssistant.
Results: voice commands in under 1.2 s, video summaries in under 18 s.
Systematic testing across KV cache depths (0, 1000, 6000 tokens) with a 512-token prompt and 128-token generation.
Summary generated by Claude — human-verified
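
For anyone wanting to try a similar sweep, here is a minimal sketch of how the benchmark shape described in the summary (512-token prompt, 128-token generation, KV cache depths of 0, 1000 and 6000 tokens) might map onto a llama-bench command line. It is not taken from the original post: the model path is a hypothetical placeholder, the helper name `build_llama_bench_cmd` is made up for illustration, and it assumes a llama.cpp build recent enough that llama-bench accepts the `-d` (depth) option alongside `-m`, `-p` and `-n`.

```python
import shlex


def build_llama_bench_cmd(model_path, depths=(0, 1000, 6000), n_prompt=512, n_gen=128):
    """Build a llama-bench argument list for one model (illustrative helper)."""
    return [
        "llama-bench",
        "-m", model_path,                         # path to the GGUF model file
        "-p", str(n_prompt),                      # prompt-processing test size in tokens
        "-n", str(n_gen),                         # text-generation test size in tokens
        "-d", ",".join(str(d) for d in depths),   # assumed: comma-separated context depths
    ]


# Hypothetical model path; substitute the real GGUF file.
cmd = build_llama_bench_cmd("models/model.gguf")
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once `llama-bench` is on the `PATH`. Backend and GPU-offload settings are left out on purpose, since the summary does not say which ones were used.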