Reddit r/LocalLLaMA · 23 May 2026

Did 30 runs of llama-bench to find optimal settings for my use case (Frigate and Home Assistant) on my MI60 32GB VRAM GPU. Two models tested: Gemma 4 and Qwen3.6. Figured I'd share in case it helps anyone else.

In three lines: The user ran 30 llama-bench runs on an MI60 32GB GPU to tune Gemma 4 26B (Q4_1) and Qwen3.6 35B (Q4_0) for Frigate and Home Assistant. Results: voice commands complete in under 1.2 s and video summaries in under 18 s. Testing covered KV cache depths of 0, 1,000, and 6,000 tokens, each with a 512-token prompt and 128-token generation.
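
The test matrix described above maps onto llama-bench's comma-separated parameter lists. Below is a minimal, hypothetical Python helper that builds one invocation per model. The GGUF file names are placeholders, and the flags (`-m`, `-p`, `-n`, `-d`, `-o json`) assume a recent llama.cpp build where `-d`/`--n-depth` is available. It does not reproduce the poster's 30 runs, whose exact breakdown isn't given.

```python
import shlex

# Hypothetical GGUF file names: the post names the models and quantizations,
# but not the exact files used.
MODELS = {
    "gemma4-26b-q4_1": "models/gemma-4-26b-Q4_1.gguf",
    "qwen3.6-35b-q4_0": "models/qwen3.6-35b-Q4_0.gguf",
}

PROMPT_TOKENS = 512           # -p: prompt-processing test size
GEN_TOKENS = 128              # -n: token-generation test size
KV_DEPTHS = [0, 1000, 6000]   # -d: tokens already in the KV cache before each test


def build_command(model_path: str, binary: str = "llama-bench") -> list[str]:
    """Return one llama-bench argument list covering every KV-cache depth for a model."""
    return [
        binary,
        "-m", model_path,
        "-p", str(PROMPT_TOKENS),
        "-n", str(GEN_TOKENS),
        "-d", ",".join(str(d) for d in KV_DEPTHS),  # comma list -> one test per depth
        "-o", "json",                               # machine-readable results
    ]


def sweep() -> dict[str, list[str]]:
    """Map each model label to its llama-bench argument list."""
    return {name: build_command(path) for name, path in MODELS.items()}


if __name__ == "__main__":
    for name, cmd in sweep().items():
        print(f"# {name}")
        print(shlex.join(cmd))  # shell-quoted command, ready to paste
```

Putting all depths in a single `-d` list lets llama-bench run every depth for a model in one process, so results for 0, 1,000, and 6,000 tokens of context land in the same JSON output and are easy to compare.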

Summary generated by Claude — human-verified
