Jetson AGX Orin 64GB: q8_0 good, q6_k bad
Signal 45
Hype 15
In three lines
On Jetson AGX Orin 64GB, q8_0 quantization delivers 20% faster prefill than q6_k and 10% faster than q4_k_xl. Tested with Qwen 3.6-27B-MTP-GGUF on a recent llama.cpp build, q8_0 reaches 245 tokens/s versus 190 for q6_k. The EMC (memory controller) was not saturated, which suggests a CUDA optimization issue rather than a memory bandwidth constraint.
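The percentage and the two throughput figures quoted above can be cross-checked with a little arithmetic. The sketch below assumes that 245 and 190 tokens/s are the prefill rates being compared; the helper names are hypothetical and only illustrate that the result depends on which quantization is taken as the baseline.

```python
def percent_faster(fast: float, slow: float) -> float:
    """Speedup of `fast` relative to `slow`, in percent (baseline: slow)."""
    return (fast / slow - 1.0) * 100.0


def percent_slower(slow: float, fast: float) -> float:
    """Slowdown of `slow` relative to `fast`, in percent (baseline: fast)."""
    return (1.0 - slow / fast) * 100.0


# Throughput figures quoted in the summary (tokens/s).
q8_0_tps = 245.0
q6_k_tps = 190.0

print(f"q8_0 is {percent_faster(q8_0_tps, q6_k_tps):.1f}% faster than q6_k")
print(f"q6_k is {percent_slower(q6_k_tps, q8_0_tps):.1f}% slower than q8_0")
# q8_0 is 28.9% faster than q6_k
# q6_k is 22.4% slower than q8_0
```

With these two numbers, the gap is about 29% when measured against q6_k and about 22% when measured against q8_0, so the quoted 20% appears closer to the second reading.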
Summary generated by Claude — human-verified