Reddit r/LocalLLaMA

Jetson AGX Orin 64GB: q8_0 good, q6_k bad

Signal: 45, Hype: 15
In three lines: On a Jetson AGX Orin 64GB, q8_0 quantization delivers about 20% faster prefill than q6_k and about 10% faster than q4_k_xl. Tested with Qwen 3.6-27B-MTP-GGUF on a recent llama.cpp build, q8_0 reaches 245 tokens/s versus 190 for q6_k. The EMC (external memory controller) was not saturated, which suggests a CUDA optimization issue rather than a memory bandwidth constraint.
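The percentage and the raw throughput figures can be cross-checked with a little arithmetic. This is a minimal sketch using only the two tokens/s values quoted in the summary. The function names are illustrative and do not come from the post or from llama.cpp. The result depends on which quantization is taken as the baseline.

```python
def percent_faster(fast: float, slow: float) -> float:
    """How much faster `fast` is, with `slow` as the baseline (in percent)."""
    return (fast / slow - 1.0) * 100.0


def percent_slower(slow: float, fast: float) -> float:
    """How much slower `slow` is, with `fast` as the baseline (in percent)."""
    return (1.0 - slow / fast) * 100.0


# Prefill throughput in tokens/s, as quoted in the summary.
Q8_0_TPS = 245.0
Q6_K_TPS = 190.0

print(f"q8_0 vs q6_k: {percent_faster(Q8_0_TPS, Q6_K_TPS):.1f}% faster")
print(f"q6_k vs q8_0: {percent_slower(Q6_K_TPS, Q8_0_TPS):.1f}% slower")
```

With these inputs the gap comes out at roughly 29% when q6_k is the baseline and roughly 22% when q8_0 is the baseline, so the "20%" figure reads most naturally as "q6_k is about 20% slower than q8_0".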
Tags: Qwen, Benchmarks, Infrastructure

Summary generated by Claude — human-verified