Jetson AGX Orin 64GB: q8_0 good, q6_k bad
Just a quick observation for all three users of Jetson AGX Orin 64GB in this sub: the q8_0 quant gives >20% faster prefill (prompt processing) than q6_k, and more than 10% faster than q4_k_xl. Tested with Unsloth Qwen3.6-27B-MTP-GGUF on a recent llama.cpp build.

I don't have proper statistics at hand, but from observation with prompt sizes of 10,000+ tokens, prompt processing (pp) speed was roughly:

- q8_0: 245 t/s
- q6_k: 190 t/s
- q4_k_xl: 210 t/s

From monitoring `tegrastats`, I see that EMC (external memory controller) utilization is never saturated: it climbs from around 40% to around 60% when switching from q6_k to q8_0. Hence, the device is NOT memory-bandwidth-bound during prefill.
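
For reference, the relative speedups implied by the three rough figures above can be worked out directly. This is only a minimal Python sketch on the eyeballed numbers from this post, not a benchmark; the dictionary and the `speedup_pct` helper are names I made up for illustration, and tokens/s is assumed as the unit of "pp".

```python
# Rough prefill (pp) speeds quoted in the post; tokens/s is an assumed unit.
pp = {"q8_0": 245, "q6_k": 190, "q4_k_xl": 210}


def speedup_pct(fast: float, slow: float) -> float:
    """Percent by which `fast` exceeds `slow` (hypothetical helper)."""
    return (fast / slow - 1) * 100


# Compare q8_0 against the two smaller quants.
for other in ("q6_k", "q4_k_xl"):
    print(f"q8_0 vs {other}: +{speedup_pct(pp['q8_0'], pp[other]):.0f}%")
```

On these figures q8_0 comes out about 29% ahead of q6_k and about 17% ahead of q4_k_xl, which fits the ">20%" claim and puts the q4_k_xl gap somewhat above 10%.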