Reddit r/LocalLLaMA

Quick note on the QAT of recent

In three lines: A technical critique of recent quantization. Google's implementation is flawed (the token embedding is quantized to Q6_K instead of using --pure), llama-quantize incorrectly hardcodes -7, and 32 block groups are misaligned. Unsloth's Q4_K_XL performs better (pure Q4_0). A patch is being developed.
Llama · Open source · Tools

Summary generated by Claude — human-verified