Reddit r/LocalLLaMA

In Q8_0 weight quantization, why can't we just skip blocks of 32 that have very large outliers?

Signal: 35
Hype: 15
In three lines: A technical discussion of Q8_0 quantization asks why blocks of 32 values that contain very large outliers can't simply be skipped instead of quantized. The author suggests this could improve accuracy while leaving less than 1% of sub-layers unquantized.
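To make the question concrete, here is a minimal sketch of the idea, not code from the thread. It assumes the usual Q8_0 layout: each block of 32 weights shares one scale (the block's largest magnitude divided by 127), and every weight is stored as a rounded int8. One large outlier inflates that shared scale, so the other 31 values lose precision. The outlier score, the `skip_threshold` value, and the function names are hypothetical and chosen only for illustration. The scale is kept as a Python float here, whereas real Q8_0 stores it in fp16.

```python
import random

BLOCK = 32  # Q8_0 quantizes weights in blocks of 32


def quantize_q8_0(block):
    """Return (scale, int8 values) for one block, using scale = max|x| / 127."""
    amax = max(abs(x) for x in block)
    scale = amax / 127.0  # kept as a float here; real Q8_0 stores it in fp16
    if scale == 0.0:
        return 0.0, [0] * len(block)
    return scale, [round(x / scale) for x in block]


def dequantize_q8_0(scale, qs):
    """Reconstruct approximate floats from the scale and int8 values."""
    return [scale * q for q in qs]


def outlier_ratio(block):
    """Hypothetical outlier score: largest magnitude over mean magnitude."""
    mags = [abs(x) for x in block]
    mean = sum(mags) / len(mags)
    return max(mags) / mean if mean > 0 else 0.0


def roundtrip(weights, skip_threshold=None):
    """Quantize and dequantize block by block.

    Blocks whose outlier score exceeds skip_threshold are left unquantized.
    Returns (reconstructed weights, number of skipped blocks).
    """
    out, skipped = [], 0
    for i in range(0, len(weights), BLOCK):
        block = weights[i:i + BLOCK]
        if skip_threshold is not None and outlier_ratio(block) > skip_threshold:
            out.extend(block)  # keep the original values for this block
            skipped += 1
        else:
            out.extend(dequantize_q8_0(*quantize_q8_0(block)))
    return out, skipped


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(BLOCK * 64)]
weights[5] = 3.0  # inject one large outlier into the first block

plain, _ = roundtrip(weights)
mixed, skipped = roundtrip(weights, skip_threshold=10.0)
print("skipped blocks:", skipped)
print("MSE, all blocks quantized:", mse(weights, plain))
print("MSE, outlier block skipped:", mse(weights, mixed))
```

In this toy setup the skipped block contributes no error, so the overall error drops. The sketch ignores the practical cost: a skipped block kept in fp16 takes 64 bytes instead of the 34 bytes of a Q8_0 block (2-byte scale plus 32 int8 values), and a tensor mixing the two needs some per-block marker, so blocks no longer sit at a fixed stride.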

Summary generated by Claude — human-verified