Storing an index to a scale instead of the scale itself with Q4_0 quant reduces scale size by ~31% (small gain but interesting)
Signal: 45 | Hype: 25
In three lines: A researcher proposes shrinking the Q4_0 scales of Qwen 3.6 27B by replacing each 16-bit scale value with an 11-bit index into a dictionary of scale values. Estimated gain: at least 318 MB over the full model (about a 31% reduction in scale storage). The catch: it requires custom inference code.
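The arithmetic behind the idea can be sketched in Python. This is a minimal illustration, not the researcher's code. All function names are hypothetical. It assumes one shared dictionary holds every distinct fp16 scale, which means an 11-bit index only works if there are at most 2048 of them. It also leaves out the custom dequantization kernel that the source says inference would need. The sketch shows where the ~31% comes from: each scale shrinks from 16 to 11 bits, a 5/16 = 31.25% cut in scale storage, not counting the small dictionary (2048 × 2 bytes = 4 KiB). A GGML Q4_0 block also stores 32 four-bit weights (128 bits), so the whole block only shrinks from 144 to 139 bits, roughly 3.5%.

```python
import math
import struct

# Assumption (not from the source): a single shared dictionary holds every distinct
# fp16 scale, so an 11-bit index suffices only if there are at most 2**11 = 2048.

Q4_0_WEIGHTS_PER_BLOCK = 32  # GGML Q4_0: 32 weights per block, 4 bits each
FP16_BITS = 16               # width of the per-block scale stored today


def fp16_bits(x: float) -> int:
    """Return the IEEE 754 half-precision bit pattern of x."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def build_scale_codebook(scales):
    """Deduplicate fp16 scales; return (codebook, per-block indices, index width in bits)."""
    patterns = [fp16_bits(s) for s in scales]
    codebook = sorted(set(patterns))
    position = {p: i for i, p in enumerate(codebook)}
    indices = [position[p] for p in patterns]
    width = max(1, math.ceil(math.log2(len(codebook))))
    return codebook, indices, width


def pack_indices(indices, width):
    """Pack fixed-width unsigned indices into a little-endian bitstream."""
    acc = 0
    for i, idx in enumerate(indices):
        acc |= idx << (i * width)
    return acc.to_bytes((len(indices) * width + 7) // 8, "little")


def unpack_index(packed, i, width):
    """Read index i back out, as a custom dequantization kernel would have to."""
    bit = i * width
    chunk = int.from_bytes(packed[bit // 8:(bit + width + 7) // 8], "little")
    return (chunk >> (bit % 8)) & ((1 << width) - 1)


def scale_reduction(index_bits=11):
    """Fraction of scale storage saved per block (dictionary overhead ignored)."""
    return 1 - index_bits / FP16_BITS


def block_reduction(index_bits=11):
    """Fraction of a whole Q4_0 block saved: 128 bits of weights plus the scale."""
    weight_bits = Q4_0_WEIGHTS_PER_BLOCK * 4
    return 1 - (weight_bits + index_bits) / (weight_bits + FP16_BITS)


# Usage on a toy set of four block scales (three distinct values -> 2-bit indices).
scales = [0.0123, 0.0456, 0.0123, 0.0789]
codebook, idx, width = build_scale_codebook(scales)
packed = pack_indices(idx, width)
print(idx, width, len(packed))    # [0, 1, 0, 2] 2 1
print(scale_reduction())          # 0.3125
print(round(block_reduction(), 4))  # 0.0347
```

The dictionary is sorted by bit pattern only so the indices are deterministic; any ordering works as long as the kernel looks up scales with the same table. Whether a real model stays under 2048 distinct scales (globally or per tensor) is the empirical question the proposal depends on.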
Summary generated by Claude — human-verified