arXiv cs.LG

Theory-optimal Quantization Based on Flatness

Signal: 78
Hype: 15
In three lines
A new post-training quantization method for LLMs, Bidirectional Diagonal Quantization (BDQ). It introduces a Flatness metric to quantify the distribution of activation outliers. BDQ achieves an accuracy drop of under 1% at W4A4 on LLaMA-3-8B and reduces the performance gap by 39.1% at W2A4KV16 on DeepSeek-R1-Distill-LLaMA-70B.
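
For readers new to the notation: W4A4 means both weights and activations are stored in 4 bits, and W2A4KV16 means 2-bit weights, 4-bit activations, and a 16-bit KV cache. The sketch below is not BDQ and does not compute the paper's Flatness metric, neither of which is defined in this summary. It is plain round-to-nearest symmetric quantization in Python, with hypothetical function names and made-up sample values, and only illustrates why a single activation outlier tends to hurt low-bit quantization.

```python
def quantize_symmetric(values, bits=4):
    """Round-to-nearest symmetric quantization with one scale for the whole list.

    Illustrative only: a generic baseline, not the method from the paper.
    """
    qmax = 2 ** (bits - 1) - 1                    # 7 for signed 4-bit integers
    scale = max(abs(v) for v in values) / qmax    # step size set by the largest magnitude
    if scale == 0.0:
        return list(values)                       # all zeros: nothing to quantize
    # Quantize to an integer level, clamp to the signed range, then dequantize.
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]


def mean_abs_error(values, bits=4):
    """Average absolute difference between the inputs and their dequantized values."""
    dequantized = quantize_symmetric(values, bits)
    return sum(abs(a - b) for a, b in zip(values, dequantized)) / len(values)


# Made-up activations: evenly spread values, then the same values with one outlier.
flat = [0.1 * i for i in range(-8, 9)]
spiky = flat[:-1] + [20.0]

flat_error = mean_abs_error(flat)
spiky_error = mean_abs_error(spiky)
print(f"flat: {flat_error:.4f}  spiky: {spiky_error:.4f}")
```

With the outlier present, the shared step size grows so large that the small values all collapse to zero, which is the kind of problem outlier-aware quantization methods aim to reduce.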
Llama · DeepSeek · Benchmarks

Summary generated by Claude — human-verified