Back to feed
arXiv cs.AI·

GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets

Signal
82
Hype
18
In three linesGAMMA is a quantizer-agnostic mixed-precision framework that automatically allocates bit precision per module without quantization-aware training. Using teacher-forced hidden-state reconstruction and integer programming, it achieves +12.99 Avg. over fixed baselines on Llama/Qwen 8B-32B, matching 3-bit quality at 2.5-bit average.
Read source
Your take?
LlamaQwenBenchmarks

Summary generated by Claude — human-verified