arXiv cs.AI·19 May 2026

GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets

Signal

Hype

In three linesGAMMA is a quantizer-agnostic mixed-precision framework that automatically allocates bit precision per module without quantization-aware training. Using teacher-forced hidden-state reconstruction and integer programming, it achieves +12.99 Avg. over fixed baselines on Llama/Qwen 8B-32B, matching 3-bit quality at 2.5-bit average.

Read source

Your take?

Llama Qwen Benchmarks

Summary generated by Claude — human-verified

GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets

Other angles on this story