arXiv cs.LG

GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs

Signal: 78
Hype: 15
In three lines: GEMQ introduces global expert-level mixed-precision quantization for MoE LLMs. The method uses a global linear-programming formulation to estimate expert importance and fine-tunes the routers to adapt routing to the quantized experts. Results: significant memory reduction and inference acceleration with minimal accuracy loss.
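To make the idea of expert-level mixed precision concrete, here is a minimal sketch of assigning a bit-width to each expert under one global memory budget. It is not GEMQ's method: a simple greedy pass stands in for the linear-programming formulation, and the importance scores, the candidate bit-widths, the equal-size-experts assumption, and the function name `allocate_bits` are all hypothetical choices made for illustration.

```python
def allocate_bits(importance, choices=(2, 3, 4), avg_bits=3.0):
    """Greedy stand-in for a global bit-allocation solve (not the paper's LP).

    importance: one hypothetical score per expert, higher = more sensitive.
    choices:    candidate bit-widths (assumed, not taken from the paper).
    avg_bits:   global budget, expressed as average bits per expert.
    Assumes all experts have the same number of parameters.
    """
    n = len(importance)
    low = min(choices)
    budget = avg_bits * n  # total bits across all experts
    if budget < low * n:
        raise ValueError("budget is below the smallest bit-width")

    bits = [low] * n          # start every expert at the lowest precision
    spare = budget - low * n  # bits still available for upgrades

    # Visit experts from most to least important and give each one the
    # largest bit-width that the remaining budget can still pay for.
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    for i in order:
        best = max(c for c in choices if c - low <= spare)
        spare -= best - low
        bits[i] = best
    return bits


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.3]  # made-up importance scores
    print(allocate_bits(scores))
```

With these made-up scores and a 3-bit average budget, the two highest-scoring experts are kept at 4 bits and the other two drop to 2 bits. A real allocation would also have to account for unequal expert sizes and for how routing shifts once experts are quantized, which is the part the summary says router fine-tuning addresses.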

Summary generated by Claude — human-verified