GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs
Signal: 78
Hype: 15
In three lines
GEMQ introduces global expert-level mixed-precision quantization for MoE LLMs. The method uses a global linear-programming formulation to estimate expert importance, and it fine-tunes the routers so that routing adapts to the quantized experts. Reported results: significant memory reduction and inference acceleration with minimal accuracy loss.
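To make the idea of a global, expert-level precision budget concrete, here is a toy sketch. It is not GEMQ's actual formulation: the importance scores, the per-bit-width error proxy, the equal-size-experts assumption, and the function name `allocate_bits` are all hypothetical. It also solves the small discrete problem exactly by dynamic programming rather than with a linear-programming solver.

```python
def allocate_bits(importance, bit_options, error_proxy, total_bit_budget):
    """Choose one bit-width per expert (toy model, all names hypothetical).

    Minimizes sum(importance[e] * error_proxy[bits[e]]) subject to
    sum(bits) <= total_bit_budget, assuming every expert has the same size.
    Returns (cost, bits).
    """
    # best maps "bits used so far" -> (lowest cost, assignment reaching it)
    best = {0: (0.0, [])}
    for weight in importance:
        nxt = {}
        for used, (cost, assign) in best.items():
            for b in bit_options:
                u = used + b
                if u > total_bit_budget:
                    continue  # this choice would exceed the global budget
                c = cost + weight * error_proxy[b]
                if u not in nxt or c < nxt[u][0]:
                    nxt[u] = (c, assign + [b])
        best = nxt
    if not best:
        raise ValueError("budget too small for the lowest bit-width")
    return min(best.values(), key=lambda t: t[0])


if __name__ == "__main__":
    # Hypothetical inputs: four experts, an average budget of 3 bits each.
    importance = [0.9, 0.1, 0.5, 0.2]
    error_proxy = {2: 4.0, 3: 2.0, 4: 1.0}  # assumed: fewer bits, more error
    cost, bits = allocate_bits(importance, (2, 3, 4), error_proxy, 12)
    print(bits, cost)
```

Because the budget is shared across all experts instead of being fixed per layer, more important experts can take higher precision at the expense of less important ones. That sharing is the intuition behind a global allocation.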
Summary generated by Claude — human-verified