arXiv cs.LG · 25 May 2026

GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs


In three lines: GEMQ introduces global expert-level mixed-precision quantization for MoE LLMs. The method uses a global linear-programming formulation to estimate expert importance and fine-tunes the routers to adapt routing to the quantized experts. Results: significant memory reduction and inference acceleration with minimal accuracy loss.


Summary generated by Claude — human-verified