Post-Trained MoE Can Skip Half Experts via Self-Distillation
Signal: 78
Hype: 15
In three lines: ZEDA, a self-distillation framework, converts post-trained static MoE models into dynamic variants. On Qwen3-30B-A3B and GLM-4.7-Flash, it cuts expert FLOPs by 50% with marginal accuracy loss and achieves a 1.20× end-to-end inference speedup.
Summary generated by Claude — human-verified