Post-Trained MoE Can Skip Half Experts via Self-Distillation
Signal
78
Hype
15
In three lines
ZEDA converts post-trained static MoE models into dynamic variants via self-distillation. On Qwen3-30B-A3B and GLM-4.7-Flash, the method eliminates 50% of expert FLOPs with marginal accuracy loss and achieves a 1.20× end-to-end inference speedup.
Summary generated by Claude — human-verified