arXiv cs.CL

Post-Trained MoE Can Skip Half Experts via Self-Distillation

Signal: 78
Hype: 15
In three lines
ZEDA converts post-trained static mixture-of-experts (MoE) models into dynamic variants via self-distillation. On Qwen3-30B-A3B and GLM-4.7-Flash, the method eliminates 50% of expert FLOPs with marginal accuracy loss and achieves a 1.20× end-to-end inference speedup.
Qwen · Fine-tuning · Infrastructure

Summary generated by Claude — human-verified