arXiv cs.AI

Post-Trained MoE Can Skip Half Experts via Self-Distillation

Signal 78 · Hype 15
In three lines: ZEDA, a self-distillation framework, converts post-trained static MoE models into dynamic variants. On Qwen3-30B-A3B and GLM-4.7-Flash, it cuts expert FLOPs by 50% with marginal accuracy loss and achieves a 1.20× end-to-end inference speedup.
Qwen · Fine-tuning · Reasoning · Benchmarks

Summary generated by Claude — human-verified