arXiv cs.CL

Mixture of Experts for Low-Resource LLMs

Signal: 78
Hype: 15
In three lines

Analysis of routing dynamics in two MoE architectures (Qwen3-30B-A3B and Nemotron-3-Nano-30B-A3B) reveals deep-layer routing collapse for underrepresented languages (Hebrew, Japanese). Continual pre-training on balanced bilingual data corrects this imbalance better than supervised fine-tuning alone.
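
One way to make "routing collapse" concrete is to look at how evenly tokens spread across experts in each layer. The sketch below is illustrative only and is not taken from the paper: it assumes you have already recorded, per layer, the expert index chosen for each token (flatten the selected indices if the router picks several experts per token), and it uses normalized entropy as the imbalance measure. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def expert_load_entropy(assignments, num_experts):
    """Normalized entropy of expert usage, between 0 and 1.

    1 means tokens are spread evenly over all experts; values near 0
    mean almost all tokens go to a few experts (a collapse-like pattern).
    """
    if not assignments:
        raise ValueError("assignments must not be empty")
    if num_experts < 2:
        raise ValueError("num_experts must be at least 2")
    total = len(assignments)
    counts = Counter(assignments)
    # Shannon entropy of the empirical expert distribution.
    entropy = sum(-(c / total) * math.log(c / total) for c in counts.values())
    # Divide by the maximum possible entropy so layers are comparable.
    return entropy / math.log(num_experts)


def per_layer_entropy(layer_assignments, num_experts):
    """Map each layer index to its normalized expert-usage entropy."""
    return {
        layer: expert_load_entropy(ids, num_experts)
        for layer, ids in layer_assignments.items()
    }


# Hypothetical toy data: an early layer with balanced routing and a
# deep layer where every token is sent to the same expert.
toy_routing = {
    0: [0, 1, 2, 3] * 25,
    47: [2] * 100,
}
toy_scores = per_layer_entropy(toy_routing, num_experts=4)
```

Comparing these per-layer scores between a high-resource language and a language such as Hebrew or Japanese would show where the distributions diverge; how the paper itself quantifies the collapse may differ.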
Benchmarks, Fine-tuning

Summary generated by Claude — human-verified