arXiv cs.LG

HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models

Signal: 78, Hype: 15
In three lines: HELLoRA attaches LoRA modules only to the most frequently activated experts per layer in Mixture-of-Experts models, reducing trainable parameters by 84% on OlMoE and improving accuracy by 9.2%. Tested on OlMoE-1B-7B, Mixtral-8x7B, and DeepSeekMoE across mathematical reasoning, code generation, and safety alignment.
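The selection step in the summary can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes "most frequently activated" means counting router assignments per layer on some calibration data, and the names `select_hot_experts`, `routing_log`, `top_k`, and `lora_param_count` are hypothetical.

```python
from collections import Counter


def select_hot_experts(routing_log, top_k):
    """Pick the top_k most frequently activated experts in each layer.

    routing_log maps a layer index to the list of expert ids the router
    chose (one entry per routed token). This input format is an assumption
    made for the sketch.
    """
    hot = {}
    for layer, expert_ids in routing_log.items():
        counts = Counter(expert_ids)
        # Rank by activation count (descending); break ties by expert id.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        hot[layer] = sorted(expert for expert, _ in ranked[:top_k])
    return hot


def lora_param_count(d_in, d_out, rank):
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Toy routing trace for a 2-layer model with 4 experts per layer.
routing_log = {
    0: [0, 1, 1, 2, 1, 0, 3, 1],
    1: [2, 2, 3, 3, 3, 0, 2, 3],
}
hot = select_hot_experts(routing_log, top_k=2)

# Adapter parameters if only hot experts get LoRA vs. every expert.
per_expert = lora_param_count(d_in=8, d_out=8, rank=2)
hot_total = sum(len(experts) for experts in hot.values()) * per_expert
all_total = len(routing_log) * 4 * per_expert
```

In this toy setup, adapting 2 of 4 experts per layer halves the adapter parameter count relative to attaching LoRA to every expert. The numbers are illustrative only and are not meant to reproduce the 84% figure reported in the summary.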
Fine-tuning, Benchmarks

Summary generated by Claude — human-verified