arXiv cs.LG · 20 May 2026

HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models

In three lines: HELLoRA attaches LoRA modules only to the most frequently activated ("hot") experts in each layer of a Mixture-of-Experts model, reducing trainable parameters by 84% on OlMoE and improving accuracy by 9.2%. It was tested on OlMoE-1B-7B, Mixtral-8x7B, and DeepSeekMoE across mathematical reasoning, code generation, and safety alignment.
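To make the selection idea concrete, here is a minimal sketch of picking hot experts per layer from routing counts. It is an illustration only: the function name `select_hot_experts`, the fixed top-k rule, and the toy counts are assumptions made here, not details taken from the paper, which may use a different selection criterion.

```python
from typing import Dict, List


def select_hot_experts(
    activation_counts: List[List[int]], k: int
) -> Dict[int, List[int]]:
    """For each layer, return the indices of the k most frequently routed experts.

    Hypothetical helper: a fixed top-k per layer is an assumption of this sketch.
    """
    hot: Dict[int, List[int]] = {}
    for layer_idx, counts in enumerate(activation_counts):
        # Rank experts by descending count; ties go to the lower expert index.
        ranked = sorted(range(len(counts)), key=lambda e: (-counts[e], e))
        hot[layer_idx] = sorted(ranked[:k])
    return hot


# Toy routing statistics (made-up numbers): 2 layers, 4 experts each.
counts = [
    [120, 5, 300, 40],
    [10, 10, 250, 90],
]
hot = select_hot_experts(counts, k=2)
print(hot)
```

In a setup like this, LoRA adapters would be attached only to the experts listed for each layer, and all other experts would stay frozen, which is where the reduction in trainable parameters would come from.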

Fine-tuning Benchmarks

Summary generated by Claude — human-verified
