arXiv cs.AI

Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL

Signal
75
Hype
15
In three lines
Adaptive Layerwise Perturbation (ALP) addresses off-policy issues in LLM RL by injecting learnable perturbations into hidden states across all layers. This reduces heavy-tailed importance ratios, stabilizes training, and improves performance on math and multi-turn reasoning tasks while boosting exploration.
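To make "heavy-tailed importance ratios" concrete, here is a minimal sketch of how a per-token importance ratio is usually computed from log-probabilities under the training policy and the rollout (behavior) policy. It is not ALP itself and is not taken from the paper: the function name `importance_ratios` and all numeric values are hypothetical, chosen only to show how a single mismatched token can dominate the ratios.

```python
import math


def importance_ratios(logp_train, logp_rollout):
    """Per-token ratio pi_train(a|s) / pi_rollout(a|s), computed from log-probs."""
    if len(logp_train) != len(logp_rollout):
        raise ValueError("log-prob sequences must have the same length")
    return [math.exp(lt - lr) for lt, lr in zip(logp_train, logp_rollout)]


# Hypothetical log-probs for five sampled tokens (illustrative values only).
logp_rollout = [-1.0, -2.0, -0.5, -3.0, -6.0]
logp_train = [-1.0, -2.1, -0.4, -3.0, -3.0]  # last token: large policy mismatch

ratios = importance_ratios(logp_train, logp_rollout)
max_ratio = max(ratios)
# Fraction of the total ratio mass carried by the single largest ratio.
share_of_max = max_ratio / sum(ratios)

print([round(r, 3) for r in ratios])
print(round(share_of_max, 3))
```

In this toy example four of the five ratios stay close to 1, while the mismatched token carries most of the total weight. That kind of outlier is what an off-policy correction aims to tame.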
Reinforcement learning · Reasoning · Papers

Summary generated by Claude — human-verified