arXiv cs.LG

ARCA: Adapter-Residual Credit Assignment When Token Signals Degenerate

Signal: 75 · Hype: 15
In three lines: ARCA introduces a token-level credit assignment method for LLM reinforcement learning that addresses the degeneracy of intrinsic token signals (surprisal, entropy reduction, policy divergence) under LoRA fine-tuning. Rather than relying on output-distribution shifts, it measures adapter salience directly as the L2 norm of hidden-state residuals. Tested on MATH with Qwen3-1.7B and GRPO, ARCA avoids pathological weight concentration.
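To make the idea concrete, here is a minimal sketch of residual-norm credit assignment. The summary does not give the paper's exact formula, so several things here are assumptions: that the "residual" is the per-token difference between the adapted and base hidden states, that salience is normalized to mean-one weights, and that the weights simply rescale a sequence-level GRPO advantage. The function names (`residual_salience`, `token_weights`, `token_advantages`) are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence


def residual_salience(base_hidden: Sequence[Sequence[float]],
                      adapted_hidden: Sequence[Sequence[float]]) -> List[float]:
    """Per-token L2 norm of (adapted - base) hidden states.

    Assumption: each argument holds one hidden vector per token.
    """
    if len(base_hidden) != len(adapted_hidden):
        raise ValueError("token counts differ")
    norms = []
    for b, a in zip(base_hidden, adapted_hidden):
        if len(b) != len(a):
            raise ValueError("hidden sizes differ")
        norms.append(math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))))
    return norms


def token_weights(salience: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize salience to weights with mean 1 (an assumed scheme).

    Falls back to uniform weights when the adapter barely moves any token.
    """
    n = len(salience)
    total = sum(salience)
    if total < eps:
        return [1.0] * n
    return [n * s / total for s in salience]


def token_advantages(sequence_advantage: float,
                     weights: Sequence[float]) -> List[float]:
    """Spread one sequence-level advantage across tokens by weight."""
    return [sequence_advantage * w for w in weights]


# Toy example: 3 tokens, hidden size 2.
base = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
adapted = [[3.0, 4.0], [1.0, 1.0], [2.0, 7.0]]

salience = residual_salience(base, adapted)   # [5.0, 0.0, 5.0]
weights = token_weights(salience)             # [1.5, 0.0, 1.5]
advantages = token_advantages(2.0, weights)   # [3.0, 0.0, 3.0]
```

In this toy case the second token, which the adapter leaves unchanged, receives no credit, while the two tokens with equal residual norms share the advantage equally. A real implementation would read hidden states from the model's forward pass with and without the LoRA adapter, and the paper may normalize or clip the weights differently.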
Reinforcement learning · Fine-tuning · Reasoning · Papers

Summary generated by Claude — human-verified