ARCA: Adapter-Residual Credit Assignment When Token Signals Degenerate
Signal
75
Hype
15
In three linesARCA introduces a token-level credit assignment method for LLM reinforcement learning that addresses degeneracy of intrinsic signals (surprisal, entropy reduction, policy divergence) under LoRA. It measures adapter salience directly via L2 norm of hidden-state residuals instead of output-distribution shifts. Tested on MATH/Qwen3-1.7B with GRPO, ARCA avoids pathological weight concentration.Read source
Your take?
Summary generated by Claude — human-verified