Back to feed
arXiv cs.AI·

AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment

Signal
72
Hype
18
In three linesAMR-SD introduces asymmetric meta-reflective self-distillation to improve token-level credit assignment in LLM reinforcement learning. The method compresses diagnostic signals into self-generated Socratic hints and uses Causal Information Gain with asymmetric ReLU-gated threshold for sparse token-level advantage modulation, preventing late-stage training collapse.
Read source
Your take?
Reinforcement learningReasoningAlignment

Summary generated by Claude — human-verified