arXiv cs.AI · 19 May 2026

AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment

In three lines: AMR-SD introduces asymmetric meta-reflective self-distillation to improve token-level credit assignment in LLM reinforcement learning. The method compresses diagnostic signals into self-generated Socratic hints. It then uses Causal Information Gain with an asymmetric, ReLU-gated threshold to modulate advantages sparsely at the token level, which the authors report prevents late-stage training collapse.
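
A minimal sketch of what sparse, asymmetric, ReLU-gated token-level advantage modulation could look like. The summary does not give AMR-SD's formulas, so everything here is an assumption for illustration, not the paper's method. Specifically, the sketch assumes that Causal Information Gain (CIG) is the per-token log-probability difference with versus without the self-generated hint. It also assumes a hypothetical threshold `tau`, a hypothetical scale `alpha`, and a hypothetical rule in which the sign of the sequence advantage decides which direction of CIG is amplified.

```python
from typing import List


def causal_information_gain(logp_with_hint: List[float],
                            logp_without_hint: List[float]) -> List[float]:
    """Hypothetical per-token CIG: how much the self-generated hint
    raises (positive) or lowers (negative) each token's log-probability."""
    if len(logp_with_hint) != len(logp_without_hint):
        raise ValueError("log-prob sequences must have equal length")
    return [w - wo for w, wo in zip(logp_with_hint, logp_without_hint)]


def modulate_advantages(advantage: float, cig: List[float],
                        tau: float = 0.5, alpha: float = 1.0) -> List[float]:
    """Hypothetical asymmetric, ReLU-gated token-level advantage modulation.

    Assumed rule: the sign of the sequence-level advantage decides which
    direction of CIG counts. A positive advantage amplifies tokens the hint
    supports. A negative advantage amplifies tokens the hint argues against.
    Tokens whose signed CIG is at or below tau get gate 0 and keep the
    plain advantage, so the adjustment touches only a sparse subset.
    """
    out = []
    for c in cig:
        signed = c if advantage >= 0 else -c
        gate = max(0.0, signed - tau)  # ReLU(signed CIG - threshold)
        out.append(advantage * (1.0 + alpha * gate))
    return out


cig = causal_information_gain([-0.25, -0.5, -3.0], [-0.75, -2.0, -1.0])
print(cig)                              # [0.5, 1.5, -2.0]
print(modulate_advantages(1.0, cig))    # [1.0, 2.0, 1.0]
print(modulate_advantages(-1.0, cig))   # [-1.0, -1.0, -2.5]
```

Under these assumptions, most tokens keep the plain sequence advantage. Only tokens whose hint-induced gain clears `tau` in the direction matching the advantage's sign are re-weighted, which is one way to read "sparse" and "asymmetric" in the summary.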

Reinforcement learning · Reasoning · Alignment

Summary generated by Claude — human-verified