arXiv cs.CL · 19 May 2026

The Unlearnability Phenomenon in RLVR for Language Models

In three lines: A study reports an "unlearnability" phenomenon in Reinforcement Learning with Verifiable Reward (RLVR) for LLMs: some hard examples remain unlearnable even when correct rollouts are available. Cross-example gradient analysis points to fundamental representation flaws, namely low gradient similarity across examples and reasoning patterns that do not generalize. Data augmentation fails to improve gradient similarity.
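To make "gradient similarity" concrete, here is a minimal sketch that measures the cosine similarity between per-example gradients. It is an illustration only, not the paper's method: the toy logistic-regression model and the helper names `logistic_grad` and `cosine_similarity` are assumptions introduced here, and the summary does not say which similarity measure or model the authors use.

```python
import math

def logistic_grad(w, x, y):
    """Gradient of the log-loss for one example (x, y) under weights w (toy model, assumed)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * xi for xi in x]

def cosine_similarity(g1, g2):
    """Cosine of the angle between two gradient vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    return dot / (n1 * n2)

w = [0.0, 0.0]                           # shared parameters
g_a = logistic_grad(w, [1.0, 0.0], 1)    # example A
g_b = logistic_grad(w, [2.0, 0.0], 1)    # example B: same feature direction as A
g_c = logistic_grad(w, [0.0, 1.0], 1)    # example C: orthogonal feature direction

sim_ab = cosine_similarity(g_a, g_b)     # aligned gradients: an update on A also helps B
sim_ac = cosine_similarity(g_a, g_c)     # orthogonal gradients: an update on A does not move C
```

Under this reading, an example whose gradient is nearly orthogonal to the gradients of other examples gains little from training on them, which is one plausible way to interpret the low-similarity finding.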

Reinforcement learning Reasoning Papers

Summary generated by Claude — human-verified
