arXiv cs.LG

ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning

Signal: 78 · Hype: 25
In three lines: ReCrit is a reinforcement learning framework that improves how LLMs handle user criticism in scientific reasoning. It decomposes model behavior under criticism into four quadrants (Correction, Sycophancy, Robustness, Boundary) and applies transition-aware rewards. On ChemBench, TRQA, and EarthSE, ReCrit improves accuracy from 38.15% to 51.49% on Qwen3.5-4B.
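One way to picture the four quadrants is as transitions between the model's answer before and after a user critique. The sketch below is illustrative only and is not taken from the paper. It assumes Correction means wrong-to-right, Sycophancy means right-to-wrong, Robustness means right-to-right, and Boundary means wrong-to-wrong. The names `classify_transition` and `transition_reward` and the reward values are hypothetical.

```python
from enum import Enum


class Quadrant(Enum):
    CORRECTION = "Correction"
    SYCOPHANCY = "Sycophancy"
    ROBUSTNESS = "Robustness"
    BOUNDARY = "Boundary"


def classify_transition(correct_before: bool, correct_after: bool) -> Quadrant:
    """Map a (before, after) correctness pair to a quadrant.

    ASSUMPTION: this mapping is inferred from the quadrant names,
    not confirmed by the source.
    """
    if correct_before and correct_after:
        return Quadrant.ROBUSTNESS   # held a right answer under criticism
    if correct_before and not correct_after:
        return Quadrant.SYCOPHANCY   # abandoned a right answer
    if not correct_before and correct_after:
        return Quadrant.CORRECTION   # fixed a wrong answer
    return Quadrant.BOUNDARY         # stayed wrong


# HYPOTHETICAL reward values, chosen only to show the shape of a
# transition-aware reward; the paper's actual values are not given here.
ILLUSTRATIVE_REWARD = {
    Quadrant.CORRECTION: 1.0,
    Quadrant.ROBUSTNESS: 1.0,
    Quadrant.SYCOPHANCY: -1.0,
    Quadrant.BOUNDARY: 0.0,
}


def transition_reward(correct_before: bool, correct_after: bool) -> float:
    """Reward depends on the transition, not only on final correctness."""
    return ILLUSTRATIVE_REWARD[classify_transition(correct_before, correct_after)]
```

Under these assumptions, a reward keyed on the transition can penalize caving to a wrong critique more heavily than simply remaining wrong, which a reward based only on final correctness cannot do.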
Reinforcement learning · Reasoning · Qwen · Benchmarks

Summary generated by Claude — human-verified