arXiv cs.CL

Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals

Signal: 78
Hype: 25
In three lines
MaR (Metacognition-as-Reward) is an RL framework that improves LLM reasoning by rewarding two dimensions of metacognition: metacognitive knowledge (identifying task-relevant information) and metacognitive regulation (planning the reasoning process). Across 22 benchmarks, Qwen3.5-9B trained with MaR gains up to 7.7% over the base model and 11.0% over vanilla DAPO, and surpasses GPT-OSS-120B on average.
Reinforcement learning · Reasoning · Qwen · Benchmarks

Summary generated by Claude — human-verified