arXiv cs.CL · 25 May 2026

Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals

In three lines

MaR (Metacognition-as-Reward) is an RL framework that improves LLM reasoning along two dimensions: metacognitive knowledge (identifying task-relevant information) and metacognitive regulation (planning the reasoning process). Tested on 22 benchmarks, Qwen3.5-9B + MaR achieves gains of up to 7.7% over the base model and 11.0% over vanilla DAPO, and surpasses GPT-OSS-120B on average.

Reinforcement learning Reasoning Qwen Benchmarks

Summary generated by Claude — human-verified
