arXiv cs.CL · 29 May 2026

Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning


In three lines: Aryabhata 2 is a STEM reasoning language model built on GPT-OSS-20B and trained with reinforcement learning. Developed by PhysicsWallah, it outperforms its base model on the JEE and NEET competitive exams while reducing output tokens by up to 64%. It is also evaluated on AIME, HMMT, MMLU-Pro, and GPQA.


Reinforcement learning · Reasoning · Benchmarks · Code generation

Summary generated by Claude — human-verified
