arXiv cs.CL

Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning

Signal: 75 · Hype: 25
In three lines: Aryabhata 2 is a STEM reasoning language model trained via reinforcement learning on top of GPT-OSS-20B. Developed by PhysicsWallah, it outperforms its base model on the JEE and NEET competitive exams while reducing output tokens by up to 64%. It is also evaluated on AIME, HMMT, MMLU-Pro, and GPQA.
Reinforcement learning · Reasoning · Benchmarks · Code generation

Summary generated by Claude — human-verified