arXiv cs.AI

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Reward Models

Signal: 78
Hype: 25
In three lines: CodeScaler is a reward model for scaling both training and test-time inference of code LLMs. Trained on verified preference data, it outperforms execution-based RL by 1.55 points on Qwen3-8B and by 4.23 points on Qwen3-14B. At inference, it cuts latency 10× while performing comparably to unit-test-based approaches.
Code generation · Reinforcement learning · Qwen · Benchmarks · Evals

Summary generated by Claude — human-verified