Reasoning, Code, or Both? How Large Language Models Handle Variations in Math Questions
Signal: 72 · Hype: 15
In three lines
Comparative study of three LLM approaches on 1,000 math problems from GSM-Symbolic: chain-of-thought (CoT), Program-Aided Language models (PAL), and Step-by-Step Coding (SBSC). CoT proves more robust to question variations (a 1.3 percentage-point drop vs. 1.7 for PAL), contradicting the hypothesis that code execution improves reasoning robustness.
Summary generated by Claude — human-verified