arXiv cs.CL

Mathematical Reasoning in Large Language Models: Benchmarks, Architectures, Evaluation, and Open Challenges

Signal: 75
Hype: 15
In three lines: Survey of ~120 studies on mathematical reasoning in LLMs. Structured analysis of datasets, architectures, training strategies, and evaluation protocols. Identifies recurring failure modes: reasoning faithfulness, benchmark biases, generalization limitations.
Reasoning, Benchmarks, Evals, Fine-tuning

Summary generated by Claude — human-verified