arXiv cs.CL

Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning

Signal: 75 · Hype: 25
In three lines:
IH-GRPO, a new algorithm, decouples tool invocation from execution to improve mathematical reasoning in LLMs.
It achieves 1.87–2.53% improvements on mathematical benchmarks with Qwen3 models (1.7B–8B).
Code is released.
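For context, IH-GRPO builds on GRPO, whose core step scores each sampled completion relative to the other completions in its group. The snippet below is a minimal sketch of that baseline GRPO advantage only, assuming scalar per-completion rewards (e.g. 1.0 for a correct final answer, 0.0 otherwise). It does not implement IH-GRPO's decoupling of tool invocation from execution, which this summary does not describe. The function name and `eps` value are hypothetical.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Baseline GRPO-style advantage (not IH-GRPO): normalize each completion's
    reward by the mean and standard deviation of its own sampled group."""
    mean = statistics.fmean(rewards)
    # Population std is used here; some implementations use the sample std instead.
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical example: four completions sampled for one math problem,
# two correct (reward 1.0) and two incorrect (reward 0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # prints [1.0, -1.0, -1.0, 1.0]
```

Correct completions receive positive advantages and incorrect ones negative, without a learned value model; if all completions in a group score the same, every advantage is zero and the group contributes no learning signal.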
Reasoning · AI Agents · Reinforcement learning · Benchmarks · Qwen

Summary generated by Claude — human-verified