arXiv cs.AI

MAVEN: Improving Generalization in Agentic Tool Calling

Signal 75 · Hype 25
In three lines: MAVEN is a lightweight symbolic reasoning scaffold that improves how well LLM agents generalize on tool-calling tasks. Evaluated on BFCL v3, TauBench, Tau2Bench, AceBench, and a new benchmark, MAVEN-Bench, it raises GPT-OSS-120b accuracy from 48% to 71% without additional training, at roughly 1/10 the cost of proprietary baselines.
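For scale, the headline GPT-OSS-120b result works out as follows. This is plain arithmetic on the two reported figures; the summary does not say which benchmark or aggregate they are measured on.

```latex
\Delta_{\text{abs}} = 71\% - 48\% = 23 \text{ percentage points}, \qquad
\Delta_{\text{rel}} = \frac{71 - 48}{48} \approx 0.479 \;(\text{about } 48\% \text{ relative})
```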
AI Agents · Reasoning · Benchmarks · Tools

Summary generated by Claude — human-verified