arXiv cs.CL

GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling

GenesisFunc is an automated multi-agent pipeline for generating function-calling training data. Starting from reliable tools in public benchmarks, the system produces diverse conversations and filters them through multi-stage quality control. An 8B model fine-tuned on this synthetic data outperforms similarly sized open-source models on both in-domain performance and out-of-domain generalization.

Multi-agent · Code generation · Fine-tuning
SIG 78 · HYP 25
arXiv cs.AI

SkillGrad: Optimizing Agent Skills Like Gradient Descent

SkillGrad optimizes LLM agent skills using a gradient-descent-inspired framework. Task executions provide trajectory-level loss signals, automatic diagnostics generate text-based gradients, and a momentum agent accumulates recurring patterns. Evaluated on SpreadsheetBench and WikiTableQuestions, SkillGrad outperforms training-based baselines by 6.7 percentage points on average.

AI Agents · Reinforcement learning · Prompt engineering
SIG 78 · HYP 25
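
The SkillGrad summary describes a momentum agent that accumulates recurring patterns from text-based gradients. The sketch below illustrates that idea only by analogy with numeric momentum: each diagnostic string carries a decayed weight, and patterns whose weight exceeds a threshold are treated as recurring. The function names, the decay factor `beta`, the threshold, and the example diagnostic strings are all assumptions for illustration and are not taken from the paper.

```python
from collections import defaultdict


def update_momentum(momentum, diagnostics, beta=0.5):
    """Decay existing weights by beta, then add 1.0 per diagnostic seen this step.

    `momentum` maps a diagnostic string to its accumulated weight.
    Hypothetical helper: the paper's momentum agent is an LLM, not a counter.
    """
    updated = defaultdict(float)
    for pattern, weight in momentum.items():
        updated[pattern] = beta * weight
    for pattern in set(diagnostics):  # count each diagnostic once per step
        updated[pattern] += 1.0
    return dict(updated)


def recurring_patterns(momentum, threshold=1.0):
    """Return diagnostics whose accumulated weight is above the threshold."""
    return sorted(p for p, w in momentum.items() if w > threshold)


# Two task executions; only one diagnostic shows up in both.
state = {}
state = update_momentum(state, ["wrong cell range", "missing header check"])
state = update_momentum(state, ["wrong cell range"])
stable = recurring_patterns(state)
```

Under these assumptions, a diagnostic that appears once fades with each later step, while one that keeps reappearing crosses the threshold and becomes a candidate for a skill edit.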
arXiv cs.CL

Cultural Fidelity in English-to-Hindi Translation: A Preservation-Fluency Frontier for Gender Recoverability

A study of gender preservation in English-to-Hindi translation. On a benchmark of 37,345 instances, GPT-4o-mini and Sarvam frequently erase gender through ergative constructions. Two rerankers, SAR and PAR, improve gender recoverability: PAR raises accuracy from 11-16% to 49-54% but lowers the fluency score from 4.36 to 3.37, revealing a preservation-fluency tradeoff.

Benchmarks · Vision · Alignment
SIG 78 · HYP 15
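
The tradeoff reported for PAR in the translation study can be made concrete with a little arithmetic. The snippet below uses only the figures quoted in the summary (accuracy 11-16% before, 49-54% after; fluency 4.36 before, 3.37 after). Since the summary does not say which "before" value pairs with which "after" value, it computes the smallest and largest possible gains rather than assuming a pairing. Variable names are illustrative.

```python
# Figures quoted in the summary (accuracy in percent, fluency on the study's scale).
acc_before = (11.0, 16.0)
acc_after = (49.0, 54.0)
fluency_before, fluency_after = 4.36, 3.37

# Bounds on the accuracy gain in percentage points, without assuming a pairing.
gain_min_pp = acc_after[0] - acc_before[1]
gain_max_pp = acc_after[1] - acc_before[0]

# Absolute and relative fluency loss.
fluency_drop = round(fluency_before - fluency_after, 2)
fluency_drop_rel = round(100 * fluency_drop / fluency_before, 1)

print(f"accuracy gain: {gain_min_pp:.0f} to {gain_max_pp:.0f} pp")
print(f"fluency drop: {fluency_drop} points ({fluency_drop_rel}% relative)")
```

So the gain in gender recoverability is somewhere between 33 and 43 percentage points, bought at a cost of about 0.99 fluency points, or roughly 23% of the original fluency score.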
arXiv cs.CL

Escape the Language Prior: Mitigating Late-Stage Modality Collapse in Audio Reasoning via Modality-Aware Policy Optimization

Modality-Aware Policy Optimization (MAPO) addresses late-stage modality collapse in audio-text models during RL fine-tuning. The method concentrates policy gradients on modality-critical tokens via a modality relevance mask and adds an attention penalty to sustain cross-modal grounding. MAPO achieves state-of-the-art results on several complex audio reasoning benchmarks.

Reinforcement learning · Reasoning · Alignment
SIG 78 · HYP 25
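
The MAPO summary mentions two ingredients: a modality relevance mask that concentrates the policy gradient on modality-critical tokens, and an attention penalty that sustains cross-modal grounding. The following is a minimal sketch of how such a loss could be assembled, not the paper's formulation. It assumes a REINFORCE-style per-token term, a binary mask, and a hinge penalty that fires when a token's attention mass on audio falls below a floor. The function name, `penalty_weight`, and `min_audio_attn` are hypothetical.

```python
def masked_policy_loss(logprobs, advantages, relevance_mask, audio_attention,
                       penalty_weight=0.1, min_audio_attn=0.2):
    """Toy loss: masked policy-gradient term plus an attention-floor penalty.

    logprobs        -- log-probability of each sampled token
    advantages      -- advantage estimate for each token
    relevance_mask  -- 1 for modality-critical tokens, 0 otherwise (assumed binary)
    audio_attention -- attention mass each token places on audio inputs
    """
    # Only masked-in tokens contribute to the policy-gradient term.
    terms = [-lp * adv * m
             for lp, adv, m in zip(logprobs, advantages, relevance_mask)]
    n_critical = sum(relevance_mask) or 1  # avoid division by zero
    pg_loss = sum(terms) / n_critical

    # Hinge penalty: pay only for the shortfall below the attention floor.
    shortfalls = [max(0.0, min_audio_attn - a) for a in audio_attention]
    penalty = sum(shortfalls) / max(len(shortfalls), 1)

    return pg_loss + penalty_weight * penalty


# The second token is masked out, so it does not affect the gradient term.
grounded = masked_policy_loss([-1.0, -2.0], [1.0, 1.0], [1, 0], [0.2, 0.2])
drifting = masked_policy_loss([-1.0, -2.0], [1.0, 1.0], [1, 0], [0.0, 0.2])
```

In this toy setup the two calls differ only in how much attention the first token gives to audio, and the call with less audio attention receives the larger loss.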
arXiv cs.AI

PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft

PEAM is an embodied agent memory framework for Minecraft that internalizes experience as parameters rather than relying on inference-time retrieval. It pairs a slow LLM for reasoning with a fast parametric module (a Mixture-of-Experts LoRA) that learns via behavioral cloning and contrastive objectives. Failures are treated as training signals for learning corrected actions.

AI Agents · Reinforcement learning · Fine-tuning
SIG 78 · HYP 25
arXiv cs.CL

TRACES: Proactive Safety Auditing for Multi-Turn LLM Agents via Trajectory-State Modeling

TRACES is a proactive safety auditor for multi-turn LLM agents that detects drift toward unsafe behavior from hidden representations of an observer LLM. Trained with weak trajectory-level supervision, it produces dense prefix-level risk estimates, improving full-trajectory safety prediction and proactive risk discrimination across multiple agent safety benchmarks.

AI Agents · AI safety · Reasoning
SIG 78 · HYP 22
arXiv cs.AI

A Policy-Driven Runtime Layer for Agentic LLM Serving

The paper proposes an intermediate runtime layer between the agent framework and the LLM serving engine. It introduces four primitives (observe, score, predict, act) for implementing agent-aware policies such as KV caching, batch shaping, speculation, fairness, and safety. CacheSage, an instantiation for cross-session caching, achieves a cache hit-rate lift of +13 to +37 pp, 12–29% lower TTFT, and 6–14% higher throughput on five real multi-agent workloads.

AI Agents · Multi-agent · Infrastructure
SIG 78 · HYP 25
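
The runtime-layer summary names four primitives: observe, score, predict, and act. One way to picture how a caching policy might be expressed through them is sketched below. This is not CacheSage and does not reflect the paper's interfaces: the class name, the method signatures, and the frequency-based scoring rule are all assumptions chosen to keep the example small.

```python
from collections import Counter


class PrefixCachePolicy:
    """Hypothetical policy: keep the most frequently requested prompt prefixes cached."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.requests = Counter()

    def observe(self, prefix):
        """Record that an agent request used this prompt prefix."""
        self.requests[prefix] += 1

    def score(self, prefix):
        """Score a prefix by how often it has been requested so far."""
        return self.requests[prefix]

    def predict(self):
        """Predict which prefixes are worth caching: the top `capacity` by score."""
        return [p for p, _ in self.requests.most_common(self.capacity)]

    def act(self, cached):
        """Return which prefixes to admit to and evict from the current cache."""
        target = set(self.predict())
        return {"admit": target - set(cached), "evict": set(cached) - target}


policy = PrefixCachePolicy(capacity=1)
for prefix in ["planner-system-prompt", "planner-system-prompt", "critic-system-prompt"]:
    policy.observe(prefix)
decision = policy.act(cached={"critic-system-prompt"})
```

Here the policy sees the planner prefix twice and the critic prefix once, so with room for a single entry it recommends admitting the former and evicting the latter. A real policy would sit between the agent framework and the serving engine and would act on KV-cache blocks rather than strings.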