arXiv cs.AI·

HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

HINT-SD proposes targeted self-distillation for training long-horizon LLM agents. The method uses full-trajectory hindsight to identify failure-relevant actions and applies feedback-conditioned distillation only on targeted action spans. On BFCL v3 and AppWorld, it improves over dense per-turn feedback baselines by up to 18.80% while achieving 2.26× lower time per training step.

AI Agents · Reinforcement learning · Reasoning
SIG 75 · HYP 15
arXiv cs.AI·

Beyond Accuracy: Robustness, Interpretability and Expressiveness of EEG Foundation Models

A comparative study of 6 EEG foundation models across 8 datasets that looks beyond clean accuracy. It analyzes robustness (noise, channel dropout), interpretability via Attention-Aware Layer-Wise Relevance Propagation, and expressiveness through block-wise probing. Findings: no single model dominates across all failure modes; models focus on task-appropriate brain regions but decode corrupted content poorly.

Benchmarks · Evals · AI safety
SIG 75 · HYP 15
arXiv cs.AI·

OCCAM: Open-set Causal Concept explAnation and Ontology induction for black-box vision Models

OCCAM is a framework for explaining black-box image classifier decisions through causal visual concepts. It discovers concepts in an open-set manner, localizes them via text-guided segmentation, and measures their causal contribution through object-level interventions. OCCAM aggregates the interventional evidence to induce a structured ontology that reveals concept dependencies and systematic model biases.

Vision · Evals · Reasoning
SIG 75 · HYP 15
arXiv cs.AI·

QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi

QSTRBench is a benchmark evaluating LLMs' ability to perform qualitative spatial and temporal reasoning (QSTR). It covers 9 calculi (Point Algebra, Allen's Interval Algebra, RCC-5/8/22, etc.) with composition tables, converse relations, and conceptual neighbourhoods. The tested models outperform guessing, but none answers all questions correctly. RCC-22 proves the most difficult.
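
To make "composition tables" and "converse relations" concrete, here is a minimal sketch for Point Algebra, the simplest of the calculi listed. The table is the standard one for the three base relations between two points; the function names `compose` and `converse` are illustrative and are not taken from the benchmark.

```python
# Point Algebra: the three base relations between two points on a line.
BASE = ("<", "=", ">")

# Converse: if "a r b" holds, then "b CONVERSE[r] a" holds.
CONVERSE = {"<": ">", "=": "=", ">": "<"}

# Composition table: given "a r1 b" and "b r2 c",
# the set of base relations that may hold between a and c.
COMPOSE = {
    ("<", "<"): {"<"},
    ("<", "="): {"<"},
    ("<", ">"): {"<", "=", ">"},  # nothing can be concluded
    ("=", "<"): {"<"},
    ("=", "="): {"="},
    ("=", ">"): {">"},
    (">", "<"): {"<", "=", ">"},  # nothing can be concluded
    (">", "="): {">"},
    (">", ">"): {">"},
}


def compose(r1, r2):
    """Compose two (possibly disjunctive) relations given as sets of base relations."""
    result = set()
    for a in r1:
        for b in r2:
            result |= COMPOSE[(a, b)]
    return result


def converse(r):
    """Converse of a (possibly disjunctive) relation."""
    return {CONVERSE[a] for a in r}


if __name__ == "__main__":
    # a < b and b <= c  implies  a < c
    print(compose({"<"}, {"<", "="}))
    # a < b and b > c leaves the relation between a and c open
    print(compose({"<"}, {">"}))
    # the converse of "<=" is ">="
    print(converse({"<", "="}))
```

A composition question of the kind the summary describes amounts to asking for `compose(r1, r2)`; the larger calculi such as RCC-22 follow the same pattern with far bigger tables.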

Benchmarks · Reasoning · Evals
SIG 75 · HYP 15
arXiv cs.AI·

Scheduling That Speaks: An Interpretable Programmatic Reinforcement Learning Framework

ProRL is a programmatic reinforcement learning framework for combinatorial optimization (job shop scheduling). It generates interpretable policies as human-readable programs via a domain-specific language (DSL-S), exploring the program space through local search and Bayesian optimization. It outperforms classical heuristics and DRL baselines with minimal training episodes.

Reinforcement learning · Reasoning · Benchmarks
SIG 75 · HYP 15
arXiv cs.AI·

EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle

EvolveR is a framework enabling LLM agents to learn from their own experiences through a closed-loop lifecycle. It combines offline self-distillation (synthesizing interaction trajectories into reusable strategic principles) and online interaction (actively retrieving distilled principles to guide decisions). Tested on complex multi-hop QA benchmarks, it outperforms existing agentic baselines.

AI Agents · Reinforcement learning · Reasoning
SIG 75 · HYP 25
arXiv cs.AI·

FUNCanon: Learning Pose-Aware Action Primitives via Functional Object Canonicalization for Generalizable Robotic Manipulation

FUNCanon breaks down long-horizon manipulation tasks into action sequences (actor-verb-object) and canonicalizes objects by functional affordances using VLM cues. FuncDiffuser, an object-centric and action-centric diffusion policy, learns on aligned data to generalize across object categories and enable cross-task behavior reuse.

Robotics · Vision · AI Agents
SIG 75 · HYP 25
arXiv cs.AI·

Barriers for Learning in an Evolving World: Mathematical Understanding of Loss of Plasticity

A theoretical investigation of loss of plasticity (LoP) in deep learning under non-stationary environments. The authors identify two primary mechanisms, activation saturation and representational redundancy, that create traps in parameter space. The paradox: properties that promote generalization in static settings (low-rank representations) worsen LoP in continual learning.

Reinforcement learning · Papers · Alignment
SIG 75 · HYP 15
arXiv cs.AI·

OPERA: A Reinforcement Learning--Enhanced Orchestrated Planner-Executor Architecture for Reasoning-Oriented Multi-Hop Retrieval

OPERA is a retrieval-augmented generation (RAG) architecture that couples planning and execution via reinforcement learning. A Goal Planning Module decomposes complex questions into sub-goals, which are executed by a Reason-Execute Module with specialized components for reasoning and retrieval. Training uses MAPGRPO, a GRPO variant. It reports superior results on complex multi-hop benchmarks.

RAG · Reinforcement learning · Reasoning
SIG 75 · HYP 25
arXiv cs.AI·

HTSC-2025: A Benchmark Dataset of Ambient-Pressure High-Temperature Superconductors for AI-Driven Critical Temperature Prediction

HTSC-2025 is an open-source benchmark of high-temperature superconducting materials discovered in 2023-2025 (X₂YH₆ systems, MXH₃ perovskites, M₃XH₈, BCN-doped cage structures, 2D honeycomb). It addresses the lack of standardized datasets for fair comparison of AI algorithms that predict critical transition temperatures.

Benchmarks · Papers · Open source
SIG 75 · HYP 25
arXiv cs.AI·

SAPO: Step-Aligned Policy Optimization for Reasoning-Based Generative Recommendation

SAPO improves generative recommendation by aligning reinforcement learning optimization to individual reasoning steps. Instead of assigning a single advantage to the entire response, SAPO computes separate group-relative advantages for each reasoning step and SID token, stabilizing training and outperforming baselines across three real-world datasets.
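
The difference between one advantage per response and one advantage per reasoning step can be sketched as follows. This is a minimal illustration, not the paper's formulation: it assumes a per-step reward is already available for every sampled response, that all responses in a group have the same number of steps, and that normalization is the usual group-relative one (subtract the group mean, divide by the group standard deviation). The function names are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def step_aligned_advantages(step_rewards, eps=1e-8):
    """Compute a separate group-relative advantage for every step.

    step_rewards[i][t] is the (assumed given) reward of response i at step t.
    Returns a list with the same shape, where each column t has been
    normalized across the group independently of the other steps.
    """
    n_responses = len(step_rewards)
    n_steps = len(step_rewards[0])
    per_step = [
        group_relative_advantages([step_rewards[i][t] for i in range(n_responses)], eps)
        for t in range(n_steps)
    ]
    # Transpose back to response-major layout.
    return [[per_step[t][i] for t in range(n_steps)] for i in range(n_responses)]


if __name__ == "__main__":
    # Three sampled responses, two steps each (toy rewards).
    group = [
        [1.0, 0.0],
        [0.0, 0.0],
        [1.0, 1.0],
    ]
    for row in step_aligned_advantages(group):
        print([round(a, 3) for a in row])
```

In this toy group, the first response gets a positive advantage at step 0 and a negative one at step 1, whereas a single response-level advantage would assign the same value to both steps.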

Reinforcement learning · Reasoning · Code generation
SIG 75 · HYP 15
arXiv cs.CL·

Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models

Guided Topology Diffusion (GTD) uses graph diffusion models to dynamically generate optimal communication topologies for multi-agent LLM systems. The iterative framework, guided by a proxy model predicting multi-objective rewards (accuracy, utility, cost), adapts topologies to tasks without gradient-based optimization, outperforming static approaches.

Multi-agent · AI Agents · Benchmarks
SIG 75 · HYP 25