Hierarchical Two-Stage Framework for Environment-Aware Long-Horizon Vessel Trajectory Prediction

Hierarchical two-stage framework for long-horizon vessel trajectory prediction under real ocean conditions. Combines long-term predictor with short-term Spatio-Temporal Graph Transformer on discretized maritime cells. Environmental module integrates currents, wind, wave height via cross-modal attention. Results: 25% improvement in ADE, 17% in FDE on Australian CTS data.

Reasoning Benchmarks Papers

arXiv cs.AI·May 19

Efficient Lookahead Encoding and Abstracted Width for Learning General Policies in Classical Planning

New approach for learning generalized policies in classical planning using Relational Graph Neural Networks (R-GNNs). Authors introduce efficient lookahead search encoding and relational abstraction to improve scalability on IPC 2023 benchmark. Results outperform classical planner LAMA.

Reasoning Benchmarks Papers

arXiv cs.AI·May 19

Attention-Guided Fusion of 1D and 2D CNNs for Robust ECG-Based Biometric Recognition

Hybrid framework combining 1D and 2D CNNs with attention-guided fusion for ECG-based biometric recognition. Evaluation on ECG-ID, MIT-BIH, PTB: 99.56%, 100%, 99.89% accuracy. Multi-session tests (Heartprint, 10 years): 98.54%-99.09% same-session, 53-56% cross-session.

Vision Benchmarks Evals

arXiv cs.AI·May 19

Visualizing the Invisible: Generative Visual Grounding Empowers Universal EEG Understanding in MLLMs

GVG (Generative Visual Grounding) uses an EEG-to-image generative model to translate brain activity into visual images, bypassing text-only alignment. Tested on GVG-X-Omni (170M tuned params) and GVG-Janus (trimodal), the framework improves EEG understanding and visual generation by leveraging MLLMs' visual priors.

Vision Multi-agent Embeddings

arXiv cs.AI·May 19

LAST-RAG: Literature-Anchored Stochastic Trajectory Retrieval-Augmented Generation for Knowledge-Conditioned Degradation Model Selection

LAST-RAG proposes a method for selecting stochastic degradation models to estimate remaining useful life (RUL). It combines observed trajectories and domain context via retrieval from a local evidence bank, with an RCRUS mechanism to prevent premature model elimination. Experiments show it outperforms statistical and prognostic baselines.

RAG Reasoning Benchmarks

arXiv cs.AI·May 19

Modelling Customer Trajectories with Reinforcement Learning for Practical Retail Insights

Reinforcement learning framework for predicting customer trajectories in retail spaces. RL-based approach outperforms TSP/PNN heuristics (average 28% deviation from shortest paths) by modeling bounded rationality. Validated on real convenience store data: RL predictions align better with observed behavior and yield more accurate impulse-purchase rates and shelf-traffic estimates, enabling practical layout optimization.

Reinforcement learning AI Agents Business

arXiv cs.AI·May 19

Building Reliable Arithmetic Multipliers Under NBTI Aging and Process Variations

Paper on mitigating NBTI aging in arithmetic multipliers used in AI. The technique exploits sign-invariance of multiplication to redistribute transistor stress via 2's complement transformations. Integrated into systolic arrays, it improves lifetime with negligible area and delay overhead.

Papers Benchmarks AI safety

arXiv cs.AI·May 19

Validate Your Authority: Benchmarking LLMs on Multi-Label Precedent Treatment Classification

Benchmark of LLMs on legal precedent treatment classification. Expert-annotated dataset of 239 real-world legal citations. Gemini 2.5 Flash achieves 79.1% on high-level classification, GPT-5-mini 67.7% on fine-grained schema. Novel Average Severity Error metric to measure practical impact of misclassifications.

Benchmarks Gemini GPT

arXiv cs.AI·May 19

Fine-tuning Pocket-Aware Diffusion Models via Denoising Policy Optimization

DEPPA optimizes pocket-aware diffusion models for drug design via reinforcement learning. The method fine-tunes a pre-trained model by formulating the denoising process as a Markov Decision Process, optimizing binding affinity, drug-likeness, synthesizability and diversity. On CrossDocked2020 benchmark, DEPPA achieves Vina Score -8.5 kcal/mol.

Reinforcement learning Papers Benchmarks

arXiv cs.AI·May 19

LLM-Guided Communication for Cooperative Multi-Agent Reinforcement Learning

LMAC leverages LLM reasoning to design communication protocols in MARL, enabling agents to reconstruct the underlying state uniformly and accurately. The approach iteratively refines protocols using an explicit state-awareness criterion. Experiments on MARL benchmarks demonstrate substantial performance gains over prior baselines.

Multi-agent Reinforcement learning Reasoning

arXiv cs.CL·May 19

HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

HINT-SD proposes targeted self-distillation for training long-horizon LLM agents. The method uses full-trajectory hindsight to identify failure-relevant actions and applies feedback-conditioned distillation only on targeted action spans. On BFCL v3 and AppWorld, it improves over dense per-turn feedback baselines by up to 18.80% while achieving 2.26× lower time per training step.

AI Agents Reinforcement learning Reasoning

arXiv cs.AI·May 19

SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

SkillsVote is a lifecycle-governance framework for LLM agent skills from collection to evolution. It profiles a million-scale open-source corpus for quality and verifiability, then decomposes trajectories into skill-linked subtasks with outcome attribution. Results: +7.9pp on Terminal-Bench 2.0 (GPT-5.2) and +2.6pp on SWE-Bench Pro.

AI Agents Benchmarks Code generation

arXiv cs.AI·May 19

Diffusion Attention Expert Model for Predicting and Semi-automatic Localizing STAS in Lung Cancer Histopathological Images

DAEM (Diffusion Attention Expert Model) detects STAS (spread through air spaces) in lung cancer histopathological images. Model achieves AUC 0.8946 on frozen sections and 0.9112 on paraffin sections. Validated across 8 external institutions. Also provides semi-automatic localization and identifies TME biomarkers.

Vision Benchmarks Papers

arXiv cs.AI·May 19

Wasserstein Equilibrium Decoding for Reliable Medical Visual Question Answering

Game-theoretic decoding method for small vision-language models (2-8B) in medical imaging. Wasserstein semantic stopping criterion replaces lexical matching, improving Qwen3-VL-2B by +3.5pp on VQA-RAD and reducing convergence iterations by 20% while maintaining reliability.

Vision Reasoning Evals

arXiv cs.AI·May 19

Systematic Evaluation of the Quality of Synthetic Clinical Notes Rephrased by LLMs at Million-Note Scale

Systematic evaluation of synthetic clinical notes generated by LLMs at million-note scale. Study shows synthetic notes preserve core clinical information for coarse-grained tasks but lose fine-grained details for ICD coding. Chunk-based rephrasing mitigates detail loss but reduces factual precision under incomplete context.

Benchmarks Evals AI safety

arXiv cs.CL·May 19

CodeBind: Decoupled Representation Learning for Multimodal Alignment with Unified Compositional Codebook

CodeBind introduces a multimodal alignment framework using shared-specific compositional codebooks. The method decomposes representations into semantic shared components and modality-unique components, validated across 9 modalities (text, image, video, audio, depth, thermal, tactile, 3D point cloud, EEG) achieving state-of-the-art performance in classification and retrieval tasks.

Embeddings Vision Robotics

arXiv cs.AI·May 19

The Hidden Cost of Contextual Sycophancy: an AI Literacy Intervention in Human-AI Collaboration

Study on contextual sycophancy in LLMs: 60 participants collaborated with AI on analytical tasks. Results show models mirror user errors rather than correct them. An AI literacy intervention reduced incorrect mirroring but did not eliminate error propagation, suggesting system-level approaches are needed.

Alignment AI safety Evals

arXiv cs.AI·May 19

Curriculum Group Policy Optimization: Adaptive Sampling for Unleashing the Potential of Text-to-Image Generation

CGPO (Curriculum Group Policy Optimization) improves text-to-image model training via adaptive curriculum based on reward variance. Method prioritizes partially-mastered prompts (high variance) and balances categories through proportional fairness optimization. Gains validated on GenEval, T2I-CompBench++, DPG Bench.

Image generation Reinforcement learning Benchmarks

arXiv cs.AI·May 19

Divergence-Suppressing Couplings for Rectified Flow

Authors identify that trajectory entanglement in Rectified Flow stems from nonzero divergence regions in the learned velocity field. They propose an offline correction that attenuates the divergent component during coupling generation, with no deployment overhead. Improvements validated on 2D benchmarks and image generation.

Image generation Papers Benchmarks

arXiv cs.AI·May 19

One Model, Two Roles: Emergent Specialization in a Shared Recurrent Transformer

Study of a shared-weight recurrent Transformer architecture (AIR) that develops two distinct roles without modular partitioning. On Sudoku-Extreme and Maze, state zH acts as committed proposal while zL retains local uncertainty. Freezing experiments and ablations show that input injection asymmetry induces this functional specialization.

Reasoning Papers

arXiv cs.AI·May 19

ISEP: Implicit Support Expansion for Offline Reinforcement Learning via Stochastic Policy Optimization

ISEP proposes an offline reinforcement learning method that implicitly expands action support by interpolating between in-distribution data and policy samples. A stochastic mechanism alternates between conservative cloning and optimistic expansion signals, implemented via Conditional Flow Matching with classifier-free guidance.

Reinforcement learning Papers

arXiv cs.CL·May 19

ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop

ESI-Bench is a benchmark for embodied spatial intelligence spanning 10 task categories on OmniGibson. Experiments show active exploration outperforms passive approaches, but models fail primarily from "action blindness": poor action choices lead to poor observations and cascading errors. Models lack metacognition compared to humans.

Benchmarks Vision Reasoning

arXiv cs.CL·May 19

AdaSwitch: Adaptive Switching between Small and Large Agents for Effective Cloud-Local Collaborative Learning

AdaSwitch proposes a cloud-local collaborative paradigm where a local agent (small LLM) handles simple tasks and requests assistance from a cloud agent (large LLM) for complex reasoning. The adaptive mechanism detects local errors and dynamically switches. Evaluation on 7 benchmarks (mathematical reasoning, complex QA) shows performance improvement with reduced computational overhead.

AI Agents Multi-agent Reasoning

arXiv cs.AI·May 19

CommitDistill: A Lightweight Knowledge-Centric Memory Layer for Software Repositories

CommitDistill is an open-source Python prototype extracting typed knowledge units (Facts, Skills, Patterns) from local git history via deterministic regex and exposing them through a TF-IDF retriever. Tested on 5 repositories (25k commits), it achieves 0.750 hit-rate at 256-character budget versus 0.333 for BM25. No statistically detectable improvement on time-travel bug-fixes in LLM-as-judge evaluation.

Code generation RAG AI Agents

arXiv cs.CL·May 19

EmoMind: Decoding Affective Captions from Human Brain fMRI

EmoMind decodes affective captions directly from fMRI signals in two stages: retrieving semantically grounded neutral descriptions, then rewriting using a continuous 34-dimensional emotion vector. Uses classifier-free guidance to balance semantic fidelity and affective expressivity. Outperforms GPT-4 on two independent fMRI emotion datasets.

Vision Reasoning Benchmarks

arXiv cs.CL·May 19

RAG-based EEG-to-Text Translation Using Deep Learning and LLMs

RAG-based pipeline for sentence-level EEG-to-text decoding. Combines EEG encoder aligned with semantic embeddings, vector retrieval, and LLM refinement. On ZuCo dataset, achieves 30.45% improvement over random baseline (cosine similarity 0.181 vs 0.139).

RAG Embeddings Vector search

arXiv cs.CL·May 19

Responsible Federated LLMs via Safety Filtering and Constitutional AI

Research integrating safety filtering and Constitutional AI into federated LLM training (FedLLM). Authors demonstrate these techniques improve safety by over 20% on AdvBench, mitigating risks of unsafe model aggregation and redistribution across clients.

AI safety Alignment Reinforcement learning

arXiv cs.AI·May 19

CodeBind: Decoupled Representation Learning for Multimodal Alignment with Unified Compositional Codebook

CodeBind introduces a multimodal alignment framework using a shared-specific compositional codebook design. Tested across 9 modalities (text, image, video, audio, depth, thermal, tactile, 3D point cloud, EEG), it achieves state-of-the-art performance in multimodal classification and retrieval without requiring fully paired data.

Embeddings Vision RAG

arXiv cs.AI·May 19

Machine Unlearning for Masked Diffusion Language Models

First machine unlearning framework for masked diffusion language models (MDLMs such as LLaDA and Dream). MDU minimizes KL divergence from prompt-conditional predictions to a prompt-masked unconditional anchor, with temperature scaling to control the privacy-utility trade-off. Code released.

Papers AI safety Fine-tuning

arXiv cs.AI·May 19

Privacy Preserving Reinforcement Learning with One-Sided Feedback

POOL, a novel privacy-preserving RL algorithm, addresses reinforcement learning in multi-dimensional continuous state-action spaces with one-sided feedback. Theoretical analysis derives sample complexity matching known lower bounds for non-private RL, demonstrating strong privacy guarantees are achievable without sacrificing learning efficiency.

Reinforcement learning AI safety Papers

arXiv cs.AI·May 19

PESD-TSF: A Period-Aware and Explicit Structured Decomposition Framework for Long-Term Time Series Forecasting

PESD-TSF is a physics-inspired structured decomposition framework for long-term time series forecasting. It introduces a Multiplicative Periodic Gating mechanism, a multi-scale encoder with detrended attention, and Cross-Scale Collaborative Attention (CSCA) to preserve periodic structures and inter-variable dependencies across deep layers.

Benchmarks Papers

arXiv cs.AI·May 19

Multimodal Cultural Heritage Knowledge Graph Extension with Language and Vision Models

Novel approach to extend Knowledge Graphs for French cultural heritage. Authors introduce WJoconde, a multimodal KG integrating text and images, with three variants and a benchmark for Knowledge Graph Completion. They propose a framework combining LLMs and Vision-Language Models for automated data extraction and validation, improving KG reliability.

Vision RAG Benchmarks

arXiv cs.AI·May 19

Efficient Bilevel Optimization for Meta Label Correction in Noisy Label Learning

EBOMLC method for noisy label correction via efficient bilevel optimization. Uses a meta model trained on clean data to correct a large noisy dataset. Reduces the computational cost of hypergradients and improves stability on CIFAR-10/100 under high noise rates.

Papers Benchmarks Fine-tuning

arXiv cs.CL·May 19

Factual Inconsistencies in Multilingual Wikipedia Tables

Study of factual inconsistencies in multilingual Wikipedia tables. Researchers developed methodology to collect and analyze tables across 300+ language versions of Wikipedia, identifying inconsistency categories. Implications for fact verification and reliability of AI systems trained on Wikipedia.

Benchmarks Evals RAG

arXiv cs.AI·May 19

Context Memorization for Efficient Long Context Generation

Training-free approach for long-context LLM inference: attention-state memory externalizes prefix into lightweight lookup-based memory of precomputed attention states. On LLaMA-3.1-8B, reduces attention latency by 1.36x at 8K tokens and outperforms full-attention RAG with 20% memory footprint.

Llama Reasoning RAG

arXiv cs.AI·May 19

A Simplex Witness Certificate for Constant Collapse in Variational Autoencoders

Theoretical paper on constant collapse in variational autoencoders, where the encoder mean becomes input-independent. Authors introduce a simplex witness certificate to detect and prevent this failure mode during training, with exact baseline and closed-form inverse.

Papers Evals

arXiv cs.AI·May 19

NeuSymMS: A Hybrid Neuro-Symbolic Memory System for Persistent, Self-Curating LLM Agents

NeuSymMS is a hybrid neuro-symbolic memory system for LLM agents. It couples neural fact extraction from dialogue with a CLIPS-based expert system that classifies, deduplicates, and reconciles facts. Knowledge is stored as subject-relation-value triples in a relational database, with short/long-term memory and access-based promotion.

AI Agents RAG Reasoning

arXiv cs.AI·May 19

SPATIOROUTE: Dynamic Prompt Routing for Zero-Shot Spatial Reasoning

SpatioRoute is a dynamic prompt routing approach for spatial question-answering over egocentric video. Without fine-tuning, it routes each question to a specialized prompt template (rule-based or LLM-driven mode) and achieves +5% accuracy gains on SQA3D versus fixed baselines, establishing SOTA for zero-shot video-only spatial VQA without 3D point-cloud inputs.

Prompt engineering Vision Reasoning

arXiv cs.CL·May 19

Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models

Novel 1-bit LLM quantization method leveraging pre-trained models. Uses consistent progressive training (forward/backward) with binary-aware initialization and dual-scaling compensation to convert weights to binarized representation. Reduces training costs and accuracy degradation versus existing approaches.

Fine-tuning Benchmarks Infrastructure

arXiv cs.CL·May 19

Early Stopping Chain-of-thoughts in Large Language Models

ES-CoT detects answer convergence during chain-of-thought generation to stop inference early. The method reduces inference tokens by 16.08% on average across six reasoning benchmarks while maintaining comparable accuracy to standard CoT.

Reasoning Prompt engineering Benchmarks
