Vision Transformer-Conditioned UNet for Domain-Adaptive Semantic Segmentation

ViTC-UNet conditions a UNet on frozen, pre-trained Vision Transformer representations via learnable tokens and a two-way attention decoder. Without end-to-end fine-tuning, the approach improves biomedical semantic segmentation on MRI and CT by combining the ViT's global priors with the UNet's local inductive bias and high-resolution decoding.

Vision · Papers · Benchmarks

arXiv cs.AI·May 19

Action-Gradient Monte Carlo Tree Search for Non-Parametric Continuous (PO)MDPs

Action-Gradient MCTS (AGMCTS) combines global tree search with local, gradient-based action refinement for online planning in continuous spaces. It makes three theoretical contributions: an action-score gradient theorem, a Multiple Importance Sampling Tree for sample reuse, and tractable gradients via the Area Formula. AGMCTS outperforms state-of-the-art sample-based solvers on continuous MDP and POMDP benchmarks.

Reasoning · Reinforcement learning · Papers

arXiv cs.AI·May 19

A Machine with Short-Term, Episodic, and Semantic Memory Systems

An AI agent model with three human-inspired memory systems (short-term, episodic, and semantic), each modeled as a knowledge graph. In a custom RL environment, "the Room", a deep Q-learning agent learns to encode, store, and retrieve memories in order to answer questions, and it outperforms an agent without this memory structure.

Reinforcement learning · Reasoning · AI Agents

arXiv cs.AI·May 19

Attention Sinks and Outliers in Attention Residuals

OASIS, an inter-layer null-signaling technique, reduces attention sinks and activation outliers in AttnResidual architectures. Across three datasets, OASIS decreases the maximum infinity norm by 9.26% and the average kurtosis by 2.60%, and it improves post-quantization performance (W8A8: -75.85% perplexity; W4A4: +12.42% on GSM8K).

Reasoning · Papers · Benchmarks
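
The two outlier statistics quoted for OASIS can be computed as follows. This is a minimal sketch that assumes the infinity norm is taken per activation vector (its largest absolute entry) and that kurtosis means the plain fourth standardized moment; the paper may define or aggregate both differently, and the helper names are illustrative.

```python
def infinity_norm(vec):
    """Largest absolute entry of one activation vector."""
    return max(abs(x) for x in vec)


def kurtosis(vec):
    """Fourth standardized moment (Pearson, not excess); assumes vec is not constant.

    Gaussian data gives about 3; heavier-tailed activations give larger values.
    """
    n = len(vec)
    mean = sum(vec) / n
    var = sum((x - mean) ** 2 for x in vec) / n
    fourth = sum((x - mean) ** 4 for x in vec) / n
    return fourth / var ** 2


def outlier_stats(activations):
    """Return (maximum infinity norm, average kurtosis) over a list of vectors."""
    max_inf = max(infinity_norm(v) for v in activations)
    avg_kurt = sum(kurtosis(v) for v in activations) / len(activations)
    return max_inf, avg_kurt


def relative_reduction(before, after):
    """Percentage decrease from `before` to `after`."""
    return (before - after) / before * 100.0
```

For example, `relative_reduction(200.0, 150.0)` returns `25.0`, the same kind of relative figure as the 9.26% and 2.60% reductions reported for OASIS.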

arXiv cs.AI·May 19

MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization

MARR is a low-bit (≤4-bit) post-training quantization method for LLMs and Vision Transformers. It uses module-specific scaling coefficients to balance accumulated-error correction against residual-induced bias, together with a PID-based adaptive update strategy, and achieves gains of up to 20.2% on LLMs and 4.6% on ViTs over prior residual-reconstruction methods.

Vision · Papers · Benchmarks
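
The summary does not specify how MARR's PID-based strategy is wired in. Purely to illustrate the control idea, the sketch below uses a textbook discrete PID controller to steer a hypothetical per-module scaling coefficient `alpha` toward a target error level; the error signal, gains, and clamping range are all assumptions, not the paper's formulation.

```python
from dataclasses import dataclass


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float) -> float:
        """Discrete PID output for the current error (time step of 1)."""
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def update_alpha(alpha, measured, target, pid, lo=0.0, hi=1.0):
    """Hypothetical update of one module's scaling coefficient.

    Assumes a larger alpha raises the measured quantity, so when
    measured > target the error is negative and alpha is lowered.
    The result is clamped to [lo, hi].
    """
    alpha += pid.step(target - measured)
    return min(hi, max(lo, alpha))
```

Each call to `update_alpha` stands for one hypothetical calibration step for one module: the proportional term reacts to the current gap, the integral term to persistent bias, and the derivative term damps oscillation.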

arXiv cs.AI·May 19

An Assessment of Human vs. Model Uncertainty in Soft-Label Learning and Calibration

A controlled study comparing human and synthetic soft labels on MNIST. Human soft labels improve calibration by regularizing predictions on difficult samples and aligning model uncertainty with human uncertainty, an effect that goes beyond simply correcting mislabeled examples.

Evals · Alignment · AI safety

arXiv cs.AI·May 19

Graph Hierarchical Recurrence for Long-Range Generalization

Graph Hierarchical Recurrence (GHR) is a framework for GNNs and Graph Transformers that captures long-range dependencies through hierarchical abstraction via pooling. GHR outperforms existing models on long-range benchmarks with only 1% of the parameters of state-of-the-art models, and it also improves out-of-distribution generalization.

Benchmarks · Papers

arXiv cs.AI·May 19

Unveiling Memorization-Generalization Coexistence: A Case Study on Arithmetic Tasks with Label Noise

A study of how memorization and generalization coexist in over-parameterized neural networks. Trained on arithmetic tasks with 80% label noise, models memorize the noisy labels while retaining an internal generalizing structure, and frequency-based extraction achieves near-perfect accuracy. The authors also propose a task-agnostic partitioning into generalization and memorization components.

Papers · Evals · Alignment

arXiv cs.AI·May 19

FedSDR: Federated Self-Distillation with Rectification

FedSDR addresses federated fine-tuning of LLMs under statistical heterogeneity. It combines self-distillation (FedSD) with a dual-stream mechanism: a local LoRA-S branch that absorbs heterogeneity via distilled data, and a parallel global LoRA-R branch anchored to raw data to preserve factual correctness.

Fine-tuning · Reinforcement learning · Alignment

arXiv cs.AI·May 19

PromptDecipher: Supporting AI Tutor Authoring Through Editable Simulated Interactions

PromptDecipher is an authoring system for AI tutoring chatbots that restructures the authoring workflow around direct corrections rather than abstract system prompts. Teachers interact with a live chat preview and edit undesirable bot responses; an automated pipeline then proposes targeted prompt rewrites and validates them across predefined test scenarios.

Prompt engineering · AI Agents · Tools

arXiv cs.CL·May 19

RAGA: Reading-And-Graph-building-Agent for Autonomous Knowledge Graph Construction and Retrieval-Augmented Generation

RAGA is an LLM-based autonomous agent for knowledge graph construction and retrieval-augmented generation. It replaces stateless batch pipelines with a ReAct loop supporting full CRUD operations, hybrid KG-vector synchronization, and evidence-anchored verification linked to source text. Experiments on QASPER show measurable gains in answer and evidence quality.

AI Agents · RAG · Reasoning

arXiv cs.AI·May 19

Isotonic Survival Regression: Calibrated Survival Distributions from Deep Cox Models

A post hoc calibration method for deep Cox models based on isotonic regression. It improves the calibration of predicted survival probabilities without affecting discriminative power, comes with theoretical guarantees including double robustness and asymptotic calibration, and is validated on synthetic and real-world clinical data.

Papers · Evals · AI safety
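
Isotonic regression fits the closest non-decreasing sequence (in weighted least squares) to values ordered by a predictor. A minimal pool-adjacent-violators (PAVA) sketch is below; how the paper builds its targets from censored survival data is not described in this summary, so `y` here is only a placeholder sequence already sorted by predicted probability.

```python
def isotonic_fit(y, w=None):
    """Pool Adjacent Violators: weighted least-squares non-decreasing fit to y.

    y: values ordered by the predictor (e.g. sorted predicted probabilities).
    w: optional positive weights (default 1.0 each).
    Returns one fitted value per input.
    """
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted
```

For instance, `isotonic_fit([1, 3, 2, 4])` pools the violating pair into its mean and returns `[1, 2.5, 2.5, 4]`. Every merge removes a block, so the fit takes linear time once the inputs are sorted.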

arXiv cs.AI·May 19

Hypergraph Pattern Machine: Compositional Tokenization for Higher-Order Interactions

HGPM (Hypergraph Pattern Machine) models higher-order interactions by tokenizing compositional subsets and processing them with an inclusion-aware Transformer. On 10 hypergraph benchmarks, it matches or exceeds the state of the art, notably in adverse-event prediction, where it correctly identifies inhibitory drug combinations that existing methods miss.

Papers · Benchmarks · Reasoning

arXiv cs.AI·May 19

Exploring Trust Calibration in XAI - The Impact of Exposing Model Limitations to Lay Users

A preregistered study (N=418) of trust calibration in explainable AI for skin-lesion classification. Disclosing model limitations slightly improves the alignment between user trust and actual model performance, but direct experience with the model outweighs onboarding manipulations.

Evals · AI safety · Alignment

arXiv cs.AI·May 19

Confidence-Gated Robot Autonomy: When Does Uncertainty Actually Help?

A study of using predictive uncertainty to decide between acting autonomously and deferring in robotics. Across three temporal activity-recognition benchmarks, uncertainty reliably ranks errors only when the base model is sufficiently competent. Softmax confidence, MC Dropout, and ensembles produce similar gating behavior; the choice of threshold has a larger impact than the choice of uncertainty method.

Robotics · Evals
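
Softmax gating, one of the three uncertainty sources the study compares, amounts to thresholding the top-class probability. The sketch below is an illustrative assumption (the default `threshold` and the "act"/"defer" labels are hypothetical). `coverage_and_risk` shows the trade-off the study points to: moving the threshold changes both how often the robot acts and how often it is wrong when it does.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def gate(logits, threshold=0.8):
    """Act autonomously if the top-class probability clears the threshold,
    otherwise defer (e.g. to a human operator)."""
    probs = softmax(logits)
    conf = max(probs)
    label = probs.index(conf)
    return ("act", label, conf) if conf >= threshold else ("defer", label, conf)


def coverage_and_risk(confidences, correct, threshold):
    """Fraction of cases handled autonomously, and error rate among those cases."""
    accepted = [ok for c, ok in zip(confidences, correct) if c >= threshold]
    if not accepted:
        return 0.0, 0.0
    coverage = len(accepted) / len(confidences)
    risk = 1.0 - sum(accepted) / len(accepted)
    return coverage, risk
```

Sweeping `threshold` over a validation set and plotting risk against coverage is one simple way to pick the operating point, which, per the study, matters more than which uncertainty estimator produces the confidences.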

arXiv cs.CL·May 19

UCSF-PDGM-VQA: Visual Question Answering dataset for brain tumor MRI interpretation

UCSF-PDGM-VQA is a new clinical VQA benchmark of 2,387 QA pairs drawn from 473 glioma MRI studies. An evaluation of six VLMs and one LLM shows that current models fail on multi-sequence 3D MRI, suffering from modality collapse and over-reliance on language priors.

Vision · Benchmarks · Papers

arXiv cs.AI·May 19

Visual Agentic Memory: Enabling Online Long Video Understanding via Online Indexing, Hierarchical Memory, and Agentic Retrieval

Visual Agentic Memory (VAM) is a training-free framework for long video understanding that combines online selective indexing, hierarchical memory, and agentic retrieval. VAM scores 68.41 on OVO-Bench (vs. 67.46 for Gemini 3 Flash alone) and 17.11% on MM-Lifelong (105.6 hours recorded over 51 days).

Vision · AI Agents · Gemini

arXiv cs.AI·May 19

Improving Spatio-Temporal Residual Error Propagation by Mitigating Over-Squashing

Teger, a structured uncertainty module, improves spatio-temporal time-series forecasting by mitigating over-squashing through Forman-curvature-aware graph rewiring. Integrated into a low-rank-plus-diagonal covariance head, Teger is backbone-agnostic and yields consistent CRPS improvements across LSTM, Transformer, and xLSTM architectures.

Reasoning · Benchmarks · Papers
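
Teger's rewiring is guided by Forman curvature. For an unweighted graph, the simplest combinatorial Forman curvature of an edge (u, v) is 4 - deg(u) - deg(v), and strongly negative edges tend to be the bottlenecks associated with over-squashing. The sketch computes this quantity and adds one shortcut around the most negative edge; the rewiring rule in `rewire_once` is a hypothetical illustration, not Teger's mechanism, which may use an augmented (triangle-aware) or weighted curvature variant.

```python
from itertools import product


def forman_curvature(adj, u, v):
    """Combinatorial Forman curvature of edge (u, v) in an unweighted graph."""
    return 4 - len(adj[u]) - len(adj[v])


def edge_curvatures(adj):
    """Curvature of every undirected edge; adj maps node -> set of neighbours."""
    curv = {}
    for u in adj:
        for v in adj[u]:
            e = (min(u, v), max(u, v))
            if e not in curv:
                curv[e] = forman_curvature(adj, u, v)
    return curv


def rewire_once(adj):
    """Hypothetical step: add one edge bypassing the most negatively curved edge.

    Connects a neighbour of u to a neighbour of v if they are not yet adjacent.
    Mutates adj and returns the added edge, or None if none can be added.
    """
    curv = edge_curvatures(adj)
    (u, v), _ = min(curv.items(), key=lambda kv: kv[1])
    for a, b in product(sorted(adj[u] - {v}), sorted(adj[v] - {u})):
        if a != b and b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            return (a, b)
    return None
```

On two triangles joined by a single bridge edge, the bridge receives the most negative curvature (-2) and the sketch adds a shortcut between the two sides, giving messages a second path across the bottleneck.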

arXiv cs.AI·May 19

WhiteTesseract: Reframing the Interpretation of Cultural Heritage through XR and Conversational AI

WhiteTesseract combines high-resolution XR and conversational AI to enhance museum visits. In a test at a Monet exhibition with 26 participants, the system increased viewing duration from 35.3 to 98.3 seconds (p < 0.001), and 60% of the 529 interactions went beyond factual queries to include analytical, emotional, and comparative inquiries.

Vision · AI Agents · Papers

arXiv cs.AI·May 19

Optimising CSRNet with parameter-free attention mechanisms for crowd counting in public transport

CSRNet is optimized with parameter-free attention mechanisms for crowd counting in public transport. The PFCA, SA, and SimAM modules are evaluated on the ShanghaiTech dataset, and a novel combination, PFCASA (PFCA + SA), outperforms parameterized approaches while reducing model size, which suits edge-deployed systems.

Vision · Benchmarks · Infrastructure
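
SimAM, one of the parameter-free modules evaluated, reweights each activation by an energy term computed from its channel's own statistics, so it adds no learnable weights. Below is a hedged pure-Python sketch for a single channel flattened to a list, following the formulation published with SimAM; `lam` is its small regularizer (1e-4 is a commonly used default). PFCA and the PFCASA combination are not reproduced here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simam_channel(x, lam=1e-4):
    """Apply SimAM attention to one channel given as a flat list of activations.

    Each activation is scaled by sigmoid(1/E), where 1/E grows with its squared
    distance from the channel mean. Activations near the mean get a weight of
    about sigmoid(0.5) ~ 0.62; distinctive ones get weights closer to 1.
    """
    n = len(x) - 1                      # H * W - 1 in the original formulation
    mu = sum(x) / len(x)
    d = [(xi - mu) ** 2 for xi in x]
    var = sum(d) / n
    inv_energy = [di / (4 * (var + lam)) + 0.5 for di in d]
    return [xi * sigmoid(e) for xi, e in zip(x, inv_energy)]
```

Because the weights depend only on the feature map itself, a module like this can be dropped into CSRNet's backbone without changing the parameter count, which is the property the entry highlights for edge deployment.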

arXiv cs.AI·May 19

An Empirical Study of Privacy Leakage Chains via Prompt Injection in Black-Box Chatbot Environments

An empirical study of privacy-leakage chains via prompt injection in black-box chatbot environments. The researchers analyze how attackers can hijack LLM agent tasks by injecting malicious content into external sources. They introduce an 'exemplification' technique and demonstrate a working data-exfiltration chain that combines prompt injection, jailbreaking, and web-tool invocation.

AI Agents · Prompt engineering · AI safety

arXiv cs.AI·May 19

Breaking the accuracy-resource dilemma: a lightweight adaptive video inference enhancement

A video inference enhancement method that uses a fuzzy controller (FC-r) to adapt model size dynamically to the resources available on the device. It exploits spatiotemporal correlation across adjacent frames and balances inference performance against resource efficiency without scaling up model complexity.

Video generation · Reasoning · Infrastructure

arXiv cs.AI·May 19

Vision Inference Former: Sustaining Visual Consistency in Multimodal Large Language Models

Vision Inference Former (VIF) is a lightweight architectural module that improves visual consistency in multimodal models. It continuously injects visual semantics during generation to counter the weakening of vision-language alignment over long sequences. Across 14 benchmarks covering reasoning, OCR, and tables, VIF improves performance with minimal overhead.

Vision · Multi-agent · Alignment

arXiv cs.AI·May 19

Peak-Detector: Explainable Peak Detection via Instruction-Tuned Large Language Models in Physiological Sign

Peak-Detector uses instruction-tuned LLMs for explainable peak detection across physiological signals (ECG, PPG, BCG, BSG). A "peak-representation" technique compresses the time series while preserving critical events. The model is optimized with supervised fine-tuning followed by multi-objective reinforcement learning and is evaluated on seven datasets (six public benchmarks and one real-world cohort).

Reasoning · Fine-tuning · Reinforcement learning

arXiv cs.AI·May 19

PAREDA: A Multi-Accent Speech Dataset of Natural Language Processing Research Discussions

PAREDA is a multi-accent speech dataset (Australian, Indian, and Chinese English) of spontaneous discussions about NLP papers. State-of-the-art ASR models degrade on it in zero-shot settings, but fine-tuning on PAREDA significantly reduces word error rate (WER), which supports the corpus's value for building robust ASR systems.

Benchmarks · Voice · Papers

arXiv cs.AI·May 19

Fixed External Cameras as Common Prior Maps for Active 3D Scene Graph Generation

An RGB-only framework for active 3D scene graph (3DSG) generation that uses fixed external cameras as common prior maps. The system fuses onboard and external camera observations in a single hardware-agnostic pipeline and guides robots toward regions of high semantic uncertainty. A single external camera raises initial object recall by +79%.

Vision · Robotics · AI Agents

arXiv cs.AI·May 19

Balancing Knowledge Distillation for Imbalance Learning with Bilevel Optimization

BiKD introduces a bilevel framework to dynamically balance hard and soft losses in knowledge distillation on imbalanced data. A weight generation network produces adaptive per-sample weights guided by a small balanced validation set. Experiments on long-tailed CIFAR-10/100 show improvements over recent balanced distillation methods.

Fine-tuning · Benchmarks · Papers
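
BiKD's weight network is trained by bilevel optimization, which this summary does not detail. The inner objective it balances can be sketched as a per-sample mix of the hard (cross-entropy) loss and the soft (temperature-scaled KL) distillation loss. Here `weights` stands in for the output of the weight-generation network and is simply passed in, and the T² scaling follows the usual Hinton-style distillation convention; both are assumptions about BiKD rather than its published formulation.

```python
import math


def log_softmax(logits, T=1.0):
    scaled = [z / T for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def hard_loss(student_logits, label):
    """Cross-entropy against the ground-truth label."""
    return -log_softmax(student_logits)[label]


def soft_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened distributions, times T^2."""
    log_p_t = log_softmax(teacher_logits, T)
    log_p_s = log_softmax(student_logits, T)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    return T * T * kl


def weighted_kd_loss(batch, weights, T=4.0):
    """Batch mean of w * hard + (1 - w) * soft.

    batch: list of (student_logits, teacher_logits, label)
    weights: per-sample w in [0, 1], e.g. from a (hypothetical) weight network.
    """
    total = 0.0
    for (s, t, y), w in zip(batch, weights):
        total += w * hard_loss(s, y) + (1.0 - w) * soft_loss(s, t, T)
    return total / len(batch)
```

On imbalanced data, the idea is that tail-class samples can receive a different hard/soft balance than head-class samples; in BiKD that choice is learned against a small balanced validation set rather than fixed by hand.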

arXiv cs.CL·May 19

HyperPersona: A Multi-Level Hypergraph Framework for Text-Based Automatic Personality Prediction

HyperPersona is a multi-level hypergraph framework for text-based automatic personality prediction. It represents documents, sentences, and words as hyperedges and nodes, capturing global, local, and lexical dependencies. Evaluated on the Big Five personality dimensions, it outperforms existing baselines by exploiting the hierarchical structure of text.

Papers · Benchmarks

arXiv cs.AI·May 19

PESD-TSF: A Period-Aware and Explicit Structured Decomposition Framework for Long-Term Time Series Forecasting

PESD-TSF is a physics-inspired structured decomposition framework for long-term time series forecasting. It introduces a Multiplicative Periodic Gating mechanism, a multi-scale encoder with detrended attention, and Cross-Scale Collaborative Attention (CSCA) to preserve periodic structures and inter-variable dependencies across deep layers.

Benchmarks · Papers

arXiv cs.AI·May 19

PIPER: Content-Based Table Search via profiling and LLM-Generated Pseudoqueries

PIPER is a content-driven retrieval method for tabular datasets that uses table profiles and LLM-generated pseudoqueries for dense retrieval. In settings with poor metadata, it outperforms metadata-based baselines and existing TableQA methods.

RAG · Embeddings · Benchmarks

arXiv cs.AI·May 19

Concise and Logically Consistent Conformal Sets for Neuro-Symbolic Concept-Based Models

COCOCO is a post hoc framework that integrates conformal prediction into neuro-symbolic concept-based models (NeSy-CBMs) to improve their reliability. It conformalizes concepts and labels jointly through a single deduction-abduction revision step, ensuring consistency, coverage, and conciseness with distribution-free guarantees. It is evaluated on eight datasets.

Reasoning · AI safety · Alignment
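
COCOCO builds on conformal prediction. For orientation, the sketch below shows standard split conformal prediction for a plain classifier: calibrate a score threshold on held-out data, then include every label whose score clears it. It does not implement COCOCO's joint concept-label conformalization or its deduction-abduction revision step; it only illustrates the distribution-free coverage mechanism underneath.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal calibration with nonconformity score 1 - p(true label).

    Returns q_hat, the ceil((n + 1)(1 - alpha))-th smallest calibration score,
    which yields marginal coverage >= 1 - alpha under exchangeability.
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points: every label is kept
    return scores[k - 1]


def prediction_set(probs, q_hat):
    """All labels whose nonconformity score does not exceed q_hat."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= q_hat]
```

With `alpha=0.1`, the resulting sets contain the true label at least 90% of the time on average, provided calibration and test points are exchangeable. COCOCO additionally enforces logical consistency between concept and label sets, which this sketch does not attempt.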

arXiv cs.AI·May 19

SPATIOROUTE: Dynamic Prompt Routing for Zero-Shot Spatial Reasoning

SpatioRoute is a dynamic prompt-routing approach for spatial question answering over egocentric video. Without fine-tuning, it routes each question to a specialized prompt template (in a rule-based or LLM-driven mode), achieves a +5% accuracy gain on SQA3D over fixed baselines, and sets the state of the art for zero-shot, video-only spatial VQA without 3D point-cloud inputs.

Prompt engineering · Vision · Reasoning

arXiv cs.AI·May 19

A Simplex Witness Certificate for Constant Collapse in Variational Autoencoders

A theoretical paper on constant collapse in variational autoencoders, a failure mode in which the encoder mean becomes independent of the input. The authors introduce a simplex witness certificate, with an exact baseline and a closed-form inverse, to detect and prevent this failure mode during training.

Papers · Evals

arXiv cs.AI·May 19

Context Memorization for Efficient Long Context Generation

A training-free approach to long-context LLM inference: an attention-state memory externalizes the prefix into a lightweight, lookup-based memory of precomputed attention states. On LLaMA-3.1-8B, it reduces attention latency by 1.36x at 8K tokens and outperforms full-attention RAG with a 20% memory footprint.

Llama · Reasoning · RAG

arXiv cs.AI·May 19

Temporal Aware Pruning for Efficient Diffusion-based Video Generation

TAPE is a training-free pruning method for diffusion-based video generation that reduces computational overhead by selectively removing tokens while maintaining temporal coherence across frames. It applies temporal smoothing between frames, reselects tokens at each layer, and uses timestep-level budget scheduling to prune aggressively at early steps and relax pruning during refinement.

Video generation
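
The timestep-level budget scheduling described for TAPE (prune aggressively at early, noisy steps, then relax during refinement) can be pictured as a keep-ratio schedule. A hypothetical linear version with top-k token selection is sketched below; the linear shape, the ratio bounds, and the precomputed per-token importance `scores` are assumptions, not TAPE's actual schedule or scoring.

```python
import math


def keep_ratio(step, num_steps, start=0.3, end=1.0):
    """Fraction of tokens kept at denoising step `step` (0 = first, noisiest).

    Relaxes linearly from `start` (aggressive pruning) to `end` (no pruning).
    """
    if num_steps <= 1:
        return end
    t = step / (num_steps - 1)
    return start + (end - start) * t


def select_tokens(scores, ratio):
    """Indices of the ceil(ratio * N) highest-scoring tokens, in original order
    so that the positional layout of the kept tokens is preserved."""
    k = max(1, math.ceil(ratio * len(scores)))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

For example, with 10 steps the sketch keeps 30% of tokens at the first step and all of them at the last; `select_tokens([0.1, 0.9, 0.5, 0.7], 0.5)` keeps indices `[1, 3]`.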

arXiv cs.AI·May 19

Efficient Bilevel Optimization for Meta Label Correction in Noisy Label Learning

EBOMLC is a method for correcting noisy labels via efficient bilevel optimization. It uses a meta model trained on clean data to correct a large noisy dataset, reducing the computational cost of hypergradients and improving stability on CIFAR-10/100 under high noise rates.

Papers · Benchmarks · Fine-tuning

arXiv cs.AI·May 19

Privacy Preserving Reinforcement Learning with One-Sided Feedback

POOL, a privacy-preserving RL algorithm, addresses reinforcement learning in multi-dimensional continuous state-action spaces with one-sided feedback. The theoretical analysis derives a sample complexity that matches known lower bounds for non-private RL, demonstrating that strong privacy guarantees are achievable without sacrificing learning efficiency.

Reinforcement learning · AI safety · Papers

arXiv cs.AI·May 19

Machine Unlearning for Masked Diffusion Language Models

MDU is presented as the first machine unlearning framework for masked diffusion language models (MDLMs such as LLaDA and Dream). It minimizes the KL divergence from prompt-conditional predictions to a prompt-masked unconditional anchor, using temperature scaling to control the privacy-utility trade-off. Code is released.

Papers · AI safety · Fine-tuning

arXiv cs.AI·May 19

CodeBind: Decoupled Representation Learning for Multimodal Alignment with Unified Compositional Codebook

CodeBind introduces a multimodal alignment framework using a shared-specific compositional codebook design. Tested across 9 modalities (text, image, video, audio, depth, thermal, tactile, 3D point cloud, EEG), it achieves state-of-the-art performance in multimodal classification and retrieval without requiring fully paired data.

Embeddings · Vision · RAG

arXiv cs.CL·May 19

RAG-based EEG-to-Text Translation Using Deep Learning and LLMs

A RAG-based pipeline for sentence-level EEG-to-text decoding that combines an EEG encoder aligned with semantic embeddings, vector retrieval, and LLM refinement. On the ZuCo dataset, it achieves a 30.45% improvement over a random baseline (cosine similarity 0.181 vs. 0.139).

RAG · Embeddings · Vector search
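
The retrieval stage of such a pipeline, and the cosine-similarity metric it is scored with, can be sketched as nearest-neighbour search over sentence embeddings. This sketch assumes the EEG embedding and the candidate sentence embeddings already live in the same aligned space; the EEG encoder, the embedding model, and the LLM refinement step are outside its scope.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec, corpus, k=3):
    """Return the k (sentence, score) pairs most similar to the query embedding.

    corpus: list of (sentence, embedding) pairs in the same space as query_vec.
    """
    scored = [(text, cosine(query_vec, emb)) for text, emb in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def relative_improvement(model_score, baseline_score):
    """Percentage gain of model_score over baseline_score."""
    return (model_score - baseline_score) / baseline_score * 100.0
```

With the rounded similarities from the summary, `relative_improvement(0.181, 0.139)` gives about 30.2%, close to the reported 30.45%; the gap presumably reflects rounding of the two similarity values.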
