Stochastic Penalty-Barrier Methods for Constrained Machine Learning

New SPBM method for constrained optimization in deep learning. Combines penalty methods, barrier methods, and exponential dual averaging to handle non-convexity and non-smoothness. Demonstrates effectiveness on fairness, physics-informed networks, and symbolic knowledge integration with linear overhead up to 10k constraints.

Reinforcement learning Papers Benchmarks

arXiv cs.AI·May 19

Visual Sculpting: Visually-Aligned Planning Representations for Long-Horizon Robot Clay Sculpting

Robotic clay sculpting planning method using visually-aligned representation. System models deformable material dynamics capturing textures and lighting, enabling long-horizon planning (>100 actions) without per-goal retraining. Tested on three materials with various end-effectors.

Robotics Vision Reasoning

arXiv cs.CL·May 19

Alignment Drift in Long-Term Human-LLM Interaction: A Mechanism-Oriented Framework

Study on 'alignment drift': gradual process where LLM outputs become less constrained by user's current message and more shaped by interaction history, while remaining helpful. Mechanism-oriented framework distinguishes signal A/B, feedback loops, and interactive regimes to control this cumulative drift.

Alignment AI Agents AI safety

arXiv cs.AI·May 19

CATA: Continual Machine Unlearning via Conflict-Averse Task Arithmetic

CATA introduces a continual machine unlearning method for vision-language models (VLMs). It represents each unlearning request as a task vector and aggregates historical vectors by suppressing conflicting components, ensuring forgetting effectiveness, model fidelity, and persistence against knowledge re-emergence.

Vision AI safety Papers

arXiv cs.AI·May 19

UVTran: Accurate Hole-Filling Parameterization with Transformers

UVTran, a transformer-based framework, solves N-sided hole filling in CAD by predicting an auxiliary projection surface via cross-attention biased toward nearby control points, voxelizing coordinates, and progressive-resolution training. On benchmark, it improves tolerance-satisfaction rate by 12% over industrial and academic baselines while producing fairer trimmed surfaces.

Papers Reasoning

arXiv cs.AI·May 19

Federated Nested Learning: Collaborative Training of Self-Referential Memories for Test-Time Adaptation

FedNL reformulates federated learning as a three-level nested optimization system. Embeds Titans-based linear attention for zero-shot test-time adaptation without additional training. Tested on Non-IID MMLU and long-context benchmarks with constant inference memory.

Reasoning Benchmarks

arXiv cs.CL·May 19

To MRL or not to MRL: Text Embeddings are Robust to Truncation Without Matryoshka Embeddings, Except In Heavy Truncation Scenarios

An arXiv study compares Matryoshka Representation Learning (MRL) with simple embedding truncation. Results show non-MRL embeddings remain robust up to 80% dimensionality reduction. MRL provides advantage only for heavy truncation (>80%), questioning its systematic training cost.
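
As a rough illustration of the "simple embedding truncation" baseline described above, the sketch below keeps the leading dimensions of each vector and re-normalizes. The function name `truncate_embeddings` and the `keep_fraction` parameter are hypothetical, and re-normalizing after truncation is an assumption rather than something the summary specifies.

```python
import numpy as np


def truncate_embeddings(embeddings: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Keep the leading dimensions of each row and re-normalize to unit length.

    embeddings: array of shape (n_items, dim).
    keep_fraction: share of dimensions to keep, in (0, 1].
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    dim = embeddings.shape[1]
    keep = max(1, int(round(dim * keep_fraction)))
    truncated = embeddings[:, :keep]
    # Re-normalize so dot products between rows are cosine similarities
    # (an assumption of this sketch, not a detail from the summary).
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    full = rng.normal(size=(4, 10))
    small = truncate_embeddings(full, keep_fraction=0.2)  # an 80% reduction
    print(small.shape)
```

Under this reading, an 80% dimensionality reduction corresponds to `keep_fraction=0.2`.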

Embeddings Papers Benchmarks

arXiv cs.CL·May 19

Recall Isn't Enough: Bounding Commitments in Personalized Language Systems

Paper introduces Contract-Bounded Evidence Activation (CBEA) with Lexicographic Commitment Validation (LCV) for personalized language systems. CBEA+LCV achieves zero failures at 0.49-0.60 availability versus 0.003-0.092 for baselines, with 74-75% median input payload reduction.

Reasoning RAG Evals

arXiv cs.AI·May 19

When Outcome Looks Right But Discipline Fails: Trace-Based Evaluation Under Hidden Competitor State

Paper introducing trace-based evaluation to detect when agents hit business KPIs while violating behavioral constraints. In hotel pricing with hidden competitor state, authors show PPO variants fail trace alignment while behavior cloning and Trace-Prior RL better preserve price/bid distributions and rate discipline.

Reinforcement learning Evals AI Agents

arXiv cs.AI·May 19

StrLoRA: Towards Streaming Continual Visual Instruction Tuning for MLLMs

StrLoRA introduces a framework for streaming continual visual instruction tuning (StrCVIT) of MLLMs. Unlike existing settings restricted to predefined tasks, StrCVIT covers data streams with dynamic, interleaved tasks. StrLoRA employs two-stage expert routing with task-aware selection and token-wise weighting, stabilized via routing-stability regularization.

Multi-agent Fine-tuning Vision

arXiv cs.CL·May 19

FIM-LoRA: Task-Informative Rank Allocation for LoRA via Calibration-Time Gradient-Variance Estimation

FIM-LoRA optimizes rank allocation in LoRA by using 8 calibration passes to estimate gradient variance per layer. This parameter-free approach matches standard LoRA performance (88.6 vs 88.7 on GLUE with DeBERTa-v3-base) while reducing memory costs by 256x compared to full Fisher estimation.

Fine-tuning Papers Benchmarks

arXiv cs.AI·May 19

Automated Root-Cause Subclassification and No-Code Fix Generation for Invalid Bug Reports

Study on automated classification of invalid bug reports and no-code fix generation. Researchers propose a standardized taxonomy and benchmark, testing vanilla LLMs, RAG, and agentic web search. RAG achieves 0.66 weighted F1 for subclassification; agentic web search reaches a 68.9% success rate for fix generation, as scored by a judge LLM.

RAG AI Agents Benchmarks

arXiv cs.AI·May 19

UniAlign: A Model-Agnostic Framework for Robust Network Traffic Classification under Distribution Shifts

UniAlign is a model-agnostic framework that improves the robustness of network traffic classification under distribution shifts. It combines domain alignment fine-tuning with stable model ensembling, improving accuracy by 2.51% and F1 by 2.71% on three public datasets while requiring only 12.4–53.9% of baseline training time.

Benchmarks Fine-tuning

arXiv cs.AI·May 19

Estimating Item Difficulty with Large Language Models as Experts

Study evaluating three off-the-shelf LLMs to estimate difficulty of educational items without response data. Across 6 primary-school mathematics domains, Spearman correlations show moderate-to-strong alignment with empirical difficulties. Pairwise comparisons outperform absolute judgements; adding token probabilities and few-shot examples improves results.

Prompt engineering Evals Benchmarks

arXiv cs.CL·May 19

DriveSafe: A Framework for Risk Detection and Safety Suggestions in Driving Scenarios

DriveSafe is a framework for risk assessment in autonomous driving scenarios. It generates spatially grounded captions enriched with motion and depth cues, then fine-tunes a lightweight adapter to identify hazardous objects and suggest safety actions. Achieves SOTA on DRAMA benchmark.

Vision Reasoning AI safety

arXiv cs.AI·May 19

AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment

AMR-SD introduces asymmetric meta-reflective self-distillation to improve token-level credit assignment in LLM reinforcement learning. The method compresses diagnostic signals into self-generated Socratic hints and uses Causal Information Gain with asymmetric ReLU-gated threshold for sparse token-level advantage modulation, preventing late-stage training collapse.

Reinforcement learning Reasoning Alignment

arXiv cs.AI·May 19

Leveraging Graph Structure in Seq2Seq Models for Knowledge Graph Link Prediction

GA-S2S combines T5-small with a Relational Graph Attention Network (RGAT) for knowledge graph link prediction. The model jointly encodes textual features and full k-hop subgraph topology around the query entity. On CoDEx, GA-S2S outperforms Seq2Seq baselines with 19% relative accuracy gain.

Benchmarks RAG Papers

arXiv cs.AI·May 19

Bayesian-Monte Carlo Schedule Updating for Construction Digital Twins: A Probabilistic Framework for Dynamic Project Forecasting

Bayesian-Monte Carlo probabilistic framework for dynamic construction project schedule updating. Models activity durations with lognormal distributions, updates them via Bayesian inference, and propagates uncertainty through Monte Carlo simulation. Demonstrates improved forecasting accuracy over deterministic CPM methods on PSPLIB benchmarks.

Reasoning Benchmarks

arXiv cs.AI·May 19

Virtual Nodes Guided Dynamic Graph Neural Network for Brain Tumor Segmentation with Missing Modalities

Brain tumor segmentation method using multimodal MRI with virtual nodes and dynamic graph neural networks. One-stage framework handling missing modalities through adaptive adjacency matrices and heterogeneous weight matrices. SOTA results on BRATS-2018/2020 with incomplete modalities.

Vision Benchmarks Papers

arXiv cs.AI·May 19

Fre-Res: Frequency-Residual Video Token Compression for Efficient Video MLLMs

Fre-Res introduces adaptive video-token compression for video MLLMs. The framework separates spatial details (high-fidelity anchors) from temporal evolution (residual-frequency tokens via 1D-DCT). A Spatial-Guided Absorber aligns frequency dynamics with visual embeddings. Results: near full-token performance with substantial reduction in token length across short and long-video benchmarks.

Vision Video generation Evals

arXiv cs.AI·May 19

Probing for Representation Manifolds in Superposition

A supervised method called Manifold Probe discovers representation manifolds in superposition within neural networks. Tested on Llama 2-7b, it identifies linear manifolds for time and space, and demonstrates causal control by steering model completions about release years of movies and songs.

Llama Reasoning

arXiv cs.CL·May 19

Responsible Agentic AI Requires Explicit Provenance

An arXiv paper argues that responsible agentic AI requires explicit, traceable provenance across the full lifecycle. Authors formalize this through a causal attribution function and responsibility tensor, demonstrating provenance is estimable and interventionable online before irreversible harm accumulates.

AI Agents AI safety Alignment

arXiv cs.AI·May 19

Beyond the Cartesian Illusion: Testing Two-Stage Multi-Modal Theory of Mind under Perceptual Bottlenecks

arXiv paper on spatial limitations of MLLMs in multi-agent environments. Models suffer from a "Cartesian Illusion": they lack grounded 3D topological understanding. Authors propose an Epistemic Sensory Bottleneck module with Anchor-Based Embodied Spatial Decomposition CoT to improve second-order spatial inference (Theory of Mind). Zero-shot baseline: 42% accuracy.

Vision Multi-agent Reasoning

arXiv cs.AI·May 19

ReTAMamba: Reliability-Aware Temporal Aggregation with Mamba for Irregular Clinical Time Series Prediction

ReTAMamba is a Mamba-based model for predicting irregular clinical time series. It estimates observation reliability from missingness and elapsed time, integrates multi-resolution information via Chronological Weaving, and uses a budgeted token router. On MIMIC-IV, eICU, and PhysioNet 2012, it improves AUPRC by 7.51%, 7.80%, and 10.15% respectively.

Benchmarks Papers Reasoning

arXiv cs.AI·May 19

Pairwise Preference Reward and Group-Based Diversity Enhancement for Superior Open-Ended Generation

PPR-GDE, an RL method for open-ended generation, uses pairwise preference rewards and group-based diversity to prevent diversity collapse. Without scalar rewards, it preserves subjective evaluations and encourages semantic dispersion within response groups.

Reinforcement learning Reasoning Evals

arXiv cs.AI·May 19

Multi-task learning on partially labeled datasets via invariant/equivariant semi-supervised learning

Investigation of invariant and equivariant semi-supervised learning (FixMatch, Dense FixMatch) for multi-task training on partially labeled datasets. Evaluation on Cityscapes and BDD100K for object detection and semantic segmentation. Equivariant approaches outperform supervised baselines, especially with limited labeled samples per task.

Vision Papers

arXiv cs.AI·May 19

Beyond Morphology: Quantifying the Diagnostic Power of Color Features in Cancer Classification

arXiv study demonstrates that color features alone (RGB/HSV histograms, statistical moments) achieve 89% accuracy in binary cancer/benign classification in histopathology, excluding morphological information. Authors propose these simple features as lightweight pre-screening tool before complex deep learning models.
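
To make "color features alone" concrete, here is a minimal sketch of a per-channel feature extractor. It is not the paper's pipeline: the function name `color_features`, the 16-bin histograms, the RGB-only input, and the choice of mean, standard deviation and skewness as the statistical moments are all assumptions made for illustration.

```python
import numpy as np


def color_features(image: np.ndarray, bins: int = 16) -> np.ndarray:
    """Build a color-only feature vector from an H x W x 3 uint8 RGB image.

    For each channel: a normalized intensity histogram followed by
    mean, standard deviation and skewness. No shape or texture
    (morphological) information is used.
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    parts = []
    for channel in range(3):
        values = image[:, :, channel].astype(np.float64).ravel()
        hist, _ = np.histogram(values, bins=bins, range=(0, 256))
        parts.append(hist / values.size)  # histogram sums to 1 per channel
        mean = values.mean()
        std = values.std()
        # Skewness is undefined for a constant channel; report 0 in that case.
        skew = ((values - mean) ** 3).mean() / std**3 if std > 0 else 0.0
        parts.append(np.array([mean, std, skew]))
    return np.concatenate(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
    print(color_features(patch).shape)
```

A vector like this could feed any lightweight classifier; which classifier the authors used is not stated in the summary.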

Vision Benchmarks Evals

arXiv cs.AI·May 19

POST: Prior-Observation Adversarial Learning of Spatio-Temporal Associations for Multivariate Time Series Anomaly Detection

POST introduces an adversarial learning framework for multivariate time series anomaly detection. The model combines graph neural networks with minimax optimization over adjacency matrices to address spatial over-generalization. Evaluation on public and synthetic benchmarks with channel-wise anomaly localization.

Benchmarks Papers Reasoning

arXiv cs.AI·May 19

Can LLMs Think Like Consumers? Benchmarking Crowd-Level Reaction Reconstruction with ConsumerSimBench

ConsumerSimBench, a benchmark built from 1,553 Chinese social-media topics and 23,122 reaction criteria, evaluates whether LLMs can reconstruct real consumer reaction patterns. Gemini-3.1-Pro covers only 47.8% of criteria, revealing a major gap between technical performance and consumer intuition. A multi-agent pipeline improves MiMo-V2.5-Pro from 32.9% to 37.6%.

Benchmarks Evals Multi-agent

arXiv cs.AI·May 19

Bridging the Version Gap: Multi-version Training Improves ICD Code Prediction, Especially for Rare Codes

A label-wise attention model trained on combined ICD-9 and ICD-10 data improves micro F1 on rare medical codes (18K rare codes) by 27% and also improves macro metrics on frequent codes, despite the version mismatch. The approach is version-independent and targets automated clinical coding.

Benchmarks Fine-tuning Evals

arXiv cs.CL·May 19

QQJ: Quantifying Qualitative Judgment for Scalable and Human-Aligned Evaluation of Generative AI

QQJ is an evaluation framework for generative AI that combines human judgment and LLMs. It uses expert-designed multi-dimensional rubrics and calibrates LLM evaluators on a small high-quality annotation set. Experiments on text and image generation show stronger alignment with human judgment than traditional automatic metrics and unconstrained LLM evaluators.

Evals Llama Vision

arXiv cs.CL·May 19

SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening

SafeLens introduces a two-tier (fast-and-slow) video moderation architecture to reduce inference costs. The framework filters the SafeWatch dataset down to 2.4% via influence-guided filtering and augments it with Chain-of-Thought traces. It outperforms SafeWatch-8B, OmniGuard-7B, GPT-5.4, and Gemini-3.1-Pro on real and AI-generated video benchmarks.

Vision AI safety Reasoning

arXiv cs.AI·May 19

EAGT: Echocardiography Augmentation for Generalisability and Transferability

Comparative study of 29 data augmentation techniques for 2D echocardiography segmentation on U-Net. Anatomically plausible geometric transformations (affine, shift-scale-rotate, perspective, horizontal flip) improve cross-dataset generalization, while aggressive intensity augmentations degrade it. Pairwise combinations outperform individual augmentations.

Vision Benchmarks Fine-tuning

arXiv cs.CL·May 19

Medical Context Distorts Decisions in Clinical Vision Language Models

arXiv study identifies three critical failure modes of vision-language models (VLMs) in clinical settings: over-reliance on text over images, dependence on irrelevant clinical history, and prompt sensitivity across semantically equivalent inputs. Testing on MIMIC-CXR shows VLM decisions are dominated by the text modality even when visual evidence is available.

Vision AI safety Evals

arXiv cs.AI·May 19

Learning to Solve Compositional Geometry Routing Problems

Study of Compositional Geometry Routing Problem (CGRP), a generalization of routing problems covering points, lines, areas, and hybrid geometries. Proposes DiCon, a solver with differential attention and contrastive learning to handle asymmetry and enlarged action spaces. Results show strong performance, versatility, and superior generalization across diverse instances.

Papers Reasoning

arXiv cs.AI·May 19

PEIRA: Learning Predictive Encoders through Inter-View Regressor Alignment

PEIRA is a non-contrastive self-supervised learning method analyzing JEPA dynamics through a regularized linear regressor. It minimizes an explicit objective based on the trace of the optimal regressor, ensuring stable non-collapsed equilibria aligned with canonical correlation subspaces. Competitive results on ImageNet-1K and CIFAR-10.

Papers Benchmarks Embeddings

arXiv cs.AI·May 19

Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation

Prompt2Fingerprint introduces a framework for LLM fingerprinting via parameter generation. Instead of fine-tuning each model separately, a specialized generator maps textual descriptions to low-rank parameter increments in a single forward pass, eliminating the computational overhead of existing methods.

Prompt engineering Fine-tuning AI safety

arXiv cs.AI·May 19

DocOS: Towards Proactive Document-Guided Actions in GUI Agents

DocOS is a benchmark evaluating GUI agents capable of proactively searching online documentation to solve long-tailed tasks. Experiments reveal two bottlenecks: difficulty reliably locating relevant information and faithfully grounding retrieved instructions into precise GUI actions.

AI Agents Benchmarks Reasoning

arXiv cs.AI·May 19

New Insight of Variance reduce in Zero-Order Hard-Thresholding: Mitigating Gradient Error and Expansivity Contradictions

New zeroth-order hard-thresholding algorithm with variance reduction for ℓ0-constrained optimization. Addresses SZOHT's limitation on random directions by mitigating conflict between ZO gradient deviation and hard-thresholding expansivity. Improved convergence rates validated on ridge regression and black-box adversarial attacks.

Reinforcement learning

arXiv cs.AI·May 19

MR-SLAM: Immersive Spatial Supervision for Multi-Robot Mapping via Mixed Reality

MR-SLAM is a mixed reality system using Meta Quest 3 to teleoperate three TurtleBot3 robots in collaborative SLAM. The operator views the real world in passthrough with spatially anchored information panels. Three SLAM Toolbox instances merge occupancy grids in real time via ROS 2, achieving 94.7% cross-instance consistency and 8.83 Hz scan rate.

Robotics Multi-agent Infrastructure
