Validate Your Authority: Benchmarking LLMs on Multi-Label Precedent Treatment Classification

Benchmark of LLMs on legal precedent treatment classification. Expert-annotated dataset of 239 real-world legal citations. Gemini 2.5 Flash achieves 79.1% on high-level classification, GPT-5-mini 67.7% on fine-grained schema. Novel Average Severity Error metric to measure practical impact of misclassifications.

Benchmarks Gemini GPT

arXiv cs.AI·May 19

MR-SLAM: Immersive Spatial Supervision for Multi-Robot Mapping via Mixed Reality

MR-SLAM is a mixed reality system using Meta Quest 3 to teleoperate three TurtleBot3 robots in collaborative SLAM. The operator views the real world in passthrough with spatially anchored information panels. Three SLAM Toolbox instances merge occupancy grids in real time via ROS 2, achieving 94.7% cross-instance consistency and 8.83 Hz scan rate.

Robotics Multi-agent Infrastructure

arXiv cs.AI·May 19

EAGT: Echocardiography Augmentation for Generalisability and Transferability

Comparative study of 29 data augmentation techniques for 2D echocardiography segmentation on U-Net. Anatomically plausible geometric transformations (affine, shift-scale-rotate, perspective, horizontal flip) improve cross-dataset generalization, while aggressive intensity augmentations degrade it. Pairwise combinations outperform individual augmentations.

Vision Benchmarks Fine-tuning

arXiv cs.AI·May 19

Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation

Prompt2Fingerprint introduces a framework for LLM fingerprinting via parameter generation. Instead of fine-tuning each model separately, a specialized generator maps textual descriptions to low-rank parameter increments in a single forward pass, eliminating the computational overhead of existing methods.

Prompt engineering Fine-tuning AI safety

arXiv cs.AI·May 19

Attention-Guided Fusion of 1D and 2D CNNs for Robust ECG-Based Biometric Recognition

Hybrid framework combining 1D and 2D CNNs with attention-guided fusion for ECG-based biometric recognition. Evaluation on ECG-ID, MIT-BIH, PTB: 99.56%, 100%, 99.89% accuracy. Multi-session tests (Heartprint, 10 years): 98.54%-99.09% same-session, 53-56% cross-session.

Vision Benchmarks Evals

arXiv cs.AI·May 19

ReTAMamba: Reliability-Aware Temporal Aggregation with Mamba for Irregular Clinical Time Series Prediction

ReTAMamba is a Mamba-based model for predicting irregular clinical time series. It estimates observation reliability from missingness and elapsed time, integrates multi-resolution information via Chronological Weaving, and uses a budgeted token router. On MIMIC-IV, eICU, and PhysioNet 2012, it improves AUPRC by 7.51%, 7.80%, and 10.15% respectively.

Benchmarks Papers Reasoning

arXiv cs.CL·May 19

SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening

SafeLens introduces a two-tier (fast-and-slow) video moderation architecture to reduce inference costs. The framework reduces the SafeWatch dataset to 2.4% via influence-guided filtering and augments it with Chain-of-Thought traces. It outperforms SafeWatch-8B, OmniGuard-7B, GPT-5.4, and Gemini-3.1-pro on real and AI-generated video benchmarks.

Vision AI safety Reasoning

arXiv cs.AI·May 19

PEIRA: Learning Predictive Encoders through Inter-View Regressor Alignment

PEIRA is a non-contrastive self-supervised learning method analyzing JEPA dynamics through a regularized linear regressor. It minimizes an explicit objective based on the trace of the optimal regressor, ensuring stable non-collapsed equilibria aligned with canonical correlation subspaces. Competitive results on ImageNet-1K and CIFAR-10.

Papers Benchmarks Embeddings

arXiv cs.AI·May 19

Beyond Morphology: Quantifying the Diagnostic Power of Color Features in Cancer Classification

Study demonstrating that color features alone (RGB/HSV histograms, statistical moments), with morphological information excluded, achieve 89% accuracy in binary cancer/benign classification in histopathology. The authors propose these simple features as a lightweight pre-screening tool before complex deep learning models.

Vision Benchmarks Evals

arXiv cs.AI·May 19

Bridging the Version Gap: Multi-version Training Improves ICD Code Prediction, Especially for Rare Codes

A label-wise attention model trained on combined ICD-9 and ICD-10 data improves rare medical code prediction by 27% micro F1 (18K rare codes) and also improves macro metrics on frequent codes, despite the version mismatch. The approach is version-independent and aimed at automating clinical coding.

Benchmarks Fine-tuning Evals

arXiv cs.AI·May 19

Fre-Res: Frequency-Residual Video Token Compression for Efficient Video MLLMs

Fre-Res introduces adaptive video-token compression for video MLLMs. The framework separates spatial details (high-fidelity anchors) from temporal evolution (residual-frequency tokens via 1D-DCT). A Spatial-Guided Absorber aligns frequency dynamics with visual embeddings. Results: near full-token performance with substantial reduction in token length across short and long-video benchmarks.

Vision Video generation Evals

arXiv cs.AI·May 19

Probing for Representation Manifolds in Superposition

A supervised method called Manifold Probe discovers representation manifolds in superposition within neural networks. Tested on Llama 2-7b, it identifies linear manifolds for time and space, and demonstrates causal control by steering model completions about release years of movies and songs.

Llama Reasoning

arXiv cs.AI·May 19

Multi-task learning on partially labeled datasets via invariant/equivariant semi-supervised learning

Investigation of invariant and equivariant semi-supervised learning (FixMatch, Dense FixMatch) for multi-task training on partially labeled datasets. Evaluation on Cityscapes and BDD100K for object detection and semantic segmentation. Equivariant approaches outperform supervised baselines, especially with limited labeled samples per task.

Vision Papers

arXiv cs.AI·May 19

Leveraging Graph Structure in Seq2Seq Models for Knowledge Graph Link Prediction

GA-S2S combines T5-small with a Relational Graph Attention Network (RGAT) for knowledge graph link prediction. The model jointly encodes textual features and full k-hop subgraph topology around the query entity. On CoDEx, GA-S2S outperforms Seq2Seq baselines with 19% relative accuracy gain.

Benchmarks RAG Papers

arXiv cs.AI·May 19

StrLoRA: Towards Streaming Continual Visual Instruction Tuning for MLLMs

StrLoRA introduces a framework for streaming continual visual instruction tuning (StrCVIT) in MLLMs. Unlike existing methods restricted to predefined tasks, StrCVIT handles data streams with dynamic, interleaved tasks. StrLoRA employs two-stage expert routing with task-aware selection and token-wise weighting, stabilized via routing-stability regularization.

Multi-agent Fine-tuning Vision

arXiv cs.AI·May 19

Estimating Item Difficulty with Large Language Models as Experts

Study evaluating three off-the-shelf LLMs to estimate difficulty of educational items without response data. Across 6 primary-school mathematics domains, Spearman correlations show moderate-to-strong alignment with empirical difficulties. Pairwise comparisons outperform absolute judgements; adding token probabilities and few-shot examples improves results.

Prompt engineering Evals Benchmarks

arXiv cs.AI·May 19

Bayesian-Monte Carlo Schedule Updating for Construction Digital Twins: A Probabilistic Framework for Dynamic Project Forecasting

Bayesian-Monte Carlo probabilistic framework for dynamic construction project schedule updating. Models activity durations with lognormal distributions, updates them via Bayesian inference, and propagates uncertainty through Monte Carlo simulation. Demonstrates improved forecasting accuracy over deterministic CPM methods on PSPLIB benchmarks.

Reasoning Benchmarks

arXiv cs.AI·May 19

UniAlign: A Model-Agnostic Framework for Robust Network Traffic Classification under Distribution Shifts

UniAlign is a model-agnostic framework improving robustness of network traffic classification under distribution shifts. It combines domain alignment fine-tuning and stable model ensembling, achieving 2.51% accuracy and 2.71% F1 improvements on three public datasets, requiring only 12.4–53.9% of baseline training time.

Benchmarks Fine-tuning

arXiv cs.AI·May 19

Federated Nested Learning: Collaborative Training of Self-Referential Memories for Test-Time Adaptation

FedNL reformulates federated learning as a three-level nested optimization system. Embeds Titans-based linear attention for zero-shot test-time adaptation without additional training. Tested on Non-IID MMLU and long-context benchmarks with constant inference memory.

Reasoning Benchmarks

arXiv cs.AI·May 19

UVTran: Accurate Hole-Filling Parameterization with Transformers

UVTran, a transformer-based framework, solves N-sided hole filling in CAD by predicting an auxiliary projection surface. It combines cross-attention biased toward nearby control points, coordinate voxelization, and progressive-resolution training. In benchmark evaluation, it improves tolerance-satisfaction rate by 12% over industrial and academic baselines while producing fairer trimmed surfaces.

Papers Reasoning

arXiv cs.AI·May 19

Automated Root-Cause Subclassification and No-Code Fix Generation for Invalid Bug Reports

Study on automated classification of invalid bug reports and no-code fix generation. Researchers propose a standardized taxonomy and benchmark, testing vanilla LLMs, RAG, and agentic web search. RAG achieves 0.66 weighted F1 for subclassification; agentic web search reaches 68.9% Judge LLM success rate for fix generation.

RAG AI Agents Benchmarks

arXiv cs.AI·May 19

CATA: Continual Machine Unlearning via Conflict-Averse Task Arithmetic

CATA introduces a continual machine unlearning method for vision-language models (VLMs). It represents each unlearning request as a task vector and aggregates historical vectors by suppressing conflicting components, ensuring forgetting effectiveness, model fidelity, and persistence against knowledge re-emergence.

Vision AI safety Papers

arXiv cs.AI·May 19

Stochastic Penalty-Barrier Methods for Constrained Machine Learning

New SPBM method for constrained optimization in deep learning. Combines penalty methods, barrier methods, and exponential dual averaging to handle non-convexity and non-smoothness. Demonstrates effectiveness on fairness, physics-informed networks, and symbolic knowledge integration with linear overhead up to 10k constraints.

Reinforcement learning Papers Benchmarks

arXiv cs.AI·May 19

RGB-only Active 3D Scene Graph Generation for Indoor Mobile Robots

Framework for active 3D scene graph generation from RGB cameras only, without depth sensors. Unifies perception and planning around a structured representation. On the Replica dataset, it achieves F1-score parity with depth-based baselines. Semantic-driven viewpoint selection detects 2× more objects than a geometric frontier baseline.

Vision Robotics AI Agents

arXiv cs.AI·May 19

Self-Evolving Spatial Reasoning in Vision Language Models via Geometric Logic Consistency

SAGE, a self-evolving framework, improves spatial reasoning in VLMs by enforcing logical consistency through geometric and linguistic duality operations. Applied as a lightweight GRPO post-training stage, it corrects inconsistencies under predictable transformations and shows gains on video and spatial reasoning benchmarks.

Vision Reasoning Reinforcement learning

arXiv cs.AI·May 19

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

Theoretical paper proposing optimizers respecting symmetries of modern neural architectures. Introduces equivariant update rules for embeddings, LM heads, SwiGLU MLPs, and MoE routers. Validation on dense and sparse MoE models (Qwen3, Gemma 3, OLMoE, gpt-oss) shows improved validation loss vs AdamW.

Papers Reinforcement learning Benchmarks

arXiv cs.AI·May 19

Visual Sculpting: Visually-Aligned Planning Representations for Long-Horizon Robot Clay Sculpting

Planning method for robotic clay sculpting that uses a visually aligned representation. The system models deformable material dynamics, capturing textures and lighting, and enables long-horizon planning (>100 actions) without per-goal retraining. Tested on three materials with various end-effectors.

Robotics Vision Reasoning

arXiv cs.AI·May 19

SENSE: Satellite-based ENergy Synthesis for Sustainable Environment

SENSE is a generative diffusion-based framework that jointly synthesizes realistic urban satellite imagery and aligned building energy consumption and height maps. Tested on NYC, Boston, Lyon, and Busan, it generates annotated synthetic data using <20% labeled data, improving prediction performance by 10% IoU and reducing error by 3-11% NMBE.

Image generation Code generation Benchmarks

arXiv cs.AI·May 19

Train the Trainers -- An Agentic AI Framework for Peer-Based Mental Health Support in Battlefield Environments

Agentic AI framework for peer-based mental health support in military operations. Recovered soldiers trained as peer facilitators supervise specialized AI agents (symptom triage, interventions, documentation) in air-gapped environments. Prototype developed with U.S. Army Health Center. Goal: reduce evacuations, accelerate care, maintain human oversight.

AI Agents Multi-agent AI safety

arXiv cs.AI·May 19

Statistical Limits and Efficient Algorithms for Differentially Private Federated Learning

Study of trade-offs between estimation accuracy, differential privacy, and communication cost in federated learning. Proposes FedHybrid and FedNewton, improvements over FedAvg and FedSGD with finite-sample MSE upper bounds and minimax lower bounds. Evaluation on logistic regression and neural networks (MNIST, CIFAR-10).

Benchmarks Papers

arXiv cs.AI·May 19

COOPO: Cyclic Offline-Online Policy Optimization Algorithm

COOPO is a hybrid offline-online reinforcement learning algorithm that cycles between KL-regularized offline training and online fine-tuning. Periodic returns to offline training eliminate catastrophic forgetting and distribution drift. On D4RL benchmarks, COOPO reduces online interactions while improving final returns compared to state-of-the-art hybrids.

Reinforcement learning Papers Benchmarks

arXiv cs.AI·May 19

AI Slop or AI-enhancement? Student perceptions of AI-generated media for an English for Academic Purposes course

Implementation study of Google NotebookLM in an English for Academic Purposes course (106 students, Hong Kong). Videos, podcasts, and infographics were generated via RAG. Students rated visual and multimodal content highly; video preference correlated positively with academic performance. High cognitive load was negatively associated with grades.

RAG Evals Tools

arXiv cs.AI·May 19

Reversa: A Reverse Documentation Engineering Framework for Converting Legacy Software into Operational Specifications for AI Agents

Reversa is a reverse documentation engineering framework converting legacy systems into operational specifications for AI agents. A multi-agent pipeline extracts implicit business rules, synthesizes architecture, and generates traceable specifications with confidence marking. Case study: COBOL-to-Go ATM migration producing 517 claims, 10 identified gaps, and 53 Gherkin scenarios.

AI Agents Multi-agent Code generation

arXiv cs.AI·May 19

Democratizing Large-Scale Re-Optimization with LLM-Guided Model Patches

An agentic framework uses an LLM to assist users in real-time re-optimization of OR models. The LLM translates requests into structured model modifications, selects re-optimization techniques, and returns implementable solutions. Tested on supply chain and university exam scheduling.

AI Agents Reasoning RAG

arXiv cs.AI·May 19

Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment

Position paper arguing that a three-layer probabilistic architecture (semantic intent/policy compliance, environmental validity, dynamical feasibility) is structurally required for safe LLM agent deployment. Each layer must independently certify one safety dimension via composable probabilistic guarantees.

AI Agents AI safety Alignment

arXiv cs.AI·May 19

BESplit: Bias-Compensated Split Federated Learning with Evidential Aggregation

BESplit introduces a Split Federated Learning framework to mitigate non-IID data effects. The method combines Evidential Aggregation for client contribution reweighting, Bias-Compensated Collaboration for representation alignment, and Dual-Teacher Distillation for model synchronization. Experiments on 5 benchmarks show improvements in accuracy and convergence stability.

Alignment Benchmarks

arXiv cs.AI·May 19

A Distributional View for Visual Mechanistic Interpretability: KL-Minimal Soft-Constraint Principle

Theoretical paper on mechanistic interpretability of vision models. Proposes a distributional framework using KL-minimal optimization to interpret internal feature activations, addressing biases in heuristic methods (top-K retrieval, regularized optimization). Implementation via energy-guided diffusion posterior sampling, validated on DINOv3.

Vision Evals Papers

arXiv cs.AI·May 19

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

Vision-OPD introduces regional-to-global self-distillation to improve fine-grained visual understanding in MLLMs. The framework transfers the model's privileged perception on evidence-centered crops to its full-image policy via KL divergence minimization between token distributions. Competitive results on fine-grained visual understanding benchmarks without external models or ground-truth labels.

Vision Reinforcement learning Benchmarks

arXiv cs.CL·May 19

Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for Multimodal Large Language Models

Multilingual OCR-aware fine-tuning framework for MLLMs combining synthetic OCR-to-translation data generation, LoRA-based SFT, and structured visual chain-of-thought reasoning. Significantly improves extraction of small, blurred, or occluded text on receipts, menus, and documents under degraded visual conditions. Outperforms GPT-5 and Gemini on OCR grounding and hallucination reduction.

Vision Reasoning Fine-tuning

arXiv cs.AI·May 19

Learning Lifted Action Models from Traces with Minimal Information About Actions and States

Learning lifted STRIPS+ action models from partial traces with minimal observability assumptions. Authors relax prior work by allowing partial observability of both actions and states. Three cases formalized: no state observability, full observability of selected predicates, local observability of predicates. Completeness results and experiments provided.

Reasoning Papers
