Progressive Generalization Augmentation with Deeply Coupled RND-PPO and Domain-Prioritized Noise Injection for Robust Crop Management Reinforcement Learning

arXiv paper introducing Progressive Generalization Augmentation (PGA) to improve robustness of agricultural RL systems. Coupled RND-PPO architecture + hierarchical noise injection. Results: +8.43% yield, +16.42% nitrogen use efficiency vs BERT-DQN in Florida; 94.4% performance retention under combined perturbations.

Reinforcement learning Papers Benchmarks

arXiv cs.AI·May 19

A Conflict-aware Evidential Framework for Reliable Sleep Stage Classification

ConfSleepNet, an evidential framework, resolves inter-view conflicts for sleep stage classification. The method extracts category-related evidence from different modalities and aggregates view-specific opinions via a conflict-aware mechanism. Code available on GitHub.

Evals Reasoning

arXiv cs.AI·May 19

MusicSynth: An Automated Pipeline for Generating Violin Fingerboard Animations from Sheet Music Using Optical Music Recognition

MusicSynth is an open-source web tool that automatically converts violin sheet music (photo or file) into animated videos showing finger positioning on the fingerboard. The system combines optical music recognition (OMR), MusicXML parsing, and video rendering. Tested on 110 scores: 91.2% note recognition accuracy on printed music, 99.1% finger position accuracy on digital files.

Vision Code generation Open source

arXiv cs.AI·May 19

Task-Level AI Readiness Assessment for Business Process Management: The T-IPO Model and LARA Matrix in Financial-Services IT Operations

arXiv paper introducing T-IPO and LARA, tools to assess LLM agent readiness for business tasks. LARA is a 5-dimension rubric scoring tasks into 4 levels (L1-L4), with 1.5× weight on compliance sensitivity. Validated on 127 tasks (κ=0.80), replicated across 3 institutions (κ=0.73). Auto-completion decays from 95% (L1) to 40% (L3).

AI Agents Evals Papers
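
The weighted rubric in the LARA summary above can be illustrated with a small sketch. Only the 5-dimension count, the 4 levels (L1-L4), and the 1.5× weight on compliance sensitivity come from the summary; the other dimension names, the 1-5 rating scale, and the equal-width level cut-offs are hypothetical placeholders, and the paper may define them differently.

```python
# Hypothetical rubric: only "compliance_sensitivity" and its 1.5x weight
# come from the summary. The other four dimension names are placeholders.
WEIGHTS = {
    "compliance_sensitivity": 1.5,
    "dim_b": 1.0,
    "dim_c": 1.0,
    "dim_d": 1.0,
    "dim_e": 1.0,
}


def weighted_score(ratings):
    """Weighted mean of per-dimension ratings (assumed 1-5 scale)."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[d] * ratings[d] for d in WEIGHTS) / total_weight


def level(score, lo=1.0, hi=5.0):
    """Map a score to L1-L4 using equal-width bins (an assumption)."""
    frac = (score - lo) / (hi - lo)
    return "L" + str(min(4, int(frac * 4) + 1))


task = {d: 3 for d in WEIGHTS}
task["compliance_sensitivity"] = 5
print(round(weighted_score(task), 3), level(weighted_score(task)))
```

Because compliance sensitivity carries 1.5 of the 5.5 total weight, raising that single rating moves the score more than raising any other dimension by the same amount.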

arXiv cs.AI·May 19

ANVIL: Analogies and Videos for Lecturers

ANVIL is a multimodal generative system automating production of analogy-based instructional animations for computer science. Given a concept definition, it generates textual analogies, compiles them into structured visual screenplays, and produces executable manim code. Evaluation includes teacher studies and user adoption assessment.

Video generation Code generation Evals

arXiv cs.CL·May 19

Presupposition and Reasoning in Conditionals: A Theory-Based Study of Humans and LLMs

Comparative study of human judgments and the predictions of 4 LLMs on presupposition projection in conditionals. 120 participants were evaluated in parallel with the models. Humans integrate probabilistic and pragmatic cues; LLMs show variable alignment. Models matching human ratings lack coherent pragmatic reasoning.

Benchmarks Reasoning Papers

arXiv cs.AI·May 19

CooT: Learning to Coordinate In-Context with Coordination Transformers

CooT is a multi-agent framework using in-context learning for real-time adaptation to unfamiliar partners. Evaluated on Overcooked and Google Research Football, it outperforms population-based methods, gradient-based fine-tuning, and Meta-RL baselines without parameter updates.

Multi-agent AI Agents Reasoning

arXiv cs.AI·May 19

A Practical Noise2Noise Denoising Pipeline for High-Throughput Raman Spectroscopy

Noise2Noise denoising pipeline for high-throughput Raman spectroscopy using 1D convolutional autoencoder. Trained on repeated short acquisitions (5 ms), no external reference required. Evaluated on mineral sample: RMSE, SNR, SSIM and K-means classification. Preserves chemical coherence while accelerating acquisition.

Papers Benchmarks Code generation

arXiv cs.AI·May 19

DiagEval: Trajectory-Conditioned Diagnosis for Reliable Software Evaluation with GUI Agents

DiagEval is a trajectory-conditioned diagnostic evaluation protocol for GUI agents testing LLM-generated interactive software. It reuses failed trajectories to determine whether failures stem from the evaluator or the software itself. On WebDevJudge-Unit and RealDevBench, DiagEval recovers 45.6-62.1% of false negatives and improves accuracy from 69.9% to 78.3% and from 65.0% to 81.6%.

AI Agents Evals Code generation

arXiv cs.AI·May 19

AI of the People, by the People, for the People: A Social Choice Approach to Collective Control of Artificial Intelligence

Theoretical framework grounded in social choice theory to incorporate collective control throughout AI development, from data collection to alignment. Proposes axiomatic criteria for evaluating democratic control mechanisms across multiple stages of the ML pipeline.

Alignment AI safety Regulation

arXiv cs.AI·May 19

VISAFF: Speaker-Centered Visual Affective Feature Learning for Emotion Recognition in Conversation

VISAFF is a framework for Emotion Recognition in Conversation (ERC) using vision-language models. It combines two stages: speaker-centered affective grounding and reliability-guided affective complementation. The tuning-free approach leverages frozen VLMs' reasoning capabilities, integrating visual, textual, and acoustic signals to improve accuracy without expensive fine-tuning.

Vision Multi-agent Papers

arXiv cs.AI·May 19

Query-Conditioned Knowledge Alignment for Reliable Cross-System Medical Reasoning

QCEA reformulates medical entity alignment as a query-conditioned correspondence problem, integrating semantic encoding and graph-based representation learning. Evaluated on TCM-WM knowledge graphs (SymMap), the model improves Hit@K and MRR metrics, and demonstrates gains in RAG for evidence retrieval and answer accuracy.

RAG Reasoning Benchmarks
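
For readers unfamiliar with the Hit@K and MRR metrics named in the QCEA summary above, here is a minimal sketch using their standard definitions. The rank list is made-up illustration data, not a result from the paper, and the function names are chosen for this example.

```python
def hit_at_k(ranks, k):
    """Fraction of queries whose correct entity is ranked in the top k.

    `ranks` holds the 1-based rank of the correct entity for each query.
    """
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (1-based ranks)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Illustrative ranks for four alignment queries (made-up data).
example_ranks = [1, 3, 2, 10]
print(hit_at_k(example_ranks, 1))
print(hit_at_k(example_ranks, 3))
print(round(mean_reciprocal_rank(example_ranks), 4))
```

Hit@K only checks whether the correct entity appears within the top K, while MRR also rewards placing it nearer the top.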

arXiv cs.AI·May 19

AI4BayesCode: From Natural Language Descriptions to Validated Modular Stateful Bayesian Samplers

AI4BayesCode translates natural-language Bayesian model descriptions into validated, modular MCMC samplers. The system decomposes models into sampling blocks mapped to built-in components, with pre- and post-generation validation. A novel recursively stateful architecture enables coherent composition of independently developed sampling components.

Code generation AI Agents Reasoning

arXiv cs.AI·May 19

Automated Knowledge Component Generation for Interpretable Knowledge Tracing in Coding Problems

Automated LLM-based pipeline to generate and tag knowledge components (KCs) for open-ended programming problems. KCGen-KT framework leverages LLM-generated KCs for knowledge tracing. Evaluation on two real-world student code submission datasets shows it outperforms existing KT methods and human-written KCs on future response prediction.

Llama Code generation Evals

arXiv cs.CL·May 19

Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation

Prompt2Fingerprint introduces a framework for LLM fingerprinting via parameter generation. Instead of fine-tuning each model separately, a specialized generator maps textual descriptions to low-rank parameter increments in a single forward pass, eliminating retraining costs.

Prompt engineering Fine-tuning AI safety

arXiv cs.AI·May 19

Learning Lifted Action Models from Traces with Minimal Information About Actions and States

Learning lifted STRIPS+ action models from partial traces with minimal observability assumptions. Authors relax prior work by allowing partial observability of both actions and states. Three cases formalized: no state observability, full observability of selected predicates, local observability of predicates. Completeness results and experiments provided.

Reasoning Papers

arXiv cs.AI·May 19

From Reactive to Proactive: A Multi-Regulatory Empirical Analysis of 480 AI Incidents and a Data-Driven Governance Compliance Framework

Analysis of 480 real-world AI incidents from AIID against EU AI Act, NIST AI Risk Management Framework, and GDPR post-deployment provisions. Reveals substantial governance gaps in post-deployment accountability. Proposes Proactive AI Governance Compliance Framework (PAGCF), a four-phase lifecycle methodology shifting from reactive incident response to pre-deployment compliance assurance.

Regulation AI safety Alignment

arXiv cs.AI·May 19

Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment

Position paper arguing that a three-layer probabilistic architecture (semantic intent/policy compliance, environmental validity, dynamical feasibility) is structurally required for safe LLM agent deployment. Each layer must independently certify one safety dimension via composable probabilistic guarantees.

AI Agents AI safety Alignment

arXiv cs.CL·May 19

MA²P: A Meta-Cognitive Autonomous Intelligent Agents Framework for Complex Persuasion

MA²P is a multi-agent autonomous framework for complex persuasion. It coordinates perception management, mental-state inference, strategy execution, and performance evaluation. A meta-cognitive configurator selects domain-appropriate meta-strategies from a knowledge base to improve generalization and persuasion success rates.

AI Agents Multi-agent Reasoning

arXiv cs.AI·May 19

GCE-MIL: Faithful and Recoverable Evidence for Multiple Instance Learning in Whole-Slide Imaging

GCE-MIL improves multiple instance learning for whole-slide image analysis by directly optimizing evidence quality (sufficiency, necessity, recoverability) instead of relying on attention weights. Across 81 configurations (9 backbones, 9 datasets), Macro-F1 gains +0.024 and C-index +0.014, with 5× faster inference.

Papers Benchmarks Vision

arXiv cs.AI·May 19

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

Vision-OPD introduces regional-to-global self-distillation to improve fine-grained visual understanding in MLLMs. The framework transfers the model's privileged perception on evidence-centered crops to its full-image policy via KL divergence minimization between token distributions. Competitive results on fine-grained visual understanding benchmarks without external models or ground-truth labels.

Vision Reinforcement learning Benchmarks
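
The KL term mentioned in the Vision-OPD summary above can be sketched in plain Python. This is an illustration under stated assumptions, not the paper's implementation: the direction KL(crop view || full-image view), the averaging over token positions, and all function names are assumptions, and the logits are toy values.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_token_kl(crop_logits, full_logits):
    """Average per-token KL between crop-view and full-image predictions.

    The direction KL(crop || full) is an assumption made for this sketch.
    """
    kls = [
        kl_divergence(softmax(c), softmax(f))
        for c, f in zip(crop_logits, full_logits)
    ]
    return sum(kls) / len(kls)


# Toy logits: 2 token positions, vocabulary of 3 (made-up values).
crop = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
full = [[1.0, 0.9, 0.4], [0.2, 1.5, 0.3]]
print(mean_token_kl(crop, full))
```

Minimizing this quantity with respect to the full-image predictions pulls them toward what the model predicts when it sees the evidence-centered crop.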

arXiv cs.CL·May 19

Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for Multimodal Large Language Models

Multilingual OCR-aware fine-tuning framework for MLLMs combining synthetic OCR-to-translation data generation, LoRA-based SFT, and structured visual chain-of-thought reasoning. Significantly improves extraction of small, blurred, occluded text on receipts, menus, documents under degraded visual conditions. Outperforms GPT-5 and Gemini on OCR grounding and hallucination reduction.

Vision Reasoning Fine-tuning

arXiv cs.AI·May 19

Flowing with Confidence

Flow Matching with Confidence (FMwC) adds per-sample confidence scores to generative models at standard sampling cost. By injecting input-dependent multiplicative noise and propagating variance through the ODE, the method enables filtering, trajectory editing, and adaptive stepping. The confidence score correlates with the divergence magnitude of the learned velocity field.

Reasoning Evals

arXiv cs.AI·May 19

ChartDesign: Towards LLM Designer of Data Visualization

ChartDesign fine-tunes LLMs (Phi3, Qwen3, InternVL2.5) via LoRA to automatically generate chart design attributes from tabular data. Trained on curated corpus (PewResearch, CharXiv), the system achieves 84% accuracy on held-out test set vs 53% baseline, generalizing to unseen domains.

Fine-tuning Vision Benchmarks

arXiv cs.AI·May 19

A Distributional View for Visual Mechanistic Interpretability: KL-Minimal Soft-Constraint Principle

Theoretical paper on mechanistic interpretability of vision models. Proposes a distributional framework using KL-minimal optimization to interpret internal feature activations, addressing biases in heuristic methods (top-K retrieval, regularized optimization). Implementation via energy-guided diffusion posterior sampling, validated on DINOv3.

Vision Evals Papers

arXiv cs.AI·May 19

Democratizing Large-Scale Re-Optimization with LLM-Guided Model Patches

An agentic framework uses an LLM to assist users in real-time re-optimization of OR models. The LLM translates requests into structured model modifications, selects re-optimization techniques, and returns implementable solutions. Tested on supply chain and university exam scheduling.

AI Agents Reasoning RAG

arXiv cs.AI·May 19

AI Slop or AI-enhancement? Student perceptions of AI-generated media for an English for Academic Purposes course

Implementation study of Google NotebookLM in an English for Academic Purposes course (106 students, Hong Kong). Generated videos, podcasts, and infographics via RAG. Students rated visual and multimodal content highly; video preference correlated positively with academic performance. High cognitive load negatively associated with grades.

RAG Evals Tools

arXiv cs.CL·May 19

LLM-Based Intelligent Notification Composition: From Static Personalization to Context-Aware Persuasive Messaging

Study on using LLMs to compose personalized and persuasive push notifications. Authors define 6 quality dimensions (contextual relevance, clarity, actionability, etc.) and demonstrate +8% to +14.5% CTR gains vs static templates. Proposes architectural framework with budget-aware routing, grounded generation, and online learning.

Prompt engineering RAG Business

arXiv cs.CL·May 19

Linguistic Uncertainty and Reply Engagement on X: A Cross-Domain Replication of the Uncertainty-Reply Asymmetry

Study of 2,258 English-language posts (April 2026) shows uncertain posts receive 82% more replies than certain posts. Regression confirms positive association (β=0.126, p=0.011), ~13% higher reply engagement. Replicates asymmetry observed in Arabic, suggesting universal interactional mechanism across languages.

Papers Evals
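
The step from β=0.126 to "~13% higher reply engagement" in the summary above is consistent with exponentiating the coefficient. The summary does not state the model, so this sketch assumes a log link or a log-transformed outcome; the function name is chosen for this example.

```python
import math


def percent_change(beta):
    """Percent change in the outcome implied by a coefficient `beta`.

    Assumes a log-link or log-outcome regression, which the summary
    does not confirm.
    """
    return (math.exp(beta) - 1.0) * 100.0


print(round(percent_change(0.126), 2))
```

Under that assumption, exp(0.126) ≈ 1.134, i.e. roughly 13% more replies. The separate "82% more replies" figure is a different quantity, and the summary does not say how the two relate.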

arXiv cs.AI·May 19

Reversa: A Reverse Documentation Engineering Framework for Converting Legacy Software into Operational Specifications for AI Agents

Reversa is a reverse documentation engineering framework converting legacy systems into operational specifications for AI agents. A multi-agent pipeline extracts implicit business rules, synthesizes architecture, and generates traceable specifications with confidence marking. Case study: COBOL-to-Go ATM migration producing 517 claims, 10 identified gaps, and 53 Gherkin scenarios.

AI Agents Multi-agent Code generation

arXiv cs.AI·May 19

BESplit: Bias-Compensated Split Federated Learning with Evidential Aggregation

BESplit introduces a Split Federated Learning framework to mitigate non-IID data effects. The method combines Evidential Aggregation for client contribution reweighting, Bias-Compensated Collaboration for representation alignment, and Dual-Teacher Distillation for model synchronization. Experiments on 5 benchmarks show improvements in accuracy and convergence stability.

Alignment Benchmarks

arXiv cs.AI·May 19

Train the Trainers -- An Agentic AI Framework for Peer-Based Mental Health Support in Battlefield Environments

Agentic AI framework for peer-based mental health support in military operations. Recovered soldiers trained as peer facilitators supervise specialized AI agents (symptom triage, interventions, documentation) in air-gapped environments. Prototype developed with U.S. Army Health Center. Goal: reduce evacuations, accelerate care, maintain human oversight.

AI Agents Multi-agent AI safety

arXiv cs.AI·May 19

COOPO: Cyclic Offline-Online Policy Optimization Algorithm

COOPO is a hybrid offline-online reinforcement learning algorithm that cycles between KL-regularized offline training and online fine-tuning. Periodic returns to offline training eliminate catastrophic forgetting and distribution drift. On D4RL benchmarks, COOPO reduces online interactions while improving final returns compared to state-of-the-art hybrids.

Reinforcement learning Papers Benchmarks

arXiv cs.AI·May 19

Statistical Limits and Efficient Algorithms for Differentially Private Federated Learning

Study of trade-offs between estimation accuracy, differential privacy, and communication cost in federated learning. Proposes FedHybrid and FedNewton, improvements over FedAvg and FedSGD with finite-sample MSE upper bounds and minimax lower bounds. Evaluation on logistic regression and neural networks (MNIST, CIFAR-10).

Benchmarks Papers

arXiv cs.AI·May 19

Learning Quantifiable Visual Explanations Without Ground-Truth

New metric to evaluate XAI methods without ground-truth, based on continuous input perturbation. Measures sufficiency and necessity of attributed information. Also proposes trainable XAI method as adapter on black-box models, generating causal explanations without degrading performance.

Evals AI safety Alignment

arXiv cs.AI·May 19

SENSE: Satellite-based ENergy Synthesis for Sustainable Environment

SENSE is a generative diffusion-based framework that jointly synthesizes realistic urban satellite imagery and aligned building energy consumption and height maps. Tested on NYC, Boston, Lyon, and Busan, it generates annotated synthetic data using <20% labeled data, improving prediction performance by 10% IoU and reducing error by 3-11% NMBE.

Image generation Code generation Benchmarks

arXiv cs.AI·May 19

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

Theoretical paper proposing optimizers respecting symmetries of modern neural architectures. Introduces equivariant update rules for embeddings, LM heads, SwiGLU MLPs, and MoE routers. Validation on dense and sparse MoE models (Qwen3, Gemma 3, OLMoE, gpt-oss) shows improved validation loss vs AdamW.

Papers Reinforcement learning Benchmarks

arXiv cs.AI·May 19

Latent Action Reparameterization for Efficient Agent Inference

LAR (Latent Action Reparameterization) compresses LLM agent action spaces by learning semantic multi-step latent actions. This reduces effective decision horizon and inference costs while preserving expressiveness. Across benchmarks, LAR decreases action tokens and wall-clock inference time without degrading task success rates.

AI Agents Code generation Reasoning

arXiv cs.AI·May 19

Self-Evolving Spatial Reasoning in Vision Language Models via Geometric Logic Consistency

SAGE, a self-evolving framework, improves spatial reasoning in VLMs by enforcing logical consistency through geometric and linguistic duality operations. Applied as a lightweight GRPO post-training stage, it corrects inconsistencies under predictable transformations and shows gains on video and spatial reasoning benchmarks.

Vision Reasoning Reinforcement learning

arXiv cs.AI·May 19

RGB-only Active 3D Scene Graph Generation for Indoor Mobile Robots

Framework for active 3D scene graph generation from RGB cameras only, without depth sensors. Unifies perception and planning around a structured representation. On Replica dataset, achieves F1-score parity with depth-based baselines. Semantic-driven viewpoint selection detects 2× more objects than geometric frontier baseline.

Vision Robotics AI Agents
