arXiv cs.AI

AdaGraph: A Graph-Native Clustering Algorithm That Overcomes the Curse of Dimensionality and Enables Scientific Discovery

AdaGraph is a graph-native clustering algorithm that overcomes the curse of dimensionality by operating on kNN topology rather than Euclidean metrics. Without specifying k a priori, it identifies gene modules in genomics (GSE14520, 10k genes), achieves ARI=0.751 on text clustering (20NG-6cat vs HDBSCAN 0.464), and outperforms Silhouette/Davies-Bouldin on 10 benchmarks up to d=5000.
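The ARI figures quoted above refer to the Adjusted Rand Index, a chance-corrected measure of agreement between two labelings. As a minimal sketch of how such a score is computed (the function name `adjusted_rand_index` and the toy labels are illustrative assumptions, not taken from the paper):

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Contingency counts: how many items share each (true, predicted) label pair.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy labels (hypothetical): cluster names differ, but the partition is identical.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

The score is invariant to renaming clusters, which is why it suits algorithms that, like the one described, choose the number of clusters themselves.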

Benchmarks, Papers
SIG 72 · HYP 28

arXiv cs.AI

When Actions Disappear: Adversarial Action Removal in Self-Play Reinforcement Learning

Study of adversarial attacks via action removal in self-play reinforcement learning. An attacker selectively removes legal actions from the victim's available set. Across poker games (6 to 5,531 states) and two non-poker domains, learned masking causes more damage than random masking. The attack persists across Q-learning, PPO, NFSP, DQN and shows no recovery under extended masked training.
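To make the attack surface concrete, here is a minimal sketch of action removal against a greedy victim. It assumes a tabular victim with known action values. The targeted remover below simply deletes the victim's best action and is only a stand-in for the learned masking policy the paper studies. All names and values are hypothetical.

```python
import random


def greedy_action(q_values, legal_actions):
    """Victim policy: pick the legal action with the highest estimated value."""
    return max(legal_actions, key=lambda a: q_values[a])


def remove_random(legal_actions, rng):
    """Baseline attacker: remove one legal action uniformly at random."""
    if len(legal_actions) <= 1:
        return list(legal_actions)  # never leave the victim without a move
    removed = rng.choice(sorted(legal_actions))
    return [a for a in legal_actions if a != removed]


def remove_best(q_values, legal_actions):
    """Stand-in for a learned attacker: remove the victim's preferred action."""
    if len(legal_actions) <= 1:
        return list(legal_actions)
    best = greedy_action(q_values, legal_actions)
    return [a for a in legal_actions if a != best]


# Hypothetical action values for a single poker-like decision.
q = {"fold": 0.0, "call": 0.4, "raise": 0.9}
legal = ["fold", "call", "raise"]

unmasked_choice = greedy_action(q, legal)
targeted_choice = greedy_action(q, remove_best(q, legal))
random_choice = greedy_action(q, remove_random(legal, random.Random(0)))
```

In this toy setting the targeted remover always costs the victim its best action, while random removal does so only one time in three, which mirrors the direction of the reported result without reproducing it.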

Reinforcement learning, AI safety, Benchmarks
SIG 72 · HYP 18

arXiv cs.AI

MusicSynth: An Automated Pipeline for Generating Violin Fingerboard Animations from Sheet Music Using Optical Music Recognition

MusicSynth is an open-source web tool that automatically converts violin sheet music (photo or file) into animated videos showing finger positioning on the fingerboard. The system combines optical music recognition (OMR), MusicXML parsing, and video rendering. Tested on 110 scores: 91.2% note recognition accuracy on printed music, 99.1% finger position accuracy on digital files.

Vision, Code generation, Open source
SIG 72 · HYP 25

arXiv cs.AI

Task-Level AI Readiness Assessment for Business Process Management: The T-IPO Model and LARA Matrix in Financial-Services IT Operations

arXiv paper introducing T-IPO and LARA, tools to assess LLM agent readiness for business tasks. LARA is a 5-dimension rubric scoring tasks into 4 levels (L1-L4), with 1.5× weight on compliance sensitivity. Validated on 127 tasks (κ=0.80), replicated across 3 institutions (κ=0.73). Auto-completion decays from 95% (L1) to 40% (L3).
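The κ values above are inter-rater agreement statistics. The summary does not say which kappa variant was used, so the following sketch assumes the two-rater Cohen's kappa. The function name and the example ratings are hypothetical and are not data from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same level by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical readiness levels assigned to four tasks by two raters.
ratings_a = ["L1", "L1", "L2", "L2"]
ratings_b = ["L1", "L1", "L2", "L3"]
kappa = cohens_kappa(ratings_a, ratings_b)
```

Here the raters agree on three of four tasks (observed agreement 0.75), but chance agreement of 0.375 pulls kappa down to 0.6, which shows why kappa is reported instead of raw agreement.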

AI Agents, Evals, Papers
SIG 72 · HYP 15

arXiv cs.AI

AI4BayesCode: From Natural Language Descriptions to Validated Modular Stateful Bayesian Samplers

AI4BayesCode translates natural-language Bayesian model descriptions into validated, modular MCMC samplers. The system decomposes models into sampling blocks mapped to built-in components, with pre- and post-generation validation. A novel recursively stateful architecture enables coherent composition of independently developed sampling components.

Code generation, AI Agents, Reasoning
SIG 72 · HYP 28

arXiv cs.AI

From Reactive to Proactive: A Multi-Regulatory Empirical Analysis of 480 AI Incidents and a Data-Driven Governance Compliance Framework

Analysis of 480 real-world AI incidents from AIID against the EU AI Act, the NIST AI Risk Management Framework, and GDPR post-deployment provisions. Reveals substantial governance gaps in post-deployment accountability. Proposes the Proactive AI Governance Compliance Framework (PAGCF), a four-phase lifecycle methodology that shifts from reactive incident response to pre-deployment compliance assurance.

Regulation, AI safety, Alignment
SIG 72 · HYP 18

arXiv cs.CL

LLM-Based Intelligent Notification Composition: From Static Personalization to Context-Aware Persuasive Messaging

Study on using LLMs to compose personalized and persuasive push notifications. Authors define 6 quality dimensions (contextual relevance, clarity, actionability, etc.) and demonstrate CTR gains of +8% to +14.5% over static templates. Proposes an architectural framework with budget-aware routing, grounded generation, and online learning.

Prompt engineering, RAG, Business
SIG 72 · HYP 28

arXiv cs.AI

AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment

AMR-SD introduces asymmetric meta-reflective self-distillation to improve token-level credit assignment in LLM reinforcement learning. The method compresses diagnostic signals into self-generated Socratic hints and uses Causal Information Gain with asymmetric ReLU-gated threshold for sparse token-level advantage modulation, preventing late-stage training collapse.

Reinforcement learning, Reasoning, Alignment
SIG 72 · HYP 18

arXiv cs.AI

Beyond the Cartesian Illusion: Testing Two-Stage Multi-Modal Theory of Mind under Perceptual Bottlenecks

arXiv paper on spatial limitations of MLLMs in multi-agent environments. Models suffer from a "Cartesian Illusion": they lack grounded 3D topological understanding. Authors propose an Epistemic Sensory Bottleneck module with Anchor-Based Embodied Spatial Decomposition CoT to improve second-order spatial inference (Theory of Mind). Zero-shot baseline: 42% accuracy.

Vision, Multi-agent, Reasoning
SIG 72 · HYP 28

arXiv cs.AI

Can LLMs Think Like Consumers? Benchmarking Crowd-Level Reaction Reconstruction with ConsumerSimBench

ConsumerSimBench, a benchmark built from 1,553 Chinese social-media topics and 23,122 reaction criteria, evaluates whether LLMs can reconstruct real consumer reaction patterns. Gemini-3.1-Pro covers only 47.8% of criteria, revealing a major gap between technical performance and consumer intuition. A multi-agent pipeline improves MiMo-V2.5-Pro from 32.9% to 37.6%.

Benchmarks, Evals, Multi-agent
SIG 72 · HYP 25

arXiv cs.CL

QQJ: Quantifying Qualitative Judgment for Scalable and Human-Aligned Evaluation of Generative AI

QQJ is an evaluation framework for generative AI that combines human judgment and LLMs. It uses expert-designed multi-dimensional rubrics and calibrates LLM evaluators on a small high-quality annotation set. Experiments on text and image generation show stronger alignment with human judgment than traditional automatic metrics and unconstrained LLM evaluators.

Evals, Llama, Vision
SIG 72 · HYP 28

arXiv cs.AI

LAST-RAG: Literature-Anchored Stochastic Trajectory Retrieval-Augmented Generation for Knowledge-Conditioned Degradation Model Selection

LAST-RAG proposes a method for selecting stochastic degradation models to estimate remaining useful life (RUL). It combines observed trajectories and domain context via retrieval from a local evidence bank, with an RCRUS mechanism to prevent premature model elimination. In experiments, it outperforms statistical and prognostic baselines.

RAG, Reasoning, Benchmarks
SIG 72 · HYP 15

arXiv cs.CL

HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

HINT-SD proposes targeted self-distillation for training long-horizon LLM agents. The method uses full-trajectory hindsight to identify failure-relevant actions and applies feedback-conditioned distillation only on targeted action spans. On BFCL v3 and AppWorld, it improves over dense per-turn feedback baselines by up to 18.80% while achieving 2.26× lower time per training step.

AI Agents, Reinforcement learning, Reasoning
SIG 72 · HYP 18