HR tools and artificial intelligence: Europe pushes back high-risk obligations to December 2027

The EU is postponing until December 2027 the enforcement of obligations for high-risk AI systems used in HR tools. A provisional political agreement reached on May 7, 2026 on the Digital Omnibus on AI amends the timeline of Regulation (EU) 2024/1689 (the AI Act).

Regulation AI safety

arXiv cs.AI·May 29

Provably Secure Agent Guardrail

New arXiv paper proposing ePCA (Proof-Constrained Action), a formal-verification security framework for AI agents. Before executing physical operations, agents must formalize their intentions as first-order logic constraints, rather than relying on empirical semantic guardrails. Evaluations report a 0% attack success rate and a 0% false positive rate across the tested scenarios.

AI Agents AI safety Alignment

arXiv cs.AI·May 29

Better Later Than Sooner: Neuro-Symbolic Knowledge Graph Construction via Ontology-grounded Post-extraction Correction

Neuro-symbolic framework for ontology-grounded knowledge graph construction that combines open-domain extraction, embedding-based canonicalization, and targeted LLM-based correction of ontology violations. Deferring corrections to a post-extraction stage reduces token usage, improves KG consistency, and preserves QA quality for multi-hop reasoning and symbolic operations.

RAG Reasoning Embeddings
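
Embedding-based canonicalization merges surface forms that refer to the same entity. A minimal sketch, assuming toy hand-written vectors and a hypothetical cosine threshold of 0.9 (the paper's encoder, threshold, and clustering method are not given here): each name joins the first existing cluster whose representative is similar enough, otherwise it starts a new cluster.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def canonicalize(embeddings, threshold=0.9):
    """Greedily map each surface form to a canonical representative.

    embeddings: dict of surface form -> vector (insertion order = processing order).
    threshold: hypothetical similarity cutoff for merging two forms.
    """
    canon = {}            # surface form -> canonical form
    representatives = []  # canonical forms seen so far
    for name, vec in embeddings.items():
        match = next(
            (rep for rep in representatives if cosine(vec, embeddings[rep]) >= threshold),
            None,
        )
        if match is None:  # no similar cluster yet: this form becomes a representative
            representatives.append(name)
            match = name
        canon[name] = match
    return canon

# Toy vectors for illustration only, not real embeddings.
toy = {
    "Barack Obama": [0.9, 0.1, 0.0],
    "Obama": [0.88, 0.12, 0.01],
    "Paris": [0.0, 0.2, 0.95],
}
print(canonicalize(toy))
```

Greedy single-pass assignment keeps the sketch short; a real pipeline would more likely use proper clustering over learned embeddings.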

arXiv cs.AI·May 29

Governing Technical Debt in Agentic AI Systems

Paper defines 'Agentic Technical Debt': the liability that accumulates when prompts, memory, tool schemas, orchestration graphs, and control policies are patched together faster than they can be validated and standardized. It also introduces the 'Stochastic Tax': the recurring operating cost of keeping probabilistic agent behavior within acceptable bounds. The authors propose lightweight dashboards and governance controls to improve visibility.

AI Agents Multi-agent AI safety

arXiv cs.CL·May 29

Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text

eXTC combines structured prompt optimization and reinforcement learning for text classification. The system first learns a natural-language rulebook, then distills reasoning from a teacher LLM into a compact model, and finally expands its capabilities via RL. The result is fast inference with local reasoning traces and global, modular explanations of the learned domain rules.

Prompt engineering Reinforcement learning Reasoning

arXiv cs.CL·May 29

Error as a Lens: Probing LLM Reasoning through Synthetic Misconception Generation

Framework that uses LLMs to generate targeted synthetic errors aligned with a cognitive taxonomy (revised Bloom's taxonomy). A Generation Agent drafts erroneous solutions, and an Examination Agent checks that each one is consistent with the specified error mode. Experiments on TheoremQA show that generating authentic errors is substantially harder than producing arbitrary wrong answers.

AI Agents Multi-agent Reasoning

arXiv cs.CL·May 29

Large language models reorganize representational geometry during in-context learning

arXiv paper on representational geometry during in-context learning (ICL) in LLMs. The researchers show that ICL performance correlates with the task's representational structure and that successful ICL involves a geometric reorganization that increases online separability. LLM behavior follows a prototype-like algorithm.

Reasoning Papers

arXiv cs.CL·May 29

A comparative study of transformer-based embeddings for topic coherence

Comparative study of 7 transformer models (MiniLM to LLaMA-2, 22M to 13B parameters) for topic modeling via BERTopic. Finding: model size has negligible impact on topic quality measured by coherence and divergence metrics. Smaller models achieve comparable performance to larger ones.

Embeddings Benchmarks Papers

arXiv cs.CL·May 29

Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models

M2R (Micro-Macro Retrieval) is a retrieve-while-generate framework reducing hallucinations in long-form LLM generation. It combines macro retrieval (external evidence) and micro retrieval (key information from reasoning) to maintain proximity between factual data and outputs. Trained via reinforcement learning with rule-based rewards.

RAG Reinforcement learning

arXiv cs.CL·May 29

Lightweight Multimodal LLM-Enabled Cost-Effective Defect Grading of Power Transmission Equipment

Defect grading framework for power transmission equipment using multimodal LLMs (MLLMs). The pipeline applies in-context learning on commercial models, generates chain-of-thought Q&A data to reduce manual annotation, then fine-tunes Qwen3-VL-8B via LoRA. It achieves SOTA results on three grading tasks.

Qwen Vision Fine-tuning

arXiv cs.LG·May 29

Knowledge Offloading: Decomposing LLMs into Sparse Backbones and Memory Modules

KOFF decomposes LLMs into sparse shared backbones and domain-specific external memory modules. On Llama and Qwen (3B-8B), the framework preserves performance at 12% global sparsity using LoRA adapters and learned KV caches, while pruning without memories degrades sharply.

Llama Qwen Fine-tuning

arXiv cs.LG·May 29

Parallel Adaptive Multi-Objective Evolutionary Learning of Discretized Bayesian Network Classifiers for Clinical Data

Baymex, a multi-objective evolutionary algorithm, learns discretized Bayesian networks for clinical classification. Parallelized on 16 cores (54× speedup), it optimizes cross-entropy and BIC complexity. On real datasets (RADCURE, SUPPORT), it matches or outperforms decision trees, logistic regression, and random forests while producing interpretable models.

Benchmarks
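
A multi-objective search of this kind keeps the set of non-dominated candidates rather than a single best model. A minimal sketch of Pareto filtering over two objectives to minimize, assuming each candidate network has already been scored by cross-entropy and a BIC-style complexity term (the scores below are made up; Baymex's actual evolutionary operators and parallelization are not reproduced):

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates.

    candidates: dict of name -> (cross_entropy, complexity), both minimized.
    """
    return [
        name
        for name, score in candidates.items()
        if not any(dominates(other, score) for o, other in candidates.items() if o != name)
    ]

# Hypothetical scores for four candidate network structures.
scores = {
    "net_a": (0.42, 120.0),
    "net_b": (0.45, 80.0),
    "net_c": (0.50, 150.0),  # worse than net_a on both objectives
    "net_d": (0.40, 200.0),
}
print(pareto_front(scores))  # ['net_a', 'net_b', 'net_d']
```

The remaining front exposes the accuracy/complexity trade-off, from which an interpretable (simpler) model can be chosen.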

arXiv cs.LG·May 29

Return-to-Go Is More Than a Number: Q-Guided Alignment for Return-Conditioned Supervised Learning

Q-ALIGN DT aligns conditioned sequence models by ensuring the Q-value of the output policy matches the input return-to-go (RTG). The method uses a Q function for dense guidance and RTG-perturbation fine-tuning. Results: improved controllability on D4RL benchmark and generalization to velocity-tracking tasks where prior methods fail.

Reinforcement learning Reasoning Benchmarks
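
Return-to-go (RTG) at step t is the sum of rewards from t to the end of the trajectory; return-conditioned models such as Decision Transformer take it as an input. A minimal sketch of computing it from a reward sequence, with an optional discount factor added for generality (Q-ALIGN DT's Q-guided alignment itself is not shown):

```python
def returns_to_go(rewards, gamma=1.0):
    """Return-to-go R_t = sum_{k>=t} gamma^(k-t) * r_k for every step t.

    gamma=1.0 gives the undiscounted form used by Decision Transformer-style models.
    """
    rtg = []
    running = 0.0
    for r in reversed(rewards):  # accumulate from the end of the trajectory
        running = r + gamma * running
        rtg.append(running)
    return rtg[::-1]

print(returns_to_go([1.0, 0.0, 2.0, 3.0]))  # [6.0, 5.0, 5.0, 3.0]
```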

arXiv cs.LG·May 29

CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models

CosmicFish-HRM is a compact model with a Hierarchical Reasoning Module (HRM) that dynamically allocates computational effort during inference. The model learns when to halt based on input complexity, combining high-level and low-level reasoning cycles with Grouped Query Attention, RoPE, and SwiGLU. Results show non-uniform reasoning behavior adapted to tasks and inputs.

Reasoning Fine-tuning Benchmarks

arXiv cs.LG·May 29

Cycle-Space Informed Detection of Autoencoded Blind False Data Injection Attacks on Power Systems

Detection of False Data Injection Attacks on power systems. The authors propose a topology-aware Cycle-Space Detector (CSD) that is robust against autoencoder-based attacks exploiting the Jacobian null space. It leverages network topology and a Minimum Cycle Basis to enhance detection with optimal generalization error, evaluated on the IEEE 14-, 30-, 57-, and 118-bus systems.

AI safety Benchmarks Papers

arXiv cs.AI·May 29

Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling

RACE-Sched, an asynchronous multi-agent framework, solves dynamic scheduling by decoupling real-time execution (symbolic heuristics) from long-horizon reasoning (LLM). A semantic rule repository of validated heuristics improves transferability across problem scales. Outperforms Deep RL and LLM baselines on GEN-Bench, MK-Bench, JMS-Bench.

AI Agents Multi-agent Reasoning

arXiv cs.LG·May 29

Context Distillation as Latent Memory Management

Reformulates context distillation as a latent memory management problem. Each context is distilled into an independent LoRA adapter, forming a modular memory bank. A Self-Gating mechanism decides whether to activate latent memories, and cache sharing reduces inference overhead.

Fine-tuning Reasoning Infrastructure
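
Each memory here is a LoRA adapter, i.e. a low-rank update added to a frozen weight matrix. A minimal pure-Python sketch of the standard LoRA forward pass y = Wx + (alpha/r)·B(Ax) for one linear layer with toy matrices; the `gate` flag is a hypothetical stand-in for the decision to activate a memory, and the paper's actual Self-Gating mechanism and cache sharing are not modeled:

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha, gate=True):
    """y = W x + (alpha / r) * B (A x), with A of shape r x d_in and B of shape d_out x r.

    gate: hypothetical on/off switch standing in for "activate this latent memory or not".
    """
    base = matvec(W, x)  # frozen base weights
    if not gate:
        return base
    r = len(A)  # adapter rank
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 weight (identity)
A = [[1.0, 1.0]]              # rank r = 1, shape 1x2
B = [[0.5], [-0.5]]           # shape 2x1
x = [2.0, 4.0]
print(lora_forward(W, A, B, x, alpha=2.0))              # [8.0, -2.0]
print(lora_forward(W, A, B, x, alpha=2.0, gate=False))  # [2.0, 4.0]
```

Because the base weights are untouched, one frozen backbone can serve many per-context adapters, which is what makes a modular memory bank cheap to store.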

arXiv cs.AI·May 29

Differentiable Belief-based Opponent Shaping

D-BOS (Differentiable Belief-based Opponent Shaping) is a MARL method that shapes opponents by differentiating through k-step softmax-Bayes belief dynamics. Unlike existing approaches, it treats belief state as the shaping target rather than parameters or policies. Results: outperforms PPO and BBM in hidden-role games, with largest gains in mixed-motive settings.

Multi-agent Reinforcement learning Reasoning
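
Belief-based shaping targets the opponent's belief over hidden roles. A minimal sketch of repeated softmax-Bayes belief steps, assuming the observer models each role as choosing actions by a softmax over role-specific action values (the roles, Q values, and temperature below are hypothetical). D-BOS differentiates through k such steps, which would need an autodiff framework and is not shown:

```python
import math

def softmax(values, temperature=1.0):
    exps = [math.exp(v / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def bayes_belief_update(belief, q_values, action, temperature=1.0):
    """posterior(role) ∝ prior(role) * P(action | role), with P(. | role) = softmax(Q_role)."""
    unnorm = {
        role: p * softmax(q_values[role], temperature)[action]
        for role, p in belief.items()
    }
    z = sum(unnorm.values())
    return {role: u / z for role, u in unnorm.items()}

belief = {"loyal": 0.5, "spy": 0.5}
q = {"loyal": [1.0, 0.0], "spy": [0.0, 1.0]}  # hypothetical values for 2 actions
for _ in range(3):  # k = 3 steps, each observing action 1
    belief = bayes_belief_update(belief, q, action=1)
print(belief)
```

Each observation of action 1 multiplies the odds in favor of "spy" by e, so after three steps the posterior is e³/(1 + e³) ≈ 0.953.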

arXiv cs.AI·May 29

Mind Your Tone: Does Tone Alter LLM Performance?

Study of how prompt tone affects LLM performance. Tests ChatGPT-4o, ChatGPT-5-nano, and Gemini 2.5 Flash/Lite on 50 base questions and 570 MMLU questions (57 subjects) in 5-7 tone variants. Results: tonal effects are systematic but highly model-dependent, with significant accuracy variation across subjects.

Prompt engineering Benchmarks Evals

arXiv cs.CL·May 29

Wait! There's a Way Out: A Decision Mechanism for Forecasting Conversational Derailment

Method to forecast conversational derailment (personal attacks) in real-time. Decouples alert triggering from derailment likelihood estimation using forward-looking simulations to assess plausible recovery paths. Reduces false positives without sacrificing forecasting accuracy.

Papers Reasoning AI safety

arXiv cs.CL·May 29

Slogans or Stance? A Label-Light Diagnostic for Entrepreneurial-Discourse Measurement on Chinese SOE Speeches

Diagnostic tool for measuring constructs like "entrepreneurial spirit" in Chinese state-owned enterprise speeches. On 80 speeches from SOE leaders, authors test LDA, dictionary scorers, and Qwen3.5:9b. The LLM reaches d=1.09 in paired contrast, but half the effect stems from speaker idiolect. Corpus of 2,190 segments and slogan lexicon released.

Benchmarks Evals Qwen
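
For a paired contrast, one common effect size is Cohen's d_z: the mean of the within-pair differences divided by their standard deviation. A minimal sketch with made-up scores, only to illustrate the kind of statistic reported (the paper may use a different d variant):

```python
from statistics import mean, stdev

def paired_cohens_d(a, b):
    """Cohen's d_z for paired samples: mean(a - b) / sd(a - b), with the sample SD (n - 1)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / stdev(diffs)

# Hypothetical scores for the same segments under two conditions.
stance = [0.8, 0.6, 0.9, 0.7]
slogan = [0.5, 0.5, 0.6, 0.5]
print(round(paired_cohens_d(stance, slogan), 2))  # 2.35
```

Because d_z depends only on within-pair differences, a consistent speaker-level shift (such as idiolect) inflates it as easily as the construct being measured, which is why separating those sources matters.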

arXiv cs.CL·May 29

Analyzing Persona Effects in Generated Explanations from Multimodal LLM Agents in Urban Perception

Study of persona effects on explanations generated by multimodal LLM agents in urban perception. Analysis of 59,808 annotations from 1,200 persona-conditioned agents: captions show strong convergence, justifications display systematic variation tied to socioeconomic and political attributes, perception tags show no significant persona-related differences.

Vision AI Agents Prompt engineering

arXiv cs.CL·May 29

Bosses, Kings, and the Commons: Cooperation Under Power Asymmetry in LLM Societies

SovSim, a multi-agent simulation framework, evaluates how 11 state-of-the-art LLMs manage shared resources under asymmetric power structures. Finding: introducing an agent with disproportionate power (boss/king) causes 87.3% degradation in survival rate and cooperation breakdowns compared to symmetric settings.

Multi-agent AI Agents Benchmarks

arXiv cs.CL·May 29

LLMBridge: An LLM Pipeline for End-to-end Referential Bridging Resolution in English

LLMBridge is an LLM-based system for end-to-end referential bridging resolution in English. The pipeline combines heuristic pre/post-processing with LLM natural language inference capabilities. Evaluated on ISNotes, BASHI, and GUMBridge, it outperforms previous state-of-the-art systems on all three datasets in both end-to-end and gold anaphor settings.

Papers Benchmarks Reasoning

Vercel AI Blog·May 29

Protecting against token theft

Vercel warns of AI inference theft: a single frontier-model request costs about $2, which makes stolen access highly profitable for attackers. Rate limits and session-based authentication are insufficient, so Vercel proposes BotID to verify every AI request individually and prevent losses in the tens of thousands of dollars.

AI safety Infrastructure Business

arXiv cs.CL·May 29

Beyond Recall: Behavioral Specification as an Interpretive Layer for AI Personalization

Researchers introduce a Behavioral Specification as an interpretive layer to align AI decisions with user preferences. Tested on 14 autobiographical corpora, it improves representational accuracy at ~25x lower context cost than using the raw corpus, while reducing model hedging. It is effective on questions that require interpretation and less helpful on recall-based tasks.

Alignment RAG AI Agents

arXiv cs.CL·May 29

The Trust Paradox: How CS Researchers Engage LLM Leaderboards

Qualitative study of 8 AI researchers reveals a paradox: they distrust LLM leaderboards yet use them as decision aids. Peer networks dominate model selection. NLP researchers face SOTA pressure absent in HCI/Systems. Universal demand: cost transparency.

Benchmarks Evals

arXiv cs.CL·May 29

From Data to Insights: Exploring Program-of-Thoughts Prompting for Chart Summarization

Paper introduces Program-of-Thoughts prompting for chart summarization: VLMs generate Python programs that derive valid summary statistics instead of writing the summary text directly. It also proposes a chart-to-dictionary auxiliary task. Results match existing methods on semantic and factual metrics.

Prompt engineering Vision Reasoning

arXiv cs.CL·May 29

GPF-LiveNews: A Streaming Evaluation Protocol for Group-Conditioned Framing in Large Language Models

GPF-LiveNews is a streaming evaluation protocol to audit how LLMs frame emerging news events for different audiences. Tested on 23 models across 12 monitoring runs, it measures semantic and sentiment variations across 42 identity labels. Results show Policy/Action prompts produce strongest semantic movement, while sentiment variation remains flat across dimensions.

Evals AI safety Alignment

arXiv cs.CL·May 29

Thoughts-as-Planning: Latent World Models for Chain-of-Thoughts Optimization via Reinforcement Planning

Thoughts-as-Planning formalizes reasoning chain optimization as sequential decision-making over latent semantic space. The framework learns a latent world model simulating effects of reasoning chain edits on outputs, supporting multi-scale edits (token, segment, instruction) via gradient descent or reinforcement learning planning.

Reasoning Reinforcement learning Prompt engineering

arXiv cs.CL·May 29

Assessing Dutch Syllabification Algorithms and Improving Accuracy by Combining Phonetic and Orthographic Information through Deep Learning

Comparative assessment of four Dutch syllabification algorithms (Brandt Corstius, Liang, Trogkanis-Elkan CRF, and a novel deep learning model). The deep learning model, which combines phonetic and orthographic information, achieves 99.65% word accuracy (+0.14% over previously reported results). Data-driven algorithms outperform knowledge-based approaches.

Papers Benchmarks Code generation

arXiv cs.CL·May 29

Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions

Comparative study of 9 ASR models (Whisper, Parakeet, Wav2Vec2) on child speech in Dutch. Fine-tuned Whisper-medium achieves 5.54% WER on JASMIN and 70.37% on DART. An utterance-level selection method identifies 42% (JASMIN) and 18.1% (DART) of utterances as correctly pronounced with ≥98.3% precision, reducing manual verification needs.

Benchmarks Voice Evals
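
Word error rate (WER) is the word-level edit distance (substitutions + deletions + insertions) between an ASR hypothesis and the reference transcript, divided by the number of reference words. A minimal sketch using the standard dynamic-programming edit distance, with a toy Dutch sentence:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate = (S + D + I) / N via Levenshtein distance over words.

    Assumes a non-empty reference.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + sub,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("de kat zit op de mat", "de kat zat op mat"))  # 1 substitution + 1 deletion = 2/6
```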

arXiv cs.CL·May 29

A Modular Architecture for Typologically Controlled Lexicon Generation

Modular framework for generating pronounceable, typologically plausible artificial lexicons. It samples phoneme inventories from PHOIBLE, applies three phonological grammars (deterministic, OT, and MaxEnt), and assigns meanings via the Swadesh-Leipzig-Jakarta ontology. Evaluated with character n-gram perplexity and KL divergence, the probabilistic grammars outperform baselines for lexicons of 100 to 5,000 word forms.

Papers Benchmarks
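
KL divergence can compare the character n-gram distribution of a generated lexicon with that of a reference lexicon. A minimal sketch for character bigrams, assuming add-one smoothing over the joint bigram vocabulary, '#' word-boundary markers, and toy word lists (none of these choices come from the paper):

```python
import math
from collections import Counter

def bigram_counts(words):
    """Count character bigrams, padding each word with '#' boundary markers."""
    counts = Counter()
    for w in words:
        padded = f"#{w}#"
        counts.update(padded[i:i + 2] for i in range(len(padded) - 1))
    return counts

def kl_divergence(p_words, q_words):
    """D_KL(P || Q) in nats over the union of bigrams, with add-one smoothing."""
    p_counts, q_counts = bigram_counts(p_words), bigram_counts(q_words)
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + len(vocab)
    q_total = sum(q_counts.values()) + len(vocab)
    kl = 0.0
    for bg in vocab:
        p = (p_counts[bg] + 1) / p_total
        q = (q_counts[bg] + 1) / q_total
        kl += p * math.log(p / q)
    return kl

generated = ["pata", "taka", "kapa"]
reference = ["pata", "tapa", "kata"]
print(kl_divergence(generated, reference))  # small positive value
print(kl_divergence(reference, reference))  # identical distributions -> 0.0
```

Smoothing keeps the divergence finite when a bigram appears in only one of the two lexicons.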

arXiv cs.CL·May 29

What are They Thinking? Delineation, Probing and Tracking of Concepts in LLMs

Method to create linear probes detecting concepts in LLM embeddings. Authors define a process: concept delineation via contrastive datasets, layer-wise probe training, tracking across large contexts. Tested on 4 concepts and 3 different LLMs. Goal: scalable monitoring of new models.

Embeddings Evals
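
A linear probe is typically a logistic-regression classifier trained on frozen hidden states to detect whether a concept is present. A minimal stdlib sketch with full-batch gradient descent, assuming toy 2-dimensional "activations" and binary labels from a contrastive dataset; a real setup would train one probe per layer on actual LLM activations:

```python
import math

def train_probe(xs, ys, lr=0.5, epochs=200):
    """Fit weights w and bias b of a logistic-regression probe by gradient descent on log loss."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            logit = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-logit)) - y  # d(log loss)/d(logit)
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def probe_predict(w, b, x):
    """Return 1 if the probe's logit is positive (concept present), else 0."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)

# Toy activations: concept-present examples near (1, 1), concept-absent near (-1, -1).
xs = [[1.0, 0.9], [0.8, 1.1], [-1.0, -0.8], [-0.9, -1.2]]
ys = [1, 1, 0, 0]
w, b = train_probe(xs, ys)
print([probe_predict(w, b, x) for x in xs])  # [1, 1, 0, 0]
```

Keeping the probe linear is the design point: if a linear classifier can read the concept off a layer, the concept is (approximately) linearly represented there.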

arXiv cs.LG·May 29

Ensemble Score Filtering for Real-Data Energy Consumption Forecast Correction

Energy consumption forecast correction method combining a pretrained spatio-temporal model with Ensemble Score Filter (EnSF). EnSF uses score-based diffusion models to assimilate partial and noisy observations. Real-data experiments show EnSF outperforms Ensemble Kalman Filter under nonlinear observation settings.

Benchmarks Papers Reasoning
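
The Ensemble Kalman Filter baseline updates each ensemble member with a Kalman gain estimated from the ensemble spread. A minimal scalar sketch of the stochastic (perturbed-observation) EnKF analysis step, assuming a directly observed state (H = 1) and centring the observation perturbations so the analysis mean equals the Kalman update of the ensemble mean. EnSF replaces this linear-Gaussian update with a score-based diffusion model and is not shown:

```python
import random
from statistics import mean, variance

def enkf_analysis(ensemble, obs, obs_var, rng):
    """Stochastic EnKF update for a scalar state observed directly (H = 1).

    Gain K = P / (P + R), where P is the ensemble sample variance and R = obs_var.
    Each member assimilates its own perturbed observation; the perturbations
    are centred so that their mean is exactly zero.
    """
    p = variance(ensemble)
    k = p / (p + obs_var)
    eps = [rng.gauss(0.0, obs_var ** 0.5) for _ in ensemble]
    eps_mean = mean(eps)
    eps = [e - eps_mean for e in eps]
    return [x + k * (obs + e - x) for x, e in zip(ensemble, eps)]

rng = random.Random(0)
forecast = [rng.gauss(10.0, 2.0) for _ in range(50)]  # prior ensemble around 10
analysis = enkf_analysis(forecast, obs=14.0, obs_var=1.0, rng=rng)
print(round(mean(forecast), 2), round(mean(analysis), 2))
```

The update is linear in the innovation, which is exactly the limitation a score-based filter targets under nonlinear observation operators.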

arXiv cs.LG·May 29

Moment Matching Q-Learning

MoMa QL leverages maximum mean discrepancy (MMD) to accelerate inference of score-based and flow-based generative models in RL. The method guarantees distribution-level convergence and shows superior performance in offline-to-online RL tasks on D4RL benchmarks.

Reinforcement learning Reasoning Benchmarks
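
Maximum mean discrepancy (MMD) measures the distance between two distributions through kernel mean embeddings; with a characteristic kernel such as the Gaussian, it is zero only when the distributions coincide. A minimal sketch of the biased empirical estimator MMD² = mean k(x, x') + mean k(y, y') - 2·mean k(x, y) for 1-D samples, assuming a Gaussian kernel with an arbitrary bandwidth (how MoMa QL uses MMD inside Q-learning is not reproduced):

```python
import math

def gaussian_kernel(a, b, bandwidth=1.0):
    return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between 1-D samples xs and ys."""
    def avg_k(us, vs):
        return sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs) / (len(us) * len(vs))
    return avg_k(xs, xs) + avg_k(ys, ys) - 2 * avg_k(xs, ys)

same = mmd_squared([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
shifted = mmd_squared([0.0, 1.0, 2.0], [3.0, 4.0, 5.0])
print(same, shifted)  # 0.0 for identical samples, clearly positive for shifted ones
```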

arXiv cs.LG·May 29

Designing Active Tether-Net Systems for Space Debris Capture with Graph-Learning-Aided Mixed-Combinatorial Optimization

Active tether-net system for space debris capture using Graph Neural Network (GNN) to jointly optimize net morphology, thruster masses of maneuverable units, and controller aiming points. GNN reduces mixed combinatorial nonlinear programming (MCNLP) to nonlinear programming (NLP) solved via Particle Swarm Optimization with gradient-based refinement, achieving faster convergence than direct MCNLP solving.

Papers Reasoning

arXiv cs.LG·May 29

Causal Intelligence for Constraint-Aware Intervention Design to Induce State Transitions

COAST is a causal-intelligence approach for designing constrained interventions that induce state transitions. The system learns context-specific causal graphs, attributes distributional shifts to mechanism-level causal drivers, and uses multi-objective optimization balancing transition efficacy, intervention complexity, and target-state stability. Validated on synthetic benchmarks and real biological datasets.

Reasoning Benchmarks

arXiv cs.LG·May 29

LoRe: Adaptive Interaction-Evaluation Routing with Per-Step Interaction Budgets for Iterative Graph Solvers

LoRe is a training-free inference-time wrapper optimizing diffusion-based neural solvers for combinatorial optimization. It enforces per-step interaction-evaluation budgeting, dynamically routing computation to high-conflict/high-uncertainty interactions. On MIS and TSP, LoRe achieves ×8 speedup, ×12 peak-memory reduction (MIS) and ×15 speedup, ×44 memory reduction (TSP n=1000).

Reasoning Benchmarks Papers
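
The budgeting idea is to spend expensive evaluations only on the interactions that look most uncertain or conflicting at each step. A minimal sketch of that selection rule, assuming each candidate interaction already has a scalar uncertainty score and a fixed hypothetical per-step budget (LoRe's actual scoring and the diffusion solver are not reproduced):

```python
import heapq

def route_budget(scores, budget):
    """Return the indices of the `budget` highest-scoring interactions, in index order.

    scores: hypothetical per-interaction uncertainty/conflict scores.
    The remaining interactions would reuse cheap or cached estimates.
    """
    top = heapq.nlargest(budget, range(len(scores)), key=scores.__getitem__)
    return sorted(top)

uncertainty = [0.05, 0.90, 0.10, 0.70, 0.30, 0.85]
print(route_budget(uncertainty, budget=3))  # [1, 3, 5]
```

Capping the number of full evaluations per step is what bounds both compute and peak memory, independently of how many interactions the instance has.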

arXiv cs.LG·May 29

Learning Robust and Task-Invariant Functional Representation from fMRI through Siamese Self-Supervised Learning

BrainSimSiam, a lightweight self-supervised learning framework, learns robust representations from fMRI data without labels. Using positive-only pairs, it generalizes across multiple tasks (classification, regression) and outperforms supervised baselines, reducing computational requirements for foundation models in neuroimaging.

Benchmarks
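
SimSiam-style training compares two augmented views of the same input using only positive pairs: a predictor output p from one view is matched to the projection z of the other via negative cosine similarity, symmetrized over both views, with a stop-gradient on z. A minimal sketch of the loss value only, with toy vectors; the stop-gradient affects backpropagation, not the forward value computed here, and BrainSimSiam's encoder and fMRI augmentations are not shown:

```python
import math

def neg_cosine(p, z):
    """D(p, z) = -cos(p, z); during training z would be treated as a constant (stop-gradient)."""
    dot = sum(a * b for a, b in zip(p, z))
    return -dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in z)))

def simsiam_loss(p1, z1, p2, z2):
    """Symmetric loss L = D(p1, z2) / 2 + D(p2, z1) / 2 for two views of one sample."""
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)

# Toy predictor outputs (p) and projections (z) for two views of one input.
p1, z1 = [1.0, 0.0], [1.0, 0.1]
p2, z2 = [0.9, 0.1], [1.0, 0.0]
print(simsiam_loss(p1, z1, p2, z2))  # close to the minimum of -1 for well-aligned views
```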
