Archives

May 2026

3147 articles

arXiv cs.LG·

Federated Learning over Human-Body Communication for On-Body Edge Intelligence: A Survey, Taxonomy, and BODYFED-HBC Scheduling Vignette

Survey paper at the intersection of human-body communication (HBC) and federated learning for wearable sensor networks. Proposes taxonomy of FL deployments (intra-body, body-hub, cross-user, clinical-cloud) and introduces BODYFED-HBC reference architecture with scheduling algorithm and reproducible simulation combining public datasets with empirical HBC signal-loss models.

Benchmarks
SIG 72 · HYP 15
arXiv cs.AI·

LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition

LC-ERD is a self-alignment framework for LLMs that mines latent logical structures via consistency-regulated reward decomposition. Addresses three challenges: label noise from mimetic bias, coarse-grained supervision, and distributional collapse. Uses a Variational Logic Potential and multi-agent value decomposition based on the Individual-Global-Max (IGM) principle.

Reasoning · Reinforcement learning · Alignment
SIG 72 · HYP 28
arXiv cs.CL·

Direct Preference Optimization for English-Mandarin Code-Switching Speech Recognition in Audio LLMs

Researchers apply Direct Preference Optimization (DPO) to improve English-Mandarin code-switching transcription in Audio LLMs. Three failure modes are identified: language omission, translation instead of transcription, and hallucination. Training on 100K pairs (570 hours) reduces mixed error rate (MER) by up to 89.6% in-distribution and 20.0% out-of-distribution.

Reinforcement learning · Alignment · Voice
SIG 78 · HYP 15
arXiv cs.CL·

SLAP: Stratified Loss-based Pruning for On-Policy Data-Efficient Instruction Tuning

SLAP is a batch-aware data selection framework for instruction tuning that evaluates learnability at batch composition level rather than individual samples. Using stratified sampling and relative distance optimization with Hessian-approximated gradients, it matches full dataset performance with 20-40% less training data across LLaMA, ChatGLM, and diverse tasks (dialogue, translation, QA).

Fine-tuning · Llama · Benchmarks
SIG 72 · HYP 28
arXiv cs.CL·

Document Classification Pattern Recognition via Information Fusion: A Systematic Review of Multimodal and Multiview Representation Approaches

Systematic review of 139 studies on information fusion for document classification. Meta-analysis shows multimodal fusion improves accuracy by 5.28 percentage points (p=0.0016) and multiview fusion by 4.67%. Critical finding: only 11.8% of multimodal and 23.3% of multiview studies use statistical validation, undermining reproducibility.

Benchmarks · Evals · Papers
SIG 78 · HYP 15
arXiv cs.LG·

Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers

Method to identify attention-head circuits in pretrained transformers in three steps: a spectral signal (time-integrated participation ratio), task-pattern filtering, and group ablation against a matched-random control. Validated across 51M to 7B parameters, two architectures, and four pretraining pipelines. Finding: a 2-6 head induction circuit is causally necessary in all models tested (94-100% drop after ablation).
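As a rough illustration of the spectral signal named above, the standard participation ratio of a non-negative spectrum is (Σλ)² / Σλ², which counts how many components carry appreciable weight. The sketch below is not the paper's code: the function names are hypothetical, and the "time-integrated" variant is assumed here to be a plain average over a sequence of spectra, which the summary does not specify.

```python
def participation_ratio(eigenvalues):
    """Standard participation ratio: (sum of values)^2 / (sum of squared values).

    Equals n for n equal values and 1 when a single value dominates.
    """
    total = sum(eigenvalues)
    squared = sum(v * v for v in eigenvalues)
    if squared == 0:
        return 0.0  # an all-zero spectrum has no effective dimension
    return total * total / squared


def time_integrated_pr(spectra):
    """ASSUMPTION: average the participation ratio over a sequence of spectra
    (for example one spectrum per checkpoint). The paper may define this differently.
    """
    values = [participation_ratio(s) for s in spectra]
    return sum(values) / len(values)


print(participation_ratio([1.0, 1.0, 1.0, 1.0]))  # flat spectrum -> 4.0
print(participation_ratio([1.0, 0.0, 0.0, 0.0]))  # one dominant direction -> 1.0
```

A head whose spectrum stays concentrated (ratio near 1) across the sequence would score low under this assumed averaging, while a head with a flat spectrum would score near its dimension.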

Papers · Reasoning · Evals
SIG 78 · HYP 15
arXiv cs.AI·

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

Comprehensive survey on trustworthy agentic AI systems (LLMs augmented with planning, tool use, memory). Examines safety, robustness, privacy, and system security. Proposes unified metrics, benchmarks, and stage-targeted mitigation strategies across agent workflows. Identifies open challenges: self-evolving agents, runtime verification, privacy-preserving personalization.

AI Agents · AI safety · Benchmarks
SIG 75 · HYP 20
arXiv cs.LG·

Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning

Agent-ToM is a learning-to-monitor framework using Theory-of-Mind reasoning to detect covert malicious behavior in autonomous LLM agents. It infers agent beliefs, intent hypotheses, and behavioral deviations from task-consistent baselines. Evaluated on SHADE-Arena and CUA-SHADE-Arena benchmarks, it outperforms ensemble monitoring baselines with a two-call reasoning pipeline.

AI Agents · AI safety · Reasoning
SIG 72 · HYP 28
arXiv cs.LG·

Rethinking Continual Anomaly Detection on the Edge: Benchmarking Under Realistic Industrial Conditions

New arXiv paper proposing DINOSaur, a training-free method for continual anomaly detection in industrial settings. Combines a frozen DINOv3 backbone, spatially-indexed coreset memory, and neighborhood-restricted anomaly scoring. Achieves zero forgetting, outperforms all baselines across 5 protocols, and runs inference in under 100 ms on Jetson Orin Nano, with on-device adaptation in under 30 s.

Benchmarks · Vision
SIG 72 · HYP 25
arXiv cs.LG·

LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation with LLMs

LLM-AutoSciLab proposes a closed-loop scientific discovery framework coupling hypothesis generation, hypothesis-conditioned experiment selection, and mechanism refinement. Evaluated on ActiveSciBench (57 enzyme-kinetics tasks, 45 gene-regulatory-network tasks), the system achieves 67.6% symbolic accuracy and 2-5x better sample efficiency than competing baselines.

Reasoning · AI Agents · Benchmarks
SIG 82 · HYP 25
arXiv cs.LG·

Riemannian Archetypal Analysis: Interpretable non-linear data analysis on deformed star distributions

Riemannian archetypal analysis using data-driven pullback geometry on deformed star distributions. Combines interpretability of classical archetypal analysis with non-linear model expressiveness. Riemannian archetypal mapping (RAM) projects onto manifolds of geodesically convex archetype combinations. Experiments on MNIST demonstrate meaningful geodesics and geometry-aware denoising.

Papers · Reasoning
SIG 72 · HYP 15
arXiv cs.LG·

A lift for input-convex neural network training

Novel training method for input-convex neural networks (ICNNs) using an unconstrained hypernetwork that emits inter-layer weights. Approach inspired by parameter-extension lifts from PDE-constrained inverse problems, circumvents limitations of projected gradient descent and softplus reparametrization. Results on log-concave density estimation and convex-potential normalizing flows show improved convergence.
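For context, the softplus reparametrization that the summary lists as a baseline can be sketched as follows. This is not the paper's hypernetwork lift: it is a minimal two-layer ICNN under assumed shapes, where raw unconstrained weights are passed through softplus so the inter-layer weights stay non-negative, which (with a convex, non-decreasing activation) keeps the output convex in the input. All names (`init_icnn`, `icnn_forward`, the parameter keys) are hypothetical.

```python
import numpy as np


def softplus(v):
    # Numerically stable log(1 + exp(v)); the output is always positive.
    return np.logaddexp(0.0, v)


def init_icnn(rng, in_dim, hidden):
    """Random parameters for a small ICNN. V1 and v2 are raw (unconstrained)
    inter-layer weights; they are made positive at forward time via softplus."""
    return {
        "W0": rng.normal(size=(hidden, in_dim)), "b0": rng.normal(size=hidden),
        "V1": rng.normal(size=(hidden, hidden)),
        "W1": rng.normal(size=(hidden, in_dim)), "b1": rng.normal(size=hidden),
        "v2": rng.normal(size=hidden),
        "w2": rng.normal(size=in_dim), "b2": 0.0,
    }


def icnn_forward(params, x):
    """Scalar output that is convex in x: ReLU is convex and non-decreasing,
    and every weight applied to a hidden activation is positive."""
    relu = lambda z: np.maximum(z, 0.0)
    z1 = relu(params["W0"] @ x + params["b0"])
    z2 = relu(softplus(params["V1"]) @ z1 + params["W1"] @ x + params["b1"])
    return float(softplus(params["v2"]) @ z2 + params["w2"] @ x + params["b2"])


rng = np.random.default_rng(0)
params = init_icnn(rng, in_dim=3, hidden=8)
a, b = rng.normal(size=3), rng.normal(size=3)
# Midpoint convexity check: f((a + b) / 2) <= (f(a) + f(b)) / 2
print(icnn_forward(params, (a + b) / 2) <= 0.5 * (icnn_forward(params, a) + icnn_forward(params, b)) + 1e-9)
```

The direct skip connections from `x` (`W0`, `W1`, `w2`) are left unconstrained, since an affine function of the input does not affect convexity.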

Papers · Reasoning · Reinforcement learning
SIG 72 · HYP 15
arXiv cs.LG·

Interdomain Attention: Beyond Token-Level Key-Value Memory

Interdomain Attention merges transformers and state space models via kernel methods: attention features are projected onto basis functions maintained by an SSM, enabling query-conditioned attention over fixed-size state. On FineWeb-Edu (125M–1.3B), outperforms softmax baselines at 1.3B on validation perplexity and commonsense tasks, with length-flat behavior up to 3.5× training context.

Reasoning · Benchmarks · Papers
SIG 78 · HYP 15
arXiv cs.AI·

Why We Need World Models for AGI: Where LLMs Fail and How World Models May Outperform

arXiv paper arguing LLMs fail at causal reasoning and long-horizon planning due to lack of world models. Authors introduce Latent Dynamics Inference (LDI) and Flux, a sequential reasoning environment specified in natural language. RL agents with explicit latent state access achieve 79% win rate vs 11% for LLMs, revealing failures in persistent state tracking.

Reasoning · Reinforcement learning · Papers
SIG 72 · HYP 35
arXiv cs.AI·

A Dynamical Framework for Cognitive Processes Based on Transformations and Semantic Equivalence

Dynamical framework for modeling cognitive processes via feedback systems. Cognitive states evolve through X_{t+1} = π(F(f(X_t))), where f describes internal transformations, F describes interpretative mappings, and π enforces semantic equivalence. Categorical formulation and stability analysis via fixed-point arguments. Linguistic application: context-dependent interpretation as a trajectory toward a stable semantic class.
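To make the update rule X_{t+1} = π(F(f(X_t))) concrete, here is a toy sketch that iterates it until the state stops changing. The specific maps are invented for illustration and are not from the paper: `f` is a simple contraction, `F` a small shift, and `pi` collapses each number to the integer representative of its class.

```python
import math


def f(x):
    # Hypothetical internal transformation: a contraction toward 2.
    return 0.5 * x + 1.0


def F(x):
    # Hypothetical interpretative mapping: a small constant shift.
    return x + 0.25


def pi(x):
    # Hypothetical semantic-equivalence projection: every value in [n, n+1)
    # is treated as the same class, represented by the integer n.
    return math.floor(x)


def iterate_to_fixed_point(x0, f, F, pi, max_steps=100):
    """Apply x <- pi(F(f(x))) until the state repeats; return (fixed point, trajectory)."""
    trajectory = [x0]
    x = x0
    for _ in range(max_steps):
        nxt = pi(F(f(x)))
        if nxt == x:
            return x, trajectory
        trajectory.append(nxt)
        x = nxt
    raise RuntimeError("no fixed point reached within max_steps")


fixed, path = iterate_to_fixed_point(10, f, F, pi)
print(fixed, path)  # 2 [10, 6, 4, 3, 2]
```

In this toy case the trajectory settles on a single class representative, which mirrors the "trajectory toward a stable semantic class" reading in the summary; convergence is a property of these particular maps, not a general guarantee.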

Reasoning · Papers
SIG 35 · HYP 15
arXiv cs.CL·

Generating Legal Commentaries from Case Databases via Retrieval, Clustering, and Generation

Automated pipeline transforms 4,555 German Federal Court decisions into legal commentaries. Extracts paragraph-level chunks, summarizes reasoning, then embeds and clusters keywords. LLMs generate headings and citation-rich sections, which are merged into coherent commentaries. Evaluated on 5 dimensions, including topical relevance, citation faithfulness, cluster distinction, and logical ordering.

RAG · Code generation · Evals
SIG 72 · HYP 15
arXiv cs.CL·

Improving Labeling Consistency with Detailed Constitutional Definitions and AI-Driven Evaluation

Method to improve consistency in automated labeling pipelines for content moderation. Authors propose an AI-driven workflow where an LLM writes detailed per-category constitutions (harassment, hate speech, non-violent crime), then a frontier LLM interprets them to generate golden labels. Result: 57x reduction in cross-model inconsistency vs paragraph definitions.

Evals · AI safety · Alignment
SIG 75 · HYP 15
arXiv cs.LG·

Overcoming "Physics Shock" in Earth Observation: A Heteroscedastic Uncertainty Framework for PINN-based Flood Inference

Heteroscedastic uncertainty-aware PINN framework for flood extent mapping from SAR data. Attention-Gated FNO-UNet with dynamic Warm-Start protocol and aleatoric uncertainty modeling prevents gradient divergence ("Physics Shock"). On Sen1Floods11: +25% relative IoU improvement over deterministic baselines, with calibrated confidence bounds for disaster response.

Papers · Reasoning · Evals
SIG 72 · HYP 18
arXiv cs.LG·

Towards Verifiable Transformers: Solver-Checkable Circuit Explanations

Verifiable Transformers framework converts task-localized Transformer circuits into solver-checkable formal claims. Extracts circuits and verifies functional equivalence, edge necessity, invariance, and robustness via SMT encoding. Demonstrates direct verification on symbolic tasks and surrogate-mediated verification at GPT-2 scale with SMT-representable operators (Signed L1 BandNorm, sparsemax, LeakyReLU).

Reasoning · AI safety · Papers
SIG 72 · HYP 18