Archives

May 2026

3147 articles

arXiv cs.AI·

Hierarchical Prompt-Domain Control and Learning for Resource-Constrained Agentic Language Models

Hierarchical framework for compact LLMs in resource-constrained agentic systems. Model distillation + oracle-controller loop monitors protocol validity, projects histories into feasible prompt domain, triggers lightweight fine-tuning under drift. Separates schema learning from semantic adaptation. Evaluated on Multi-Fidelity Bayesian Optimization with improved reliability and cost-efficiency.

AI Agents · Fine-tuning · Prompt engineering
SIG 72 · HYP 18
arXiv cs.CL·

Simorgh at SemEval-2026 task 7: Region-Aware Hybrid Retrieval for Low-Resource Cultural Reasoning in Multilingual Question Answering

Simorgh proposes a region-aware hybrid retrieval approach combining BM25 lexical matching and dense semantic similarity for culturally grounded multilingual QA on BLEnD benchmark (30 languages). Uses quantized Qwen3-14B with logit-based answer selection. Improves cross-lingual stability but reveals performance gaps tied to training data imbalance.
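The summary does not say how the BM25 and dense scores are combined, so the following is only a minimal sketch of one common option: min-max normalise each score list and take a convex combination. The function names, the `alpha` weight, the toy documents, and the made-up dense similarities are all assumptions for illustration. The region-aware component is not modelled here.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def min_max(xs):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def hybrid_scores(lexical, dense, alpha=0.5):
    """Convex combination of normalised lexical and dense scores (assumed fusion rule)."""
    lex, den = min_max(lexical), min_max(dense)
    return [alpha * l + (1 - alpha) * d for l, d in zip(lex, den)]


# Toy corpus and hypothetical dense cosine similarities (not from the paper).
docs = [
    ["nowruz", "is", "the", "persian", "new", "year"],
    ["new", "year", "fireworks", "in", "sydney"],
    ["traditional", "persian", "rice", "dishes"],
]
query = ["persian", "new", "year"]
dense = [0.82, 0.40, 0.55]

fused = hybrid_scores(bm25_scores(query, docs), dense)
best = max(range(len(docs)), key=fused.__getitem__)
```

Normalising before mixing keeps the unbounded BM25 scale from swamping the bounded cosine similarities; other fusion rules, such as reciprocal rank fusion, would fit the same description.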

RAG · Benchmarks · Qwen
SIG 72 · HYP 18
arXiv cs.LG·

Comparative Analysis of Liquid Neural Networks and LSTM for Sequential Pattern Recognition: Robustness, Efficiency, and Clinical Utility

Comparative study of Liquid Neural Networks (LNN/CfC) vs LSTM across four sequential modalities (N-MNIST, QuickDraw, IAM, PhysioNet Sepsis-3). LNNs model hidden state evolution as continuous differential equations. Results: LNNs outperform LSTM in parameter efficiency and robustness to missing data, especially in clinical environments.

Benchmarks · Reasoning
SIG 72 · HYP 18
arXiv cs.LG·

Detect by Yourself: Self-Designing Agentic Workflows for Few-Shot Graph Anomaly Detection

SignGAD introduces a self-designing agentic framework for few-shot graph anomaly detection. Instead of fixed pipelines, it designs task-conditioned detection workflows by selecting suitable graph encodings and detector designs. A guarded refit strategy refines selected workflows under limited supervision, outperforming state-of-the-art methods on real-world datasets.

AI Agents · Benchmarks · Papers
SIG 72 · HYP 28
arXiv cs.CL·

TRACES: Proactive Safety Auditing for Multi-Turn LLM Agents via Trajectory-State Modeling

TRACES is a proactive safety auditor for multi-turn LLM agents that detects drift toward unsafe behavior from hidden representations of an observer LLM. Trained with weak trajectory-level supervision, it produces dense prefix-level risk estimates, improving full-trajectory safety prediction and proactive risk discrimination across multiple agent safety benchmarks.

AI Agents · AI safety · Reasoning
SIG 78 · HYP 22
arXiv cs.LG·

Balancing Fidelity and Diversity in Diffusion Models via Symmetric Attention Decomposition: Hopfield Perspective

Theoretical paper decomposing the pre-softmax attention matrix QK^T into symmetric and skew-symmetric components. The symmetric part governs the energy landscape, the skew-symmetric part drives circulation. Authors propose Hopfield-style stability measures to quantify fidelity-diversity trade-offs in generation and a controllable mechanism to modulate this trade-off.
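Any square matrix A splits uniquely as A = S + K with S = (A + Aᵀ)/2 symmetric and K = (A − Aᵀ)/2 skew-symmetric. The sketch below applies that identity to a small random QKᵀ in plain Python. The matrix sizes and helper names are illustrative assumptions, and the Hopfield-style stability measures from the paper are not reproduced.

```python
import random


def matmul_t(q, k):
    """Return Q K^T for two lists of equal-length row vectors."""
    return [[sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]


def decompose(a):
    """Split a square matrix into its symmetric and skew-symmetric parts."""
    n = len(a)
    sym = [[0.5 * (a[i][j] + a[j][i]) for j in range(n)] for i in range(n)]
    skew = [[0.5 * (a[i][j] - a[j][i]) for j in range(n)] for i in range(n)]
    return sym, skew


rng = random.Random(0)
n_tokens, d = 4, 3  # illustrative sizes
q = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_tokens)]
k = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_tokens)]

a = matmul_t(q, k)        # pre-softmax attention logits, n_tokens x n_tokens
sym, skew = decompose(a)  # a == sym + skew up to rounding
```

The skew part has a zero diagonal and contributes nothing to the quadratic form xᵀAx, which is consistent with the summary's reading of the symmetric part as the energy term.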

Reasoning · Papers · Vision
SIG 72 · HYP 15
arXiv cs.LG·

Personalized Observation Normalization for Federated Reinforcement Learning in Simulation Environments with Heterogeneity

Personalized Observation Normalization (PON) method for federated reinforcement learning in heterogeneous environments. Each agent locally normalizes state inputs using continuously updated running mean and variance, preventing imbalanced parameter aggregation issues. Experiments on heterogeneous MuJoCo tasks demonstrate accelerated training and superior performance versus baselines.
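As a rough illustration of per-agent normalisation with a running mean and variance, the sketch below uses Welford's online update. The class name, the `eps` constant, and the toy observations are assumptions; the paper's exact update rule and how it interacts with parameter aggregation are not given in the summary.

```python
import math


class RunningNormalizer:
    """Per-agent observation normaliser using Welford's online algorithm."""

    def __init__(self, dim, eps=1e-8):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations
        self.eps = eps

    def update(self, obs):
        """Fold one observation vector into the running statistics."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def variance(self):
        """Population variance; falls back to 1.0 before two samples are seen."""
        if self.count < 2:
            return [1.0] * len(self.mean)
        return [m / self.count for m in self.m2]

    def normalize(self, obs):
        var = self.variance()
        return [
            (x - m) / math.sqrt(v + self.eps)
            for x, m, v in zip(obs, self.mean, var)
        ]


# Two hypothetical agents whose environments report the same quantity
# on very different scales.
agent_a, agent_b = RunningNormalizer(1), RunningNormalizer(1)
for x in (1.0, 2.0, 3.0):
    agent_a.update([x])
    agent_b.update([100.0 * x])

norm_a = agent_a.normalize([3.0])[0]
norm_b = agent_b.normalize([300.0])[0]
```

Because each agent keeps its own statistics locally, the shared policy sees inputs on a comparable scale even though the raw observations differ by a factor of 100.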

Reinforcement learning · Multi-agent
SIG 72 · HYP 18
arXiv cs.LG·

High-Fidelity Industrial Crash Dynamics Prediction via Geometry-Aware Operator Learning with Memory-Efficient Low-Rank Attention

GeoTransolver, a geometry-aware operator learning framework, accurately predicts industrial-scale automotive crash dynamics. On bumper beam and full-vehicle crash datasets, it captures plastic deformations and acceleration profiles. A FLARE-based modification reduces memory overhead by 2x while improving accuracy for high-frequency transients.

Papers · Benchmarks · Reasoning
SIG 72 · HYP 25
arXiv cs.LG·

Metric-Aware PCA as a Linear Instance of Geometric Deep Learning

Theoretical paper positioning Metric-Aware Principal Component Analysis (MAPCA) within the geometric deep learning framework. MAPCA parameterises PCA by a positive-definite metric matrix, with solutions equivariant under the orthogonal group preserving the metric. A uniqueness theorem characterises Invariant PCA as the unique linear data-derived metric equivariant under arbitrary diagonal rescaling.

Papers · Reasoning
SIG 72 · HYP 15
arXiv cs.LG·

$E^3$-Agent: An Executable and Evolving Agent for Resource Management of Edge Generative Inference

E³-Agent is an executable and evolving agent for edge generative inference resource management. It pairs a fast-path router (millisecond dispatch) with a slow-path LLM meta-controller driven by events, learning online from execution feedback. Evaluated in simulation, it reduces latency by 65-73% versus static baselines across dynamic scenarios (semantic shifts, device churn, hidden drift).

AI Agents · Reasoning · Infrastructure
SIG 72 · HYP 28
arXiv cs.LG·

Bayesian Deployment Approval for Learned Landing Controllers under Finite Rollout Validation

Bayesian framework for validating deployment of learned autonomous landing controllers. Uses Bayesian inference to quantify uncertainty about true policy capability beyond empirical metrics (reward, success rate). Experiments with PPO and SAC show that empirical metrics alone are overconfident, while Bayesian inference gives a better-calibrated assessment of deployment readiness.
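To show why a finite number of rollouts can overstate readiness, here is a minimal Beta-Binomial sketch. The uniform Beta(1, 1) prior, the 0.9 success-rate threshold, and the rollout counts are all illustrative assumptions, not the paper's model. It computes the posterior probability that the true success rate clears the threshold.

```python
import math


def beta_pdf(p, a, b):
    """Density of Beta(a, b) at p in (0, 1), computed in log space."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))


def prob_rate_at_least(threshold, successes, failures,
                       prior_a=1.0, prior_b=1.0, steps=20000):
    """Posterior P(success rate >= threshold) under a Beta prior.

    Integrates the Beta posterior over [threshold, 1] with the midpoint rule.
    """
    a, b = prior_a + successes, prior_b + failures
    width = (1.0 - threshold) / steps
    total = sum(beta_pdf(threshold + (i + 0.5) * width, a, b) for i in range(steps))
    return total * width


# Same empirical success rate (95%), very different amounts of evidence.
few_rollouts = prob_rate_at_least(0.9, successes=19, failures=1)
many_rollouts = prob_rate_at_least(0.9, successes=190, failures=10)
```

Both settings report the same empirical success rate, yet the posterior probability of clearing the threshold is much lower with 20 rollouts than with 200, which is the kind of gap a point estimate hides.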

Reinforcement learning · AI safety · Robotics
SIG 72 · HYP 15
arXiv cs.CL·

Escape the Language Prior: Mitigating Late-Stage Modality Collapse in Audio Reasoning via Modality-Aware Policy Optimization

Modality-Aware Policy Optimization (MAPO) addresses late-stage modality collapse in audio-text models during RL fine-tuning. The method concentrates policy gradients on modality-critical tokens via a modality relevance mask and adds an attention penalty to sustain cross-modal grounding. MAPO achieves SOTA on several complex audio reasoning benchmarks.

Reinforcement learning · Reasoning · Alignment
SIG 78 · HYP 25
arXiv cs.CL·

Cultural Fidelity in English-to-Hindi Translation: A Preservation-Fluency Frontier for Gender Recoverability

Study on gender preservation in English-to-Hindi translation. Benchmark of 37,345 instances shows GPT-4o-mini and Sarvam frequently erase gender via ergative constructions. Two rerankers (SAR and PAR) improve gender recoverability: PAR increases accuracy from 11-16% to 49-54%, but reduces fluency (4.36→3.37). Reveals preservation-fluency tradeoff.

Benchmarks · Vision · Alignment
SIG 78 · HYP 15
arXiv cs.LG·

IGADA-IoT: IoT Sensor Energy Optimization in Wireless Sensor Networks Driven by Automatic Data Augmentation

IGADA-IoT proposes an information gap-guided automatic data augmentation framework for IoT sensor energy optimization in wireless sensor networks. The method employs hierarchical multi-generator collaboration scheduling (HMGCS) and joint information gap-model performance evaluation (IGMP-EC). Results: +7.27% average accuracy improvement, +8.67% vs advanced augmentation methods.

Evals · Fine-tuning
SIG 65 · HYP 25
arXiv cs.CL·

BioELX: Cross-lingual Biomedical Entity Linking via Alias-based Retrieval and LLM Ranking

BioELX is a two-stage cross-lingual biomedical entity linking system requiring no annotated training data. It enriches SapBERT with Wikidata-derived multilingual aliases and uses an LLM for context-aware disambiguation. On five benchmarks, it achieves +19.2 Recall@1 on XL-BEL, with major gains for low-resource languages (Turkish +21.6, Korean +22.1, Thai +30.8).

Benchmarks · Papers · RAG
SIG 78 · HYP 15
arXiv cs.AI·

Operational AI Deployment Assurance: Governance-State Orchestration Under Threshold-Sensitive Deployment Conditions -- A Governance Framework for High-Stakes AI Systems

OADA is a governance framework for high-stakes AI systems that translates fairness metric instability, threshold sensitivity, and operational uncertainty into deployment-oriented assurance decisions. Tested on facial recognition and healthcare, it introduces Deployment Assurance Scores, escalation states, and Threshold Stability Zones to actively govern deployment readiness rather than rely on post-hoc auditing.

AI safety · Alignment · Evals
SIG 62 · HYP 28
arXiv cs.AI·

A Policy-Driven Runtime Layer for Agentic LLM Serving

Proposes intermediate runtime layer between agent framework and LLM serving engine. Introduces four primitives (observe, score, predict, act) to implement agent-aware policies (KV caching, batch shaping, speculation, fairness, safety). CacheSage, instantiated for cross-session caching, achieves +13 to +37 pp cache hit-rate lift, 12–29% lower TTFT, 6–14% higher throughput on five real multi-agent workloads.

AI Agents · Multi-agent · Infrastructure
SIG 78 · HYP 25
arXiv cs.LG·

Supervised Distributional Reduction via Optimal Transport and Dependence Maximization

SDR (Supervised Distributional Reduction) combines optimal transport and dependence maximization to learn target-aware representations. The algorithm extends the Fused Gromov-Wasserstein objective with an explicit dependence term, producing compact embeddings that capture both geometric structure and predictive signal. Applied to Gaussian Process modelling with adaptive kernels.

Papers
SIG 72 · HYP 15
arXiv cs.AI·

SkillGrad: Optimizing Agent Skills Like Gradient Descent

SkillGrad optimizes LLM agent skills using a gradient-descent-inspired framework. Task executions provide trajectory-level loss signals, automatic diagnostics generate text-based gradients, and a momentum agent accumulates recurring patterns. Evaluated on SpreadsheetBench and WikiTableQuestions, SkillGrad outperforms training-based baselines by 6.7 percentage points on average.

AI Agents · Reinforcement learning · Prompt engineering
SIG 78 · HYP 25
arXiv cs.AI·

PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft

PEAM is an embodied agent memory framework in Minecraft that internalizes experience as parameters rather than inference-time retrieval. It pairs a slow LLM for reasoning with a fast parametric module (Mixture-of-Experts LoRA) learning via behavioral cloning and contrastive objectives. Failures are treated as training signals to learn corrected actions.

AI Agents · Reinforcement learning · Fine-tuning
SIG 78 · HYP 25
arXiv cs.CL·

Bridging the Stability-Expressivity Gap: Synthetic Data Scaling and Preference Alignment for Low-Resource Spoken Language Models

Spoken Language Models (SLMs) for speech synthesis in low-resource languages face a trade-off: synthetic data improves phonetic accuracy but suppresses prosodic variability (Synthetic Erosion). Authors propose two self-alignment frameworks (DGSA and TDSC) to recover expressivity, outperforming ElevenLabs and Gemini Pro, enabling zero-shot voice cloning for Lao.

Voice · Papers · Reasoning
SIG 72 · HYP 28
arXiv cs.AI·

C-MIG: Multi-view Information Gain-based Retrieval-Augmented Generation for Clinical Diagnosis Reasoning

C-MIG introduces a multi-view information gain-based RAG framework for clinical diagnosis reasoning. It replaces exact-match binary rewards with information gain estimation from two views (retrieved documents and document refinement) to better supervise LLM reasoning. Experiments on four medical benchmarks show improvements over RAG-RL baselines in both in-domain and out-of-domain settings.

RAG · Reinforcement learning · Reasoning
SIG 75 · HYP 15
arXiv cs.CL·

Chain-based Adaptive Reconfiguration Over Lattices for Hallucination Reduction

CAROL is a probabilistic framework for test-time hallucination reduction in LLMs. It defines semantic uncertainty based on consistency between generated responses and trusted context, formulating mitigation as a Markov chain accept-reject process with convergence guarantees. Results on QA and multi-agent reasoning benchmarks show significant hallucination reduction.

Reasoning · AI safety · Alignment
SIG 75 · HYP 15
arXiv cs.LG·

Test-Time Collective Action: Proxy-Based Perturbations for Correcting Algorithmic Harms

New framework enabling user collectives to correct algorithmic disparities without platform intervention. Test-Time Collective Action (TTCA) uses universal perturbations derived from a proxy model to improve fairness without training access. Validation on CIFAR-10, CIFAR-100, and FairFace demonstrates closure of subgroup accuracy gaps and improved worst-group accuracy.

AI safety · Alignment · Evals
SIG 72 · HYP 18
arXiv cs.AI·

EAPO: Entropy-Driven Adaptive Positive-Negative Sample Weighting for Policy Optimization in Open-Ended QA

EAPO is an adaptive policy optimization method for training reasoning models in open-ended QA. It dynamically adjusts positive/negative sample weights based on current-to-initial entropy ratio to preserve exploration and stability. Tests on two medical QA datasets show improvements in diversity and stability versus fixed-weight baselines.

Reinforcement learning · Reasoning · Evals
SIG 72 · HYP 18
arXiv cs.CL·

Keyphrase Generative Representation of Youth Crisis Conversations Beyond Static Taxonomies

Analysis of 703,975 youth crisis SMS conversations (Kids Help Phone, 2018-2023). Introduces Keyphrase Generative Representation (KGR), a constrained LLM generating context-specific keyphrases. Taxonomy expanded from 19 to 39 labels with 0.96 accuracy. KGR's keyphrases are 81% accurate, and it improves the topic-retrieval workflow (+0.45 accuracy vs the manual process).

Llama · Prompt engineering · RAG
SIG 72 · HYP 18