RSS

arXiv cs.AI

https://arxiv.org/list/cs.AI/recent

arXiv cs.AI

ChatHealthAI: Aligning Electronic Health Record Representations with Large Language Models for Grounded Clinical Reasoning

ChatHealthAI aligns structured EHR representations from a pretrained EHR foundation model with a frozen LLM's semantic space via a task-aware resampler. The multimodal framework integrates longitudinal patient representations with refined clinical event descriptions, improving interpretable clinical reasoning while maintaining competitive predictive performance on the EHRSHOT benchmark.

RAG, Reasoning, Evals
SIG 72, HYP 18
arXiv cs.AI

Product-Aware Deep Autoencoders for Robust Process Monitoring in Multi-Product Cyber-Physical Systems

Academic paper proposing product-aware autoencoders for anomaly detection in multi-product cyber-physical systems. Traditional global models create blind spots where attacks can evade detection. On the Tennessee Eastman Process benchmark, the product-aware model achieves 100% detection accuracy versus 22.2% for the global baseline in attack scenarios.

Benchmarks, AI safety, Evals
SIG 72, HYP 15
arXiv cs.AI

TIGER: Traceable Inference with Graph-Based Evidence Routing for Mitigating Hallucinations in Multimodal Generation

TIGER is an inference-time framework for mitigating hallucinations in multimodal generation. It independently extracts an observation graph from the input and a claim graph from the output, then assigns risk scores to claims based on support and conflict. High-risk claims are repaired while the backbone stays frozen. Convergence analysis shows geometric risk reduction to an explicit asymptotic bound.

Reasoning, Vision, Papers
SIG 78, HYP 22
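
As a rough illustration of what "geometric risk reduction to an explicit asymptotic bound" typically means, the following is a generic contraction argument, not the paper's actual theorem. The symbols $r_t$ (aggregate risk after $t$ repair rounds), $\rho$ (contraction factor), and $\varepsilon$ (per-round residual error) are assumptions introduced here for illustration.

```latex
% Assumed recursion: each repair round shrinks risk by a factor \rho
% and adds at most a residual \varepsilon.
r_{t+1} \le \rho\, r_t + \varepsilon, \qquad 0 \le \rho < 1
% Unrolling the recursion gives a geometric decay plus a bounded remainder:
r_t \le \rho^{t} r_0 + \varepsilon \sum_{k=0}^{t-1} \rho^{k}
    = \rho^{t} r_0 + \varepsilon\,\frac{1-\rho^{t}}{1-\rho}
% so the risk approaches an explicit asymptotic bound:
\limsup_{t \to \infty} r_t \le \frac{\varepsilon}{1-\rho}
```

Under these assumptions, the initial risk decays at rate $\rho^t$ and the remaining risk is capped by $\varepsilon/(1-\rho)$.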
arXiv cs.AI

Closed-Loop Neural Activation Control in Vision-Language-Action Models

CTRL-STEER introduces a closed-loop control framework for Vision-Language-Action (VLA) models. Instead of fixed steering coefficients, it adaptively adjusts intervention strength over time using PID or reinforcement learning controllers. Experiments on OpenVLA with LIBERO task suites demonstrate improved concept regulation stability and better steering-task success trade-offs without retraining the base model.

Vision, AI Agents, Reinforcement learning
SIG 72, HYP 18
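
The contrast between a fixed steering coefficient and closed-loop adjustment can be sketched with a textbook PID controller. This is a minimal toy, not CTRL-STEER's implementation: the `PIDController` class, the `simulate` function, the first-order "plant" standing in for a concept activation, and all gains are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    """Textbook discrete PID controller (gains are illustrative assumptions)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, target: float, measured: float, dt: float = 1.0) -> float:
        error = target - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(controller=None, target=1.0, steps=100, gain=0.6):
    """Toy plant: the activation moves halfway toward gain * alpha each step.

    With controller=None the steering coefficient alpha is fixed at 1.0.
    The plant model and `gain` are assumptions, not taken from the paper.
    """
    activation = 0.0
    trace = []
    for _ in range(steps):
        if controller is not None:
            alpha = controller.step(target, activation)
        else:
            alpha = 1.0
        activation += 0.5 * (gain * alpha - activation)
        trace.append(activation)
    return trace


fixed_trace = simulate(controller=None)
pid_trace = simulate(controller=PIDController(kp=0.8, ki=0.3, kd=0.05))
print(f"fixed coefficient settles near {fixed_trace[-1]:.3f}")
print(f"PID-controlled settles near {pid_trace[-1]:.3f}")
```

In this toy setting the fixed coefficient settles wherever the unknown plant gain puts it, while the integral term drives the controlled activation to the target.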
arXiv cs.AI

Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

Survey paper proposing the Intelligent Computing Architecture Model (ICAM), a six-layer framework for model-native computing. It maps classical computer architecture concepts to LLM systems (cache management, context, agents) and introduces three design laws: the Semantic Locality Law, the Context Budget Law, and the Agent Speedup Law. It also distinguishes a probabilistic execution plane from a deterministic control plane.

AI Agents, Multi-agent, Reasoning
SIG 72, HYP 25
arXiv cs.AI

The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary

Decoder-only models hit an information-theoretic limit in deterministic state-tracking tasks beyond ~25 steps. An Attention Bottleneck Theorem bounds capacity to O(H·log(L/H)·√d_h). Across 12 models and 8 domains (SWE-Bench, WebArena, SQL), tool delegation achieves 86-94% vs 24-42% for pure neural reasoning. Fine-tuning improves results by less than 5%, which the authors take as confirming an architectural ceiling.

Reasoning, AI Agents, Benchmarks
SIG 78, HYP 25
arXiv cs.AI

TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding

TAPS introduces a target-aware prefix selection method for diffusion-drafted speculative decoding. By converting diffusion marginals into path-conditioned acceptance estimates, TAPS selects a compact prefix-closed subtree under a fixed verification budget. Results: 7.9x lossless speedup vs vanilla autoregressive decoding, and 1.36x and 1.74x over DFlash and DDTree, respectively.

Code generation, Reasoning, Benchmarks
SIG 78, HYP 15
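
Selecting a prefix-closed subtree under a fixed verification budget can be sketched as a best-first search over a draft token tree. This is a generic greedy illustration, not the TAPS algorithm: the `select_subtree` function, the tree encoding, and the assumption that a path's acceptance estimate is the product of per-node conditional acceptance probabilities are all hypothetical.

```python
import heapq


def select_subtree(children, accept_prob, budget, root=0):
    """Greedily pick up to `budget` draft nodes with the highest path acceptance.

    children:    dict mapping node -> list of child nodes
    accept_prob: dict mapping node -> assumed probability that the node is
                 accepted given that its parent was accepted
    A node enters the frontier only after its parent is selected, so the
    result is prefix-closed by construction. The root (verified context)
    is not counted against the budget.
    """
    selected = []
    # Max-heap via negated path probability.
    frontier = [(-accept_prob[c], c) for c in children.get(root, [])]
    heapq.heapify(frontier)
    while frontier and len(selected) < budget:
        neg_path_prob, node = heapq.heappop(frontier)
        selected.append(node)
        for child in children.get(node, []):
            # Path estimate = parent path estimate * conditional acceptance.
            heapq.heappush(frontier, (neg_path_prob * accept_prob[child], child))
    return selected


# Toy draft tree: node 0 is the verified context.
children = {0: [1, 2], 1: [3, 4], 2: [5]}
accept_prob = {1: 0.9, 2: 0.4, 3: 0.8, 4: 0.1, 5: 0.9}
parent = {c: p for p, cs in children.items() for c in cs}

chosen = select_subtree(children, accept_prob, budget=3)
print(chosen)
```

Because path estimates never increase along a branch, expanding the frontier in best-first order spends the verification budget on the branches most likely to be accepted in full.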
arXiv cs.AI

Threshold-Based Exclusive Batching for LLM Inference

arXiv paper on LLM inference batching optimization. Authors demonstrate mixed batching (MB) is suboptimal on bandwidth-constrained GPUs: exclusive batching (EB) achieves 41.9% higher throughput on RTX PRO 6000 (1.792 TB/s). They propose EB+, a hybrid scheduler that dynamically switches between EB and MB based on GPU bandwidth, model size, and workload composition, reaching 36.4% gains under non-stationary traffic.

Infrastructure, Benchmarks, Papers
SIG 78, HYP 15
arXiv cs.AI

MindZero: Learning Online Mental Reasoning With Zero Annotations

MindZero is a self-supervised reinforcement learning framework that trains multimodal LLMs to infer human mental states without annotations. The model is rewarded for generating mental state hypotheses that maximize the likelihood of observed actions. After training, inference is a fast single pass, and the model outperforms model-based methods in both accuracy and efficiency.

Reasoning, Reinforcement learning, AI Agents
SIG 72, HYP 25
arXiv cs.AI

Capability Self-Assessment: Teaching LLMs to Know Their Limits

Modern LLMs systematically overestimate their competence and attempt unsolvable queries. Researchers propose Capability Self-Assessment (CSA), formulated as a policy-learning problem using reinforcement learning, to teach models to recognize their limits. RL significantly outperforms supervised fine-tuning, preserves original capabilities, and generalizes out-of-distribution.

Reinforcement learning, Alignment, Evals
SIG 78, HYP 22
arXiv cs.AI

Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

Study on harness self-evolution (prompts, skills, memories, tools) in LLM agents. Analyzes two capabilities: harness-updating (producing useful updates) and harness-benefit (benefiting from them). Findings: harness-updating is capability-agnostic (Qwen3.5-9B matches Claude Opus gains), while harness-benefit is non-monotonic (mid-tier models benefit most).

AI Agents, Prompt engineering, Benchmarks
SIG 75, HYP 15