Archives

June 2026

485 articles

arXiv cs.AI·

Product-Aware Deep Autoencoders for Robust Process Monitoring in Multi-Product Cyber-Physical Systems

Academic paper proposing product-aware autoencoders for anomaly detection in multi-product cyber-physical systems. A single global model creates blind spots where attacks can evade detection. On the Tennessee Eastman Process benchmark, the product-aware model achieves 100% detection accuracy in attack scenarios versus 22.2% for the global baseline.

Benchmarks · AI safety · Evals
SIG 72 · HYP 15
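The blind-spot effect can be illustrated with detection thresholds alone. This is a minimal sketch, not the paper's method: it assumes each product's autoencoder reconstruction errors on normal operation are already computed, and it uses a hypothetical mean + k·std threshold rule. A moderate error that is clearly anomalous for a low-error product stays under a global threshold inflated by a high-error product.

```python
from statistics import mean, stdev

def fit_product_thresholds(normal_errors, k=3.0):
    """One threshold per product: mean + k * std of its normal reconstruction errors."""
    return {p: mean(errs) + k * stdev(errs) for p, errs in normal_errors.items()}

def fit_global_threshold(normal_errors, k=3.0):
    """A single threshold over errors pooled from every product."""
    pooled = [e for errs in normal_errors.values() for e in errs]
    return mean(pooled) + k * stdev(pooled)

# Hypothetical reconstruction errors under normal operation (not from the paper).
normal_errors = {
    "product_A": [0.09, 0.10, 0.11, 0.10, 0.10],
    "product_B": [0.95, 1.00, 1.05, 1.00, 1.00],
}
product_thresholds = fit_product_thresholds(normal_errors)
global_threshold = fit_global_threshold(normal_errors)

attack_error = 0.5  # hypothetical error observed during an attack on product_A
flagged_by_product_model = attack_error > product_thresholds["product_A"]
flagged_by_global_model = attack_error > global_threshold
print(flagged_by_product_model, flagged_by_global_model)  # True False
```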
arXiv cs.CL·

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

arXiv study on the limits of LLM adaptation for annotation tasks. Toxicity detection experiments across diverse datasets show that 66% of zero-shot errors resist correction via prompting (rescue rate 34.8%). Models keep following misaligned definitions while remaining confident. A Definition-Specific Familiarity (DSF) metric correlates with performance (r = +0.41) and outperforms memorization metrics.

Prompt engineering · Evals · Benchmarks
SIG 78 · HYP 15
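One plausible reading of the rescue rate is the share of zero-shot errors that a prompting intervention turns into correct answers. The sketch below assumes that definition; the paper's exact denominator and matching rules may differ, and the toy labels are hypothetical.

```python
def rescue_rate(zero_shot_correct, prompted_correct):
    """Share of zero-shot errors that become correct after prompting.

    Both arguments are per-example booleans aligned by index.
    """
    error_idx = [i for i, ok in enumerate(zero_shot_correct) if not ok]
    if not error_idx:
        return 0.0  # nothing to rescue
    rescued = sum(1 for i in error_idx if prompted_correct[i])
    return rescued / len(error_idx)

# Hypothetical toy labels: examples 1, 2, 3 are zero-shot errors; only example 1 is rescued.
zero_shot = [True, False, False, False, True]
prompted = [True, True, False, False, False]
rate = rescue_rate(zero_shot, prompted)
print(f"rescue rate {rate:.3f}, unrescued {1 - rate:.3f}")
```

Under this definition, examples that the prompt breaks (index 4 here) do not affect the rate; they would need a separate "regression" count.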
arXiv cs.CL·

SPADER: Step-wise Peer Advantage with Diversity-Aware Exploration Rewards for Multi-Answer Question Answering

SPADER is an RL framework for tool-augmented LLM agents in multi-answer question answering (QA). It introduces Step-wise Peer Advantage (SPA) for fine-grained credit assignment over long trajectories, plus a diversity-aware exploration reward that promotes discovery of rare entities. Evaluated on QAMPARI, Mintaka, WebQSP, and QUEST, it improves recall and F1 over prompting and supervised RL baselines.

AI Agents · Reinforcement learning · Reasoning
SIG 78 · HYP 18
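The two ingredients can be sketched as follows. This is an illustration under assumptions, not SPADER's published formulation: the step-wise peer advantage is modeled as a group-normalized advantage computed per step across sibling rollouts of the same question (in the style of group-relative methods), and the diversity reward as an inverse-frequency bonus over the entities those rollouts found.

```python
from collections import Counter
from statistics import mean, pstdev

def step_peer_advantages(step_rewards, eps=1e-6):
    """step_rewards[k][t] is the reward of rollout k at step t (rollouts may differ in length).
    Each step is normalized against the peer rollouts that reached the same step index."""
    adv = [[0.0] * len(r) for r in step_rewards]
    max_len = max(len(r) for r in step_rewards)
    for t in range(max_len):
        peers = [(k, r[t]) for k, r in enumerate(step_rewards) if t < len(r)]
        values = [v for _, v in peers]
        mu, sigma = mean(values), pstdev(values)
        for k, v in peers:
            adv[k][t] = (v - mu) / (sigma + eps)
    return adv

def diversity_bonus(entities_per_rollout):
    """Each entity is worth 1 / (number of rollouts that found it), so rare finds pay more."""
    counts = Counter(e for ents in entities_per_rollout for e in set(ents))
    return [sum(1.0 / counts[e] for e in set(ents)) for ents in entities_per_rollout]

# Hypothetical peer group: two rollouts scored per step, three rollouts' answer sets.
adv = step_peer_advantages([[1.0, 0.0], [0.0]])
bonus = diversity_bonus([["Paris", "Lyon"], ["Paris"], ["Paris", "Nice"]])
```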
arXiv cs.AI·

TIGER: Traceable Inference with Graph-Based Evidence Routing for Mitigating Hallucinations in Multimodal Generation

TIGER is an inference-time framework for mitigating hallucinations in multimodal generation. It independently extracts an observation graph from the input and a claim graph from the output, then assigns each claim a risk score based on evidence that supports or conflicts with it. TIGER repairs high-risk claims while keeping the backbone model frozen. A convergence analysis shows that risk decreases geometrically to an explicit asymptotic bound.

Reasoning · Vision · Papers
SIG 78 · HYP 22
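The summary does not give TIGER's recurrence, but "geometric reduction to an explicit asymptotic bound" is what a contraction with additive slack yields. Assuming, for illustration only, that the risk after repair round t satisfies R_{t+1} ≤ ρ·R_t + ε with 0 ≤ ρ < 1 and ε ≥ 0 (symbols introduced here, not taken from the paper), unrolling gives:

```latex
\begin{aligned}
R_t &\le \rho^t R_0 + \varepsilon \sum_{i=0}^{t-1} \rho^i
     = \rho^t R_0 + \varepsilon \, \frac{1-\rho^t}{1-\rho}
     \le \rho^t R_0 + \frac{\varepsilon}{1-\rho}, \\
\limsup_{t \to \infty} R_t &\le \frac{\varepsilon}{1-\rho}.
\end{aligned}
```

The first term decays geometrically at rate ρ, and the residual ε/(1-ρ) plays the role of the asymptotic bound.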
arXiv cs.AI·

Closed-Loop Neural Activation Control in Vision-Language-Action Models

CTRL-STEER introduces a closed-loop control framework for Vision-Language-Action (VLA) models. Rather than using fixed steering coefficients, it adapts intervention strength over time with PID or reinforcement-learning controllers. Experiments on OpenVLA with the LIBERO task suites show more stable concept regulation and better trade-offs between steering and task success, without retraining the base model.

Vision · AI Agents · Reinforcement learning
SIG 72 · HYP 18
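A closed-loop controller replaces a fixed coefficient with one recomputed from feedback at every step. Below is a minimal PID sketch; the concept score, its linear toy response to the coefficient, and the gains are hypothetical stand-ins, and no VLA model, OpenVLA, or LIBERO code is involved. The `steer` helper shows the common activation-addition form h + alpha·v, assumed here rather than taken from the paper.

```python
class PIDSteeringController:
    """Adapts a steering coefficient so a measured concept score tracks a target value."""

    def __init__(self, kp, ki, kd, alpha_min=0.0, alpha_max=5.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.alpha_min, self.alpha_max = alpha_min, alpha_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, target, measured, dt=1.0):
        error = target - measured
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        alpha = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp so the intervention strength stays within a bounded range.
        return min(max(alpha, self.alpha_min), self.alpha_max)

def steer(hidden, direction, alpha):
    """Add alpha times a concept direction to a hidden-state vector."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Hypothetical plant: the concept score responds linearly to the coefficient.
ctrl = PIDSteeringController(kp=0.5, ki=0.5, kd=0.0)
measured = 0.2
for _ in range(50):
    alpha = ctrl.update(target=1.0, measured=measured)
    measured = 0.2 + 0.5 * alpha
```

The integral term is what removes the steady-state gap here: a purely proportional controller would settle below the target.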
arXiv cs.AI·

Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

Survey paper proposing the Intelligent Computing Architecture Model (ICAM), a six-layer framework for model-native computing. It maps classical computer-architecture concepts onto LLM systems (cache management, context, agents), introduces three design laws (Semantic Locality Law, Context Budget Law, Agent Speedup Law), and distinguishes a probabilistic execution plane from a deterministic control plane.

AI Agents · Multi-agent · Reasoning
SIG 72 · HYP 25
arXiv cs.CL·

Parameter Alignment Mitigates Catastrophic Forgetting in Multilingual Expert Language Models

Study on preventing catastrophic forgetting during continual pretraining of multilingual language models. The authors propose five parameter alignment strategies, including layer freezing, regularization, post-hoc reversion, and model merging, and test them across 32 languages and four evaluation axes. Parameter alignment substantially reduces forgetting while preserving new-language acquisition.

Fine-tuning · Papers · Benchmarks
SIG 78 · HYP 15
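Two of the listed strategies, model merging and post-hoc reversion, are commonly implemented as simple weight-space operations. The sketch assumes plain linear interpolation and whole-parameter reversion over a hypothetical two-layer model stored as lists; the paper's exact merging recipe and choice of reverted parameters are not given in the summary.

```python
def interpolate_weights(base, tuned, alpha):
    """Linear merge per parameter: base + alpha * (tuned - base).
    alpha = 0 keeps the base model; alpha = 1 keeps the continually pretrained model."""
    return {
        name: [b + alpha * (t - b) for b, t in zip(base[name], tuned[name])]
        for name in base
    }

def revert_parameters(base, tuned, names_to_revert):
    """Post-hoc reversion: restore selected parameters to their base values."""
    return {
        name: list(base[name]) if name in names_to_revert else list(tuned[name])
        for name in tuned
    }

# Hypothetical two-layer "model" stored as flat lists of weights.
base = {"layer0": [0.0, 2.0], "layer1": [1.0, 1.0]}
tuned = {"layer0": [1.0, 4.0], "layer1": [3.0, -1.0]}
merged = interpolate_weights(base, tuned, alpha=0.5)
reverted = revert_parameters(base, tuned, {"layer0"})
```

The single `alpha` trades new-language gains against retention; sweeping it on held-out data from both old and new languages is the usual way to pick a value.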
arXiv cs.AI·

The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary

Decoder-only models hit an information-theoretic limit on deterministic state-tracking tasks beyond roughly 25 steps. An Attention Bottleneck Theorem bounds their capacity at O(H·log(L/H)·√d_h). Across 12 models and 8 domains (including SWE-Bench, WebArena, and SQL), tool delegation reaches 86-94% versus 24-42% for pure neural reasoning. Fine-tuning improves results by less than 5%, which the authors take as confirmation of an architectural ceiling.

Reasoning · AI Agents · Benchmarks
SIG 78 · HYP 25
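To see how the stated bound scales, it can be evaluated with its hidden constant dropped. Reading H as the number of attention heads, L as the context length, and d_h as the per-head dimension is an assumption based on common notation; the log base does not matter inside O(·), so the natural log is used.

```python
import math

def attention_capacity_bound(num_heads, context_len, head_dim):
    """H * log(L / H) * sqrt(d_h), with the constant hidden by O(.) dropped."""
    return num_heads * math.log(context_len / num_heads) * math.sqrt(head_dim)

# Doubling the context length adds only a logarithmic increment to the bound.
b_2k = attention_capacity_bound(8, 2048, 64)
b_4k = attention_capacity_bound(8, 4096, 64)
print(f"{b_4k / b_2k:.3f}")
```

With 8 heads, going from 2,048 to 4,096 tokens raises the bound by a factor of log(512)/log(256) = 9/8, which is one way to read why longer reasoning traces alone do not buy proportionally more state-tracking capacity.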
arXiv cs.AI·

TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding

TAPS introduces a target-aware prefix selection method for diffusion-drafted speculative decoding. By converting diffusion marginals into path-conditioned acceptance estimates, TAPS selects a compact prefix-closed subtree under a fixed verification budget. Results: a 7.9x lossless speedup over vanilla autoregressive decoding, and 1.36x and 1.74x speedups over DFlash and DDTree, respectively.

Code generation · Reasoning · Benchmarks
SIG 78 · HYP 15
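Selecting a prefix-closed subtree under a node budget can be sketched as best-first search on path probability. This is a generic greedy sketch, not TAPS's exact algorithm: it assumes each draft node already carries a conditional acceptance estimate (probability the target accepts the token given its prefix was accepted), and it scores a tree by the sum of path probabilities, i.e. the expected number of accepted tokens. Because a child's path probability never exceeds its parent's, the greedy picks stay prefix-closed.

```python
import heapq

def select_prefix_subtree(children, accept_prob, budget, root="root"):
    """children: node -> list of child nodes; accept_prob: node -> conditional acceptance prob.
    Returns the selected nodes (prefix-closed) and the expected accepted length."""
    selected, expected = [], 0.0
    # Max-heap on path probability (negated for heapq's min-heap).
    frontier = [(-accept_prob[c], c) for c in children.get(root, [])]
    heapq.heapify(frontier)
    while frontier and len(selected) < budget:
        neg_p, node = heapq.heappop(frontier)
        path_p = -neg_p
        selected.append(node)
        expected += path_p
        for c in children.get(node, []):
            heapq.heappush(frontier, (-(path_p * accept_prob[c]), c))
    return selected, expected

# Hypothetical draft tree with acceptance estimates.
children = {"root": ["a", "b"], "a": ["a1"], "b": []}
accept_prob = {"a": 0.9, "b": 0.3, "a1": 0.8}
selected, expected = select_prefix_subtree(children, accept_prob, budget=2)
```

With a budget of two, the deeper node "a1" (path probability 0.72) beats the shallow sibling "b" (0.3), so the tree grows along the more promising branch.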
arXiv cs.AI·

Threshold-Based Exclusive Batching for LLM Inference

arXiv paper on batching for LLM inference. The authors show that mixed batching (MB) is suboptimal on bandwidth-constrained GPUs: exclusive batching (EB) achieves 41.9% higher throughput on an RTX PRO 6000 (1.792 TB/s memory bandwidth). They propose EB+, a hybrid scheduler that switches dynamically between EB and MB based on GPU bandwidth, model size, and workload composition, reaching 36.4% gains under non-stationary traffic.

Infrastructure · Benchmarks · Papers
SIG 78 · HYP 15
arXiv cs.LG·

RAFT: Data Refinement and Adaptive Distillation for Domain Fine-Tuning with Alleviated Forgetting

RAFT is a two-stage domain fine-tuning method that mitigates catastrophic forgetting. It first refines training data via self-conditioned rewriting and answer fusion, then applies on-policy distillation in which the original model provides soft targets on student-generated trajectories. Across five domains, RAFT improves domain accuracy by 23.2% over standard SFT and recovers 18.2% of the degradation on MS-Bench.

Fine-tuning · Reinforcement learning · Papers
SIG 78 · HYP 15
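On-policy distillation scores the student's own samples against the frozen original model's distribution at every position. The sketch assumes a per-token forward KL(teacher ‖ student) averaged over one student-generated trajectory; the summary does not say which divergence RAFT uses, and logits are plain lists here rather than tensors.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits):
    """Mean over positions of KL(teacher || student).
    Row t holds each model's logits at position t of a student-generated trajectory."""
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p, q = softmax(t_row), softmax(s_row)
        total += sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return total / len(teacher_logits)

# Hypothetical logits over a 2-token vocabulary at two positions.
same = distill_loss([[2.0, 0.0], [0.5, 0.5]], [[2.0, 0.0], [0.5, 0.5]])
diff = distill_loss([[2.0, 0.0]], [[0.0, 2.0]])
```

Because the teacher is the pre-fine-tuning model, this term pulls the student back toward its original behavior on the student's own outputs, which is the forgetting-mitigation role the summary describes.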
arXiv cs.LG·

Beyond Augmentation: Score-Guided Pathological Prior for EEG-based Depression Detection

A novel approach to detecting Major Depressive Disorder (MDD) from EEG without data augmentation. SGC (Score-Guided Classification) uses an unsupervised generative network to model pathological anomalies as a prior, which is fused with deep feature representations. A Cross-Channel Spatial Adaptation module handles channel heterogeneity across multiple centers. Validated on the Mumtaz2016 and MODMA datasets.

Papers · Evals · Vision
SIG 72 · HYP 28
arXiv cs.LG·

AI-Guided Design and Optimization of Graphite-Based Anodes via Iterative Experimental Feedback

An iterative AI workflow optimizes graphite-based anodes through sequential learning and experimental feedback loops. The Citrine Platform generates surrogate models and refines manufacturing constraints. Results: fabrication reliability improved from frequent failures to 100% success, the share of cells reaching ≥350 mAh/g rose from 28.4% to 84.8%, and capacity retention rose from 42.1% to 97.3%.

Reinforcement learning · Benchmarks · Tools
SIG 75 · HYP 15
arXiv cs.LG·

Quantized Reasoning Models Think They Need to Think Longer, but They Do Not

Post-training quantization (PTQ) reduces reasoning-model accuracy and lengthens chains of thought. In 52% of failures, the model reaches a correct intermediate answer but does not output it as the final answer. A training-free logit penalty on overthinking markers ("wait", "but", "alternatively") shortens CoT by 12-23% while preserving accuracy across 5 models (1.5B-32B) and 5 benchmarks.

Reasoning · Fine-tuning · Benchmarks
SIG 78 · HYP 15
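A training-free logit penalty amounts to subtracting a constant from selected token logits before sampling. The sketch uses a hypothetical 4-token vocabulary and penalty value; with a real tokenizer, a marker can map to several ids (casing and leading-space variants such as "Wait" and " wait"), so the id set must be collected per tokenizer.

```python
def penalize_marker_tokens(logits, marker_token_ids, penalty):
    """Return a copy of logits with a fixed penalty subtracted at each marker token id."""
    adjusted = list(logits)
    for tok in marker_token_ids:
        adjusted[tok] -= penalty
    return adjusted

# Hypothetical vocabulary: 0="The", 1="wait", 2="answer", 3="but".
vocab_logits = [1.0, 3.0, 2.5, 0.5]
marker_ids = {1, 3}
adjusted = penalize_marker_tokens(vocab_logits, marker_ids, penalty=1.0)
next_token = max(range(len(adjusted)), key=adjusted.__getitem__)
```

Under greedy decoding, the top choice flips from "wait" to "answer"; under sampling, the penalty only lowers the markers' probability, so the model can still emit them when they dominate strongly.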
arXiv cs.AI·

Capability Self-Assessment: Teaching LLMs to Know Their Limits

Modern LLMs systematically overestimate their competence and attempt unsolvable queries. Researchers propose Capability Self-Assessment (CSA), formulated as a policy-learning problem using reinforcement learning, to teach models to recognize their limits. RL significantly outperforms supervised fine-tuning, preserves original capabilities, and generalizes out-of-distribution.

Reinforcement learning · Alignment · Evals
SIG 78 · HYP 22
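Framing self-assessment as policy learning means the attempt-or-abstain decision is trained against a reward. The reward table below is a hypothetical example, not CSA's actual reward; the `refusal_penalty` parameter is an assumption that keeps the policy from abstaining on everything.

```python
def capability_reward(attempted, correct, solvable, refusal_penalty=0.5):
    """Hypothetical reward for learning when to attempt vs. abstain.
    - attempt and correct:            +1
    - attempt and wrong:              -1
    - abstain on an unsolvable query: +1
    - abstain on a solvable query:    -refusal_penalty (discourages over-refusal)
    """
    if attempted:
        return 1.0 if correct else -1.0
    return 1.0 if not solvable else -refusal_penalty

# A policy that attempts with estimated success probability p earns 2p - 1 in expectation,
# so under this table it should abstain on solvable queries only when 2p - 1 < -refusal_penalty.
rewards = [
    capability_reward(attempted=True, correct=True, solvable=True),
    capability_reward(attempted=True, correct=False, solvable=False),
    capability_reward(attempted=False, correct=False, solvable=False),
    capability_reward(attempted=False, correct=False, solvable=True),
]
```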
arXiv cs.AI·

MindZero: Learning Online Mental Reasoning With Zero Annotations

MindZero is a self-supervised reinforcement learning framework that trains multimodal LLMs to infer human mental states without annotations. The model is rewarded for generating mental-state hypotheses that maximize the likelihood of observed actions. After training, inference is a fast single pass, and the model outperforms model-based methods in both accuracy and efficiency.

Reasoning · Reinforcement learning · AI Agents
SIG 72 · HYP 25
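The training signal can be sketched as a log-likelihood reward: a hypothesis about someone's mental state scores well when it makes their observed actions probable. The action model, hypotheses, and probabilities below are hypothetical; how MindZero actually computes the action likelihood is not described in the summary.

```python
import math

def likelihood_reward(hypothesis, observed_actions, action_prob):
    """Sum of log p(action | hypothesis); action_prob(h, a) returns a probability."""
    # Floor tiny probabilities so an impossible action gives a large penalty, not log(0).
    return sum(math.log(max(action_prob(hypothesis, a), 1e-12)) for a in observed_actions)

ACTION_TABLE = {
    "wants_coffee": {"walk_to_kitchen": 0.9, "grab_mug": 0.8, "open_coffee_jar": 0.7},
    "wants_tea": {"walk_to_kitchen": 0.9, "grab_mug": 0.8, "open_coffee_jar": 0.05},
}

def toy_action_prob(hypothesis, action):
    return ACTION_TABLE[hypothesis].get(action, 0.0)

observed = ["walk_to_kitchen", "grab_mug", "open_coffee_jar"]
rewards = {h: likelihood_reward(h, observed, toy_action_prob) for h in ACTION_TABLE}
best = max(rewards, key=rewards.get)
```

The first two actions are equally likely under both hypotheses, so the reward gap comes entirely from the discriminating action; this is why such a signal needs no human labels.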
Reddit r/LocalLLaMA·

I spent months inside verl (an RL post-training framework), forked it, then stopped. Wrote up the internals, the tooling a fork costs, and a nasty NCCL bug.

A researcher who spent months inside verl (ByteDance's RL post-training framework) documents its internals: RLHF loop orchestration, the single-controller pattern, data structures (DataProto), and an NCCL bug they discovered. The fork was abandoned, but the write-up shares what was learned with the community.

Reinforcement learning · AI Agents · Open source
SIG 65 · HYP 15