Page 26 of 192


The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought

A theoretical analysis proving that standard transformers with softmax attention and low numerical precision can simulate Turing machines via chain-of-thought. The authors first construct hardmax transformers with ternary activations, then convert them into equivalent softmax transformers without requiring unrealistically large parameter magnitudes. The results are validated on Sudoku reasoning.

Reasoning · Papers · Benchmarks

arXiv cs.LG · May 19

AdaGraph: A Graph-Native Clustering Algorithm That Overcomes the Curse of Dimensionality and Enables Scientific Discovery

AdaGraph is a graph-native clustering algorithm that sidesteps the curse of dimensionality by clustering on the topology of a k-nearest-neighbor (kNN) graph rather than on raw Euclidean distances. Evaluated on 10 synthetic benchmarks (dimensionality d = 10 to 5,000) and on data from three scientific domains (genomics, NLP, and materials science), it outperforms HDBSCAN, WGCNA, and other methods without requiring k to be specified.

Benchmarks · Papers

arXiv cs.CL · May 19

Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback

A study of "cognitive poisoning," in which malicious tools build up an agent's trust by returning benign feedback before turning harmful. The authors introduce TRUST-Bench (1,970 episodes) for evaluation and VISTA-Guard, a defense that scores the risk of the agent's final action based on the full interaction trajectory. Prompt-centric heuristics fail, while trajectory-aware scoring achieves 84.2% in-domain performance.

AI Agents · AI safety · Benchmarks

The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought

AdaGraph: A Graph-Native Clustering Algorithm That Overcomes the Curse of Dimensionality and Enables Scientific Discovery

Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback

GRASP: Graph Agentic Search over Propositions for Multi-hop Question Answering

InvDesFlow-AL: active learning-based workflow for inverse design of functional materials

FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing

SomaliWeb v1: A Quality-Filtered Somali Web Corpus with a Matched Tokenizer and a Public Language-Identification Benchmark

Enhancing Table Reasoning with Deterministic Table-State Rewards

(Sparse) Attention to the Details: Preserving Spectral Fidelity in ML-based Weather Forecasting Models

Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks

BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting

LoopQ: Quantization for Recursive Transformers

Membership Inference Attacks on Discrete Diffusion Language Models

Multilingual jailbreaking of LLMs using low-resource languages

Constrained Code Generation with Discrete Diffusion

Who Generated This 3D Asset? Learning Source Attribution for Generative 3D Models

OpenJarvis: Personal AI, On Personal Devices

Predictive Prefetching for Retrieval-Augmented Generation

EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective

TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification

SkillMOO: Multi-Objective Optimization of Agent Skills for Software Engineering

Beyond Accuracy: Decomposing the Reasoning Efficiency of LLMs

Attractor-Vascular Coupling Theory: Formal Grounding and Empirical Validation for AAMI-Standard Cuffless Blood Pressure Estimation from Smartphone Photoplethysmography

CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

Multi-layer Cross-attention is Provably Optimal for Multi-modal In-context Learning

The Alpha Illusion: Reported Alpha from LLM Trading Agents Should Not Be Treated as Deployment Evidence

Confidence Geometry Reveals Trace-Level Correctness in Large Language Model Reasoning

Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use

Perovskite-R1: a domain-specialized large language model for intelligent discovery of precursor additives and experimental design

DCFold: Efficient Protein Structure Generation with Single Forward Pass

PROTEA: Offline Evaluation and Iterative Refinement for Multi-Agent LLM Workflows

Distilling Tabular Foundation Models for Structured Health Data

AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent

CounterCount: A Diagnostic Framework for Counting Bias in Vision Language Models

DocReward: A Document Reward Model for Structuring and Stylizing

DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

LegalCheck: Retrieval- and Context-Augmented Generation for Drafting Municipal Legal Advice Letters

SocialMemBench: Are AI Memory Systems Ready for Social Group Settings?