Page 26 of 192

7679 articles

The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought

Theoretical analysis of standard softmax transformers at low precision, showing that they can simulate Turing machines via Chain-of-Thought. The authors construct hardmax transformers with ternary activations, then convert them into equivalent softmax transformers without unrealistic magnitudes. Results are validated on Sudoku.

Reasoning Papers Benchmarks

SIG

HYP

arXiv cs.LG · May 19

AdaGraph: A Graph-Native Clustering Algorithm That Overcomes the Curse of Dimensionality and Enables Scientific Discovery

AdaGraph is a graph-native clustering algorithm that eliminates the curse of dimensionality by operating on the topology of kNN graphs rather than on Euclidean distances. Tested on 10 synthetic benchmarks (d=10 to 5000) and three scientific domains (genomics, NLP, materials), it outperforms HDBSCAN, WGCNA, and other methods without requiring k to be specified in advance.

Benchmarks Papers

SIG

HYP

arXiv cs.CL · May 19

Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback

A study of "cognitive poisoning": malicious tools that build up trust through benign feedback before turning harmful. TRUST-Bench (1,970 episodes) and VISTA-Guard propose a defense that scores the risk of the final action from the interaction trajectory. Classic heuristics fail; trajectory-aware scoring reaches 84.2% in-domain.

AI Agents AI Security Benchmarks

SIG

HYP

The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought

AdaGraph: A Graph-Native Clustering Algorithm That Overcomes the Curse of Dimensionality and Enables Scientific Discovery

Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback

GRASP: Graph Agentic Search over Propositions for Multi-hop Question Answering

InvDesFlow-AL: active learning-based workflow for inverse design of functional materials

FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing

SomaliWeb v1: A Quality-Filtered Somali Web Corpus with a Matched Tokenizer and a Public Language-Identification Benchmark

Enhancing Table Reasoning with Deterministic Table-State Rewards

(Sparse) Attention to the Details: Preserving Spectral Fidelity in ML-based Weather Forecasting Models

Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks

BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting

LoopQ: Quantization for Recursive Transformers

Membership Inference Attacks on Discrete Diffusion Language Models

Multilingual jailbreaking of LLMs using low-resource languages

Constrained Code Generation with Discrete Diffusion

Who Generated This 3D Asset? Learning Source Attribution for Generative 3D Models

OpenJarvis: Personal AI, On Personal Devices

Predictive Prefetching for Retrieval-Augmented Generation

EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective

TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification

SkillMOO: Multi-Objective Optimization of Agent Skills for Software Engineering

Beyond Accuracy: Decomposing the Reasoning Efficiency of LLMs

Attractor-Vascular Coupling Theory: Formal Grounding and Empirical Validation for AAMI-Standard Cuffless Blood Pressure Estimation from Smartphone Photoplethysmography

CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

Multi-layer Cross-attention is Provably Optimal for Multi-modal In-context Learning

The Alpha Illusion: Reported Alpha from LLM Trading Agents Should Not Be Treated as Deployment Evidence

Confidence Geometry Reveals Trace-Level Correctness in Large Language Model Reasoning

Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use

Perovskite-R1: a domain-specialized large language model for intelligent discovery of precursor additives and experimental design

DCFold: Efficient Protein Structure Generation with Single Forward Pass

PROTEA: Offline Evaluation and Iterative Refinement for Multi-Agent LLM Workflows

Distilling Tabular Foundation Models for Structured Health Data

AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent

CounterCount: A Diagnostic Framework for Counting Bias in Vision Language Models

DocReward: A Document Reward Model for Structuring and Stylizing

DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

LegalCheck: Retrieval- and Context-Augmented Generation for Drafting Municipal Legal Advice Letters

SocialMemBench: Are AI Memory Systems Ready for Social Group Settings?