Page 49 of 144


5739 articles

On Wednesdays, We Ask Questions: Optimizing "Active Listening" in Automated Legal Triage and Referral

FETCH, an automated legal triage classifier, generates follow-up questions using a low-cost LLM ensemble. The study finds that cheap models perform well at classification, but generating high-quality plain-language questions requires GPT-4 or a stronger model. Prompt engineering alone is insufficient, and LLM-as-judge ratings diverge from human evaluations.

GPT OpenAI Prompt engineering


arXiv cs.AI·3d ago
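The summary reports that LLM-as-judge ratings diverge from human evaluations but does not say how that divergence was measured. One common way to quantify agreement between two raters is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below assumes that the judge and a human each assign one categorical score per generated question. The function name and any ratings you feed it are hypothetical and are not taken from the FETCH study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    Hypothetical helper: assumes one categorical label per item per rater.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used the same single label everywhere: perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings (1 = acceptable question, 0 = not acceptable).
human = [1, 1, 1, 0]
judge = [1, 1, 0, 0]
print(cohens_kappa(human, judge))
```

A kappa near 0 means the judge agrees with humans no more often than chance would predict. Plain kappa treats every disagreement as equally bad; for ordinal scales such as 1 to 5, a weighted kappa that penalizes large gaps more heavily may be the better fit.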

Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

Survey paper proposing the Intelligent Computing Architecture Model (ICAM), a six-layer framework for model-native computing. Maps classical computer architecture concepts to LLM systems (cache management, context, agents). Introduces three design laws: the Semantic Locality Law, the Context Budget Law, and the Agent Speedup Law. Distinguishes a probabilistic execution plane from a deterministic control plane.

AI Agents Multi-agent Reasoning


arXiv cs.AI·3d ago

Coupling Language Models with Physics-based Simulation for Synthesis of Inorganic Materials

Hybrid framework coupling LLMs with physics-based simulation for inorganic material synthesis planning. Case study on niobium-oxygen system: LLM-generated synthesis routes outperform classical path-planning algorithms by leveraging implicit priors.

Reasoning Benchmarks Papers


arXiv cs.CL·3d ago

TCAR-Gen: Temporal Graph Retrieval with Evidence Fusion for Knowledge-Grounded Generation

TCAR-Gen combines query-conditioned graph neural networks, temporal evidence fusion, and chain-of-trees reasoning for retrieval-augmented generation. It achieves 0.3738 Recall@5 on the Victorian Crime Diaries benchmark, outperforming Vanilla RAG, Temporal RAG, and GraphRAG variants. Cross-model evaluation spanning GPT-OSS 20B down to TinyLlama 1.1B shows robust retrieval coverage at smaller scales.

RAG Reasoning Benchmarks

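Recall@5 is not defined in the summary. Under the usual definition, it is the fraction of a query's relevant evidence items that appear among the top five retrieved items, averaged over all queries. The sketch below assumes that definition; some papers instead report a hit rate, scoring a query as 1 if any relevant item appears in the top k. The function names and document IDs are hypothetical and are not taken from the TCAR-Gen paper.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant ids that appear in the top-k retrieved list.

    Hypothetical helper: assumes `retrieved` is ranked best-first.
    """
    if not relevant:
        raise ValueError("query has no relevant items")
    top_k = set(retrieved[:k])
    relevant = set(relevant)
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs, k=5):
    """Average Recall@k over (retrieved, relevant) pairs, one pair per query."""
    if not runs:
        raise ValueError("no queries to evaluate")
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    return sum(scores) / len(scores)


# Hypothetical run: query 1 finds one of its two relevant entries in the top 5
# ("d6" is ranked sixth), query 2 finds its only relevant entry.
runs = [
    (["d1", "d2", "d3", "d4", "d5", "d6"], {"d2", "d6"}),
    (["a", "b"], {"a"}),
]
print(mean_recall_at_k(runs, k=5))
```

Averaging per-query scores, rather than pooling all relevant items, keeps queries with many relevant entries from dominating the metric.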


A Pre-Training Analogue of Grokking in Language Models: Tracing Delayed Grammatical Generalization

Foundation-Preserving Adaptation via Generalized Rayleigh-Quotient Optimization

Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight

Doing What They Say, Not What They Reason: Locating the Faithfulness Gap in LLM Agents

Cognitive-Linguistic Indicators of Depression in Online Communities: Analysed by DistilBERT and Holographic Reduced Representation

Accurate Large-sample Uncertainty Quantification using Stochastic Gradient Markov Chain Monte Carlo

Modeling Spectral Energy Shifts in Spatio-Temporal Graph Anomaly Detection

InfoAtlas: A Foundation Model for Zero-Shot Statistical Dependence Estimate

Large-scale Uncertainty Quantification for Latent Variable Models Using Subsampling Markov Chain Monte Carlo

AEyeDE: An Attention-Based Attribution Framework for AI-Generated Text Detection

Evaluating Bivariate Causal Statements Based on Mutual Compatibility

CHAM-net: A Contrastive Hierarchical Adaptive Meta-network for Robust Global Methane Flux Prediction

VESTA: Visual Exploration with Statistical Tool Agents

CAST: Non-Privileged Clipped Asymmetric Self-Teaching with Advantage Flipping for GRPO

Perturbative methods for non-parametric instrumental variable

Acting with AI: An Interaction-Based Framework for Agentic Tort Liability

Inner Product Aware Quantization: Provably Fast, Accurate, and Adaptive Algorithms

TRACE: Trajectory Risk-Aware Compression for Long-Horizon Agent Safety

Toward Robust In-Context Learning: Leveraging Out-of-distribution Proxies for Target Inaccessible Demonstration Retrieval

lmfaoooo at SemEval-2026 Task 1: Humor Is an Audience. Preference Modeling for Constrained Humor Generation

Rethinking the Role of Temperature in Large Language Model Distillation

Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying

Geometric Erasure by Contrastive Velocity Matching in Rectified Flows

Adaptive Order Policies for Masked Diffusion

Deliberative Curation: A Protocol for Multi-Agent Knowledge Bases

Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning

Product-Aware Deep Autoencoders for Robust Process Monitoring in Multi-Product Cyber-Physical Systems

When Softmax Fails at the Top: Extreme Value Corrections for InfoNCE

Short-form Text Rewriting with Phi Silica

LaSR: Context-Aware Speech Recognition via Latent Reasoning

Adaptive data selection improves wearable prediction under low baseline performance

BOUTEF: A Multilingual Corpus for FakeNews in North Africa -- Language as a Weapon

Adversarially Robust Control of Conditional Value-at-Risk via Rockafellar-Uryasev Conformal Inference

Computex 2026: Intel launches Crescent Island GPU with up to 480GB VRAM

Why our #1 LightGBM feature by importance made predictions worse [D]

How much of MLE-Bench's gains are the algorithm vs. better models + more search? [R]