Page 59 of 147

5853 articles

Guarded Repair for Harm-Aware Post-hoc Replacement of LLM Mathematical Reasoning

GuardedRepair is a guarded best-of-N repair framework for LLM mathematical reasoning that selectively fixes incorrect traces while preserving correct answers. On GSM8K (95.60% → 96.89%), it fixes 17 of 58 errors with no measured broken-correct cases. On weak-reasoner ASDiv, accuracy improves from 78.40% to 87.60%.

Reasoning · Evals · AI safety

arXiv cs.AI·May 26

In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models

Researchers replicate Picbreeder, an interactive image-evolution platform, by replacing its human users with vision-language models (VLMs). The results differ qualitatively from the human baseline. The study examines candidate causal factors: exploratory noise, behavioral diversity between agents, and memory of past actions.

Vision · AI Agents · Open source

arXiv cs.AI·May 26

Confidence Calibration in Large Language Models

Preregistered study shows current LLMs are overconfident: confidence exceeds accuracy on average. A hard-easy effect moderates this bias: overconfidence peaks on difficult tasks, while easy tasks show substantial underconfidence. Introduces LifeEval, a benchmark for evaluating model calibration across difficulty levels.

Evals · Benchmarks · Reasoning

arXiv cs.AI·May 26

Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs

An arXiv paper analyzing latency-reliability-cost tradeoffs in LLM-enabled multi-agent workflows. It introduces performance models for LLM and non-LLM agents, proposes a water-filling token allocation policy, and characterizes optimal workflow reliability via shadow prices under latency and cost constraints.

AI Agents · Multi-agent · Reasoning

arXiv cs.AI·May 26

Operationalizing Reconstructive Authority: Runtime Construction, Dependency Resolution, and Execution Gating in Autonomous Agent Systems

A paper on runtime enforcement of Reconstructive Authority (RAM) in autonomous agent systems. It introduces an execution model with three states (admit/deny/halt), dynamic dependency resolution, and a Recovery Loop that integrates drift detection with execution control. The stated guarantee is that no action executes without constructible authority.

AI Agents · AI safety · Reasoning

Federated Learning over Human-Body Communication for On-Body Edge Intelligence: A Survey, Taxonomy, and BODYFED-HBC Scheduling Vignette

Low-Cost Labels, Reliable Choices: Rollout-Calibrated Hyper-Heuristics for Job Shop Scheduling

Beyond Predefined Learning Objects: A Thinking-Learning Interaction Model for Up-to-Date Autonomous Robot Learning

MAPLE: Multi-State Aggregated Policy Evaluation for AlphaZero in Imperfect-Information Games

Neuro-Inspired Inverse Learning for Planning and Control

Parameter Efficient Multi-Class Intelligent Scheduling for Multimodal Online Distributed Industrial Anomaly Detection

CAFD: Concept-Aware DNN Fault Detection using VLMs

Towards Verifiable Transformers: Solver-Checkable Circuit Explanations

Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing

Cascade-KDE: Robust Time-Series Restoration under Out-of-Distribution Impulse Corruptions

Not All Transitions Matter: Evidence from PPO

Overcoming "Physics Shock" in Earth Observation: A Heteroscedastic Uncertainty Framework for PINN-based Flood Inference

Riemannian Archetypal Analysis: Interpretable non-linear data analysis on deformed star distributions

Filtered Posterior Mean Collections: A Unified Framework for Analytical Models of Diffusion Generalization

PrivFusion: A Privacy-preserving Multi-Agent Framework for Harmonizing Distributed Datasets

Optimizing Digital Therapeutic Interventions: Online Learning under Endogenous Adherence

ChainzRule: Sample-Efficient, Robust Deep Learning Across Tabular, NLP, and Vision Tasks

From One-Pass SGD to Data Reuse: Mini-Batch Scaling Laws in Sketched Linear Regression

Omissive Bias in Religious Representation: Benchmarking LLM Answers to Everyday Ethical Decision-making

Improving the Completeness and Comparability of Segment Disclosures: A Large Language Model Approach

Teaching Through Analogies: A Modular Pipeline for Educational Analogy Generation

Phonetic Modeling of Dialectal Variation in Vietnamese Speech

Unveil: Unified Visual-Textual Integration and Distillation for Multi-modal Document Retrieval

BoxLitE: A Faithful Knowledge Base Embedding Based on Convex Optimization

From Accuracy to Auditability: A Survey of Determinism in Financial AI Systems

Why We Need World Models for AGI: Where LLMs Fail and How World Models May Outperform

EvoSci: A Bio-Inspired Multi-Agent Framework for the Evolution of Scientific Discovery

A lift for input-convex neural network training

Mixture of Complementary Agents for Robust LLM Ensemble

Generative Representation Learning on Hyper-relational Knowledge Graphs via Masked Discrete Diffusion

Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning

Rethinking Continual Anomaly Detection on the Edge: Benchmarking Under Realistic Industrial Conditions

Added direct model downloads right from the UI in Anubis OSS - if anyone would help test that would be great

Firecrawl joins the Vercel Marketplace

Update on 12x32gb sxm v100 cluster / local AI for legal drafting