Archives

May 2026

3147 articles

arXiv cs.LG·

Equilibrium Propagation and Hamiltonian Inference in the Diffusive Fitzhugh-Nagumo Model

Extends the Equilibrium Propagation framework to skew-gradient systems and demonstrates an equivalence between deep Energy-Based Models and Hamiltonian neural networks. Applied to diffusively coupled Fitzhugh-Nagumo neuron networks, the paper shows that stationary solutions admit a spatial Hamiltonian structure, which enables Hamiltonian Echo Backpropagation methods.

Papers · Reasoning · Reinforcement learning
SIG 72 · HYP 15
arXiv cs.CL·

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning

LatentOmni proposes an audio-visual reasoning framework that uses a unified latent space instead of explicit text chain-of-thought. The model interleaves textual reasoning with audio-visual latent states, introduces Omni-Sync Position Embedding (OSPE) for temporal consistency, and is trained on LatentOmni-Instruct-35K (35K annotated trajectories). It outperforms text-based baselines on audio-visual benchmarks.

Reasoning · Papers
SIG 72 · HYP 28
arXiv cs.AI·

DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation

DeepWeb-Bench is a deep research benchmark evaluating 9 frontier models on tasks requiring massive evidence collection, cross-source reconciliation, and long-horizon multi-step derivation. Errors stem primarily from derivation and calibration (>70%), not retrieval (12-14%). Strong and weak models fail differently: incomplete derivation vs hallucinated precision.

Benchmarks · Reasoning · AI Agents
SIG 78 · HYP 25
arXiv cs.AI·

ScenePilot: Controllable Boundary-Driven Critical Scenario Generation for Autonomous Driving

ScenePilot generates critical scenarios for autonomous driving testing via multi-objective reinforcement learning. The framework combines RSS-derived physical feasibility with an AV-risk predictor to target boundary-band scenarios: those that are physically solvable yet still cause failures. It raises the collision rate on SafeBench by 6.2 percentage points while preserving physical validity.

Reinforcement learning · AI safety · Evals
SIG 78 · HYP 15
arXiv cs.AI·

Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work

Students construct QuestBench, a 256-question benchmark across the humanities and social sciences, to evaluate deep research systems. Testing shows GPT-4.5 reaching a 57.58% pass rate while mean performance across 13 systems is 16.85%, exposing hidden failures. This classroom practice teaches students to judge AI output quality and remain responsible knowledge actors.

Benchmarks · Evals · GPT
SIG 72 · HYP 25
arXiv cs.AI·

Declarative Data Services: Structured Agentic Discovery for Composing Data Systems

DDS (Declarative Data Services) is an architecture for structured agentic discovery of data-system compositions. Addressing unbounded agentic discovery failures, the framework decomposes search into typed sub-searches via four contracts (intent, operator DAG, skills, runtime attribution). Tested on a trading-backend workload, DDS converges where unbounded approaches fail.

AI Agents · Multi-agent · Papers
SIG 72 · HYP 18
arXiv cs.LG·

I-SAFE: Wasserstein Coherence Metrics for Structural Auditing of Scientific AI Models

I-SAFE is a post-hoc auditing framework for scientific AI models based on the Wasserstein Coherence Metric (WCM). It evaluates whether model predictions reflect domain structure or exploit statistical shortcuts. Tested on drug-target interaction prediction (DeepConvDTI, DeepDTA, TAPB), it reveals distinct distributional response profiles invisible to accuracy metrics.

Evals · AI safety · Alignment
SIG 72 · HYP 15
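For the I-SAFE entry above, the summary does not define the Wasserstein Coherence Metric itself, so the following is only a sketch of the underlying building block: the 1-Wasserstein distance between two equal-size one-dimensional samples. The function name and the toy prediction values are assumptions for illustration, not part of I-SAFE.

```python
from typing import Sequence


def wasserstein_1d(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """1-Wasserstein distance between two equal-size 1-D empirical samples.

    For equal-size samples, the optimal transport plan pairs the i-th
    smallest value of one sample with the i-th smallest of the other,
    so the distance is the mean absolute difference of the sorted values.
    """
    if len(sample_a) != len(sample_b) or not sample_a:
        raise ValueError("samples must be non-empty and of equal size")
    sorted_a = sorted(sample_a)
    sorted_b = sorted(sample_b)
    return sum(abs(a - b) for a, b in zip(sorted_a, sorted_b)) / len(sorted_a)


# Hypothetical toy data: model predictions before and after a perturbation.
baseline_predictions = [0.0, 1.0, 2.0]
perturbed_predictions = [3.0, 1.0, 2.0]

shift = wasserstein_1d(baseline_predictions, perturbed_predictions)
```

A distributional distance like this can change even when an accuracy metric does not, which is the kind of response profile the summary describes.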
arXiv cs.LG·

Amplifying, Not Learning: Fine-Tuned AI Text Detectors Amplify a Pretrained Direction

AI text detectors amplify a pretrained typicality axis rather than construct an AI-vs-human boundary. On RoBERTa-base, raw projection onto centroid(AI)-centroid(HC3) achieves AUROC 0.806-0.944, matching or exceeding fine-tuning. A closed-form Jacobian predictor transfers to 16/16 third-party detectors with oracle-equivalence, reducing FPR by 57% on the OpenAI detector.

Evals · Benchmarks · AI safety
SIG 82 · HYP 15
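For the text-detector entry above, the "raw projection onto centroid(AI)-centroid(HC3)" idea can be sketched as follows. This is a minimal illustration under stated assumptions: the two-dimensional toy vectors stand in for RoBERTa-base embeddings, all function names are hypothetical, and the AUROC is computed by plain pairwise comparison rather than with the authors' tooling.

```python
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / count for i in range(dim)]


def projection_scores(
    embeddings: List[Vector], ai_ref: List[Vector], human_ref: List[Vector]
) -> List[float]:
    """Score each embedding by its dot product with centroid(AI) - centroid(human)."""
    c_ai = centroid(ai_ref)
    c_human = centroid(human_ref)
    direction = [a - h for a, h in zip(c_ai, c_human)]
    return [sum(x * d for x, d in zip(e, direction)) for e in embeddings]


def auroc(pos_scores: List[float], neg_scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy 2-D "embeddings" (assumption: real inputs would be encoder outputs).
ai_reference = [[1.0, 0.0], [0.8, 0.2]]
human_reference = [[0.0, 1.0], [0.2, 0.8]]
ai_test = [[0.9, 0.1], [0.4, 0.5]]
human_test = [[0.1, 0.7], [0.5, 0.45]]

ai_scores = projection_scores(ai_test, ai_reference, human_reference)
human_scores = projection_scores(human_test, ai_reference, human_reference)
toy_auroc = auroc(ai_scores, human_scores)  # 0.75 on this toy data
```

No classifier is trained here: the only "parameters" are two centroids, which is what makes the comparison against fine-tuned detectors in the summary notable.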
arXiv cs.LG·

When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning

The authors show that teacher-token reliability in reasoning self-distillation depends on position within the trajectory, not on local entropy. They propose Position-Weighted OPSD (PW-OPSD), which applies increasing position weights to token supervision. On Qwen3-4B, AIME 2024/2025 scores improve by 1.0/1.1 points; validation on DeepSeek-R1-Distill-Llama-8B and Olmo-3-7B-Think confirms the gains.

Reasoning · Fine-tuning · Benchmarks
SIG 78 · HYP 15
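For the PW-OPSD entry above, "increasing position weights" on token supervision might look like the sketch below. The summary does not give the actual weighting schedule, so the linear ramp, the `w_min`/`w_max` parameters, and the function names are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def position_weights(length: int, w_min: float = 0.5, w_max: float = 1.5) -> List[float]:
    """Linearly increasing weights from w_min (first token) to w_max (last token).

    The linear schedule is a hypothetical choice, not the paper's.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if length == 1:
        return [w_max]
    step = (w_max - w_min) / (length - 1)
    return [w_min + step * t for t in range(length)]


def position_weighted_loss(
    token_losses: Sequence[float], w_min: float = 0.5, w_max: float = 1.5
) -> float:
    """Weighted mean of per-token distillation losses, emphasizing later positions."""
    weights = position_weights(len(token_losses), w_min, w_max)
    total = sum(w * loss for w, loss in zip(weights, token_losses))
    return total / sum(weights)


# Toy per-token losses (assumption): the mismatch with the teacher is at the end.
late_error = position_weighted_loss([0.0, 0.0, 3.0])
early_error = position_weighted_loss([3.0, 0.0, 0.0])
```

With this schedule a late-position mismatch contributes more to the loss than the same mismatch early in the trajectory, reflecting the summary's claim that reliability depends on position.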
arXiv cs.LG·

A Reproducible Log-Driven AutoML Framework for Interpretable Pipeline Optimization in Healthcare Risk Prediction

yvsoucom-iterkit, a deterministic log-driven AutoML framework, optimizes medical risk prediction pipelines across 18,000+ configurations. On Pima and Stroke datasets, augmentation (0.454), model choice (0.198), and imbalance handling (0.101–0.406) are key drivers. Ensembles achieve F1 0.89–0.94 with cross-seed robustness (variability 0.023–0.026).

Benchmarks · Evals · Fine-tuning
SIG 72 · HYP 18
GitHub Trending·

ChromeDevTools / chrome-devtools-mcp

Chrome DevTools MCP: a Model Context Protocol (MCP) server that lets AI agents interact directly with Chrome DevTools for real-time debugging and inspection of web applications.

AI Agents · MCP · Tools
SIG 65 · HYP 25