Page 29 of 192

7679 articles

OpenDeepThink: Parallel Reasoning via Bradley-Terry Aggregation

OpenDeepThink uses Bradley-Terry aggregation to select the best solutions from multiple parallel candidates. The system randomly pairs candidate answers for comparison, aggregates the votes, and keeps the top candidates for mutation. On Codeforces, Gemini 3.1 Pro gains 405 Elo points across 8 LLM calls (about 27 minutes). The authors also release CF-73, a set of 73 expert-annotated problems.
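A minimal sketch of Bradley-Terry aggregation over random pairwise votes follows. It assumes a binary judge, here a hypothetical `judge(x, y)` callable standing in for an LLM comparison, and fits strengths with the standard minorize-maximize (MM) update. The function names, the comparison budget, and the zero-strength floor are illustrative choices, not OpenDeepThink's code. The summary does not say how the authors fit the model or how many pairs they compare.

```python
import random
from collections import defaultdict


def fit_bradley_terry(n, comparisons, iters=200):
    """Fit Bradley-Terry strengths p from (winner, loser) index pairs.

    The model assumes P(i beats j) = p[i] / (p[i] + p[j]). Uses the classic
    minorize-maximize update p[i] <- wins[i] / sum_j n_ij / (p[i] + p[j]).
    """
    wins = [0] * n
    games = defaultdict(int)  # (a, b) with a < b -> number of comparisons
    for winner, loser in comparisons:
        wins[winner] += 1
        games[(min(winner, loser), max(winner, loser))] += 1

    p = [1.0] * n
    for _ in range(iters):
        denom = [0.0] * n
        for (a, b), count in games.items():
            share = count / (p[a] + p[b])
            denom[a] += share
            denom[b] += share
        # Candidates never compared keep their strength; the floor stops
        # winless candidates from collapsing to exactly zero.
        new_p = [max(wins[i] / denom[i], 1e-6) if denom[i] > 0 else p[i]
                 for i in range(n)]
        total = sum(new_p)
        p = [x * n / total for x in new_p]  # strengths are scale-free; renormalize
    return p


def select_top_k(candidates, judge, num_comparisons, k, seed=0):
    """Compare random candidate pairs with `judge` and keep the k strongest.

    `judge(x, y)` is a hypothetical callable returning True if x beats y.
    """
    rng = random.Random(seed)
    n = len(candidates)
    votes = []
    for _ in range(num_comparisons):
        i, j = rng.sample(range(n), 2)  # one random pair of distinct candidates
        votes.append((i, j) if judge(candidates[i], candidates[j]) else (j, i))
    strengths = fit_bradley_terry(n, votes)
    ranked = sorted(range(n), key=lambda idx: strengths[idx], reverse=True)
    return [candidates[idx] for idx in ranked[:k]]
```

Because Bradley-Terry strengths only matter up to a common scale, the sketch renormalizes after every update. In a loop like the one summarized above, the surviving top-k candidates would then feed the mutation step.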

Reasoning Benchmarks Gemini

arXiv cs.CL·May 19

SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning

SD-Search introduces on-policy hindsight self-distillation for search-augmented reasoning agents. A single model acts as both student (conditioned only on the context available at inference time) and teacher (additionally conditioned on search outcomes from rollout groups). Step-level supervision is applied via Jensen-Shannon divergence at query positions and integrated into GRPO training, with no external models or annotations.
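To make the step-level signal concrete, the sketch below computes a Jensen-Shannon divergence between a student and a teacher next-token distribution and averages it only over positions flagged as search-query tokens. The plain-list logits, the hypothetical `query_mask` flags, and the simple mean are assumptions for illustration. In real GRPO training this term would operate on model logits inside the loss, and the summary does not specify details such as whether gradients flow through the teacher side.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl(p, q):
    """KL(p || q), skipping zero-probability entries of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL to the midpoint distribution (<= ln 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def query_position_jsd(student_logits, teacher_logits, query_mask):
    """Mean JSD between student and teacher distributions, taken only at
    positions where query_mask is True (hypothetical query-token flags)."""
    losses = [
        js_divergence(softmax(s), softmax(t))
        for s, t, is_query in zip(student_logits, teacher_logits, query_mask)
        if is_query
    ]
    return sum(losses) / len(losses) if losses else 0.0
```

Unlike KL, the Jensen-Shannon divergence is symmetric and bounded, which keeps the per-step penalty finite even when the two distributions disagree sharply.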

Reasoning Reinforcement learning AI Agents

arXiv cs.LG·May 19

When Is Rank-1 Steering Cheap? Geometry, Granularity, and Budgeted Search

Researchers formalize activation steering (controlling an LLM without retraining) as a budget-constrained optimization over the intervention layer and the steering coefficient. They introduce concept granularity, a measure of directional heterogeneity, and present GRACE, a framework that uses activation geometry to diagnose steering difficulty and reduces the number of required evaluations by 39.8% on average.
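The budgeted setting can be sketched as follows. The sketch assumes a rank-1 intervention that adds a scaled unit direction to one layer's hidden state (h' = h + α·v/‖v‖) and an expensive, hypothetical `evaluate(layer, coefficient)` scoring call. The search shown is a naive random-subset baseline that simply spends the budget. It is not GRACE, whose geometry-based diagnosis is what the authors credit with cutting the number of evaluations.

```python
import math
import random


def steer(hidden, direction, coefficient):
    """Rank-1 activation steering: h' = h + coefficient * v / ||v||."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [h + coefficient * d / norm for h, d in zip(hidden, direction)]


def budgeted_search(layers, coefficients, evaluate, budget, seed=0):
    """Naive baseline: score at most `budget` random (layer, coefficient) pairs.

    `evaluate` is a hypothetical, expensive call (e.g. generating with the
    steered model and scoring the output); each call consumes one unit of budget.
    """
    rng = random.Random(seed)
    grid = [(layer, c) for layer in layers for c in coefficients]
    rng.shuffle(grid)
    best, best_score = None, -math.inf
    for layer, c in grid[:budget]:
        score = evaluate(layer, c)
        if score > best_score:
            best, best_score = (layer, c), score
    return best, best_score
```

With a full budget the baseline degenerates to exhaustive grid search. A method that predicts which layers and coefficients are worth trying can reach a comparable choice with fewer `evaluate` calls, which is the cost the summary's 39.8% figure refers to.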

Reasoning Alignment Papers

OpenDeepThink: Parallel Reasoning via Bradley-Terry Aggregation

SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning

When Is Rank-1 Steering Cheap? Geometry, Granularity, and Budgeted Search

Ensemble Monitoring for AI Control: Diverse Signals Outweigh More Compute

HEED: Density-Weighted Residual Alignment for Hybrid Vision-Language Model Distillation

DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies

SignMuon: Communication-Efficient Distributed Muon Optimization

GeoSym127K: Scalable Symbolically-verifiable Synthesis for Multimodal Geometric Reasoning

SkillMOO: Multi-Objective Optimization of Agent Skills for Software Engineering

From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG

State-of-the-Art Claims Require State-of-the-Art Evidence

LARGER: Lexically Anchored Repository Graph Exploration and Retrieval

LoopQ: Quantization for Recursive Transformers

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

Fine-tuning vs. In-context Learning in Large Language Models: A Formal Language Learning Perspective

Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution

SomaliWeb v1: A Quality-Filtered Somali Web Corpus with a Matched Tokenizer and a Public Language-Identification Benchmark

Mixing Times of Glauber Dynamics on Masked Language Models

DataClawBench: An Agent Benchmark for Exploratory Real-World Financial Data Analysis

Scale Determines Whether Language Models Organize Representation Geometry for Prediction

Breaking Winner-Takes-All: Cooperative Policy Optimization Improves Diverse LLM Reasoning

Temporal Decay of Co-Citation Predictability: A 20-Year Statute Retrieval Benchmark from 396M Ukrainian Court Citations

RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs

(Sparse) Attention to the Details: Preserving Spectral Fidelity in ML-based Weather Forecasting Models

Contrastive Conceptor Activation Steering (COAST): Unlocking Vision-Language-Action Models through Hidden States

LiTS: A Modular Framework for LLM Tree Search

CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

Helping Customers in Distress: An LLM-powered Agent that Converses, Probes, and Routes

Constrained Code Generation with Discrete Diffusion

Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings

Learning Reasoning Rewards from Expert Demonstrations with Inverse Reinforcement Learning

What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?

SocialMemBench: Are AI Memory Systems Ready for Social Group Settings?

SkillGenBench: Benchmarking Skill Generation Pipelines for LLM Agents

Beyond Policy Optimization: A Data Curation Flywheel for Sparse-Reward Long-Horizon Planning

Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction

AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent

KVDrive: A Holistic Multi-Tier KV Cache Management System for Long-Context LLM Inference

Distilling Tabular Foundation Models for Structured Health Data