Page 30 of 192

7679 articles

SignRoundV2: Toward Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs

SignRoundV2 is a post-training quantization framework for LLMs that maintains performance under extreme compression (2-4 bits). It combines an adaptive mixed-precision strategy guided by gradients with lightweight stabilization techniques. Results show a ~1% performance gap at an average of 4.5 bits with mixed MXFP, and substantial improvements in challenging 2-bit weight-only quantization.

Fine-tuning Benchmarks Infrastructure

arXiv cs.CL·May 19

Beyond Accuracy: Decomposing the Reasoning Efficiency of LLMs

This arXiv paper introduces a trace-optional evaluation protocol that decomposes the token efficiency of reasoning LLMs. It analyzes 14 open-weight models on CogniLoad, GSM8K, ProofWriter, and ZebraLogic by separating completion rate, conditional correctness, and generated length. It identifies three failure modes: logic-limited, context-limited, and verbosity-limited.

Reasoning Evals Benchmarks

arXiv cs.CL·May 19

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

EnvFactory automates the creation of executable environments and the synthesis of multi-turn trajectories for agentic RL training. Using 85 verified environments across 7 domains, the framework generates 2,575 SFT/RL trajectories and improves Qwen3-series models by +15% on BFCLv3, +8.6% on MCP-Atlas, and +6% on conversational benchmarks.

AI Agents Reinforcement learning Code generation

Old Habits Die Hard: How Conversational History Geometrically Traps LLMs

Learning from Self-Debate: Preparing Reasoning Models for Multi-Agent Debate

Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation

Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution

SkillMOO: Multi-Objective Optimization of Agent Skills for Software Engineering

SynCABEL: Synthetic Contextualized Augmentation for Biomedical Entity Linking

Multi-layer Cross-attention is Provably Optimal for Multi-modal In-context Learning

Vector RAG vs LLM-Compiled Wiki: A Preregistered Comparison on a Small Multi-Domain Research

OmniCode: A Benchmark for Evaluating Software Engineering Agents

UbuntuGuard: A Culturally-Grounded Policy Benchmark for Equitable AI Safety in African Languages

BELIEF: Structured Evidence Modeling and Uncertainty-Aware Fusion for Biomedical Question Answering

Adversarial Agent Collaboration for Correctness Improvements of C to Safe Rust Translation

SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning

TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning

LoopQ: Quantization for Recursive Transformers

Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning

SocialMemBench: Are AI Memory Systems Ready for Social Group Settings?

Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage

Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs

Unleashing LLMs in Bayesian Optimization: Preference-Guided Framework for Scientific Discovery

DISA: Offline Importance Sampling for Distribution-Matching LLM-RL

Fourier Compressor: Frequency-Domain Visual Token Compression for Vision-Language Models

Perovskite-R1: a domain-specialized large language model for intelligent discovery of precursor additives and experimental design

Locally Coherent Parallel Decoding in Diffusion Language Models

Proof-Carrying Certificates for LLM Pipelines: A Trust-Boundary Architecture

Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds

RAP: Runtime Adaptive Pruning for LLM Inference

DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

InvDesFlow-AL: active learning-based workflow for inverse design of functional materials

KISS - Knowledge Infrastructure for Scientific Simulation: A Scaffolding for Agentic Earth Science

ReTAMamba: Reliability-Aware Temporal Aggregation with Mamba for Irregular Clinical Time Series Prediction

Surface-Form Neural Sparse Retrieval: Robust Fuzzy Matching for Industrial Music Search

Toward Robust Multilingual Adaptation of LLMs for Low-Resource Languages

Supervising the search process produces reliable and generalizable information-seeking agents

Experimentally validated quantum-secure federated learning over a multi-user quantum network