arXiv cs.AI·

AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment

AMR-SD introduces asymmetric meta-reflective self-distillation to improve token-level credit assignment in LLM reinforcement learning. The method compresses diagnostic signals into self-generated Socratic hints and uses Causal Information Gain with an asymmetric ReLU-gated threshold for sparse token-level advantage modulation, preventing late-stage training collapse.

Reinforcement learning · Reasoning · Alignment
SIG 72 · HYP 18
arXiv cs.AI·

Bayesian-Monte Carlo Schedule Updating for Construction Digital Twins: A Probabilistic Framework for Dynamic Project Forecasting

A Bayesian-Monte Carlo probabilistic framework for dynamic construction project schedule updating. It models activity durations with lognormal distributions, updates them via Bayesian inference, and propagates uncertainty through Monte Carlo simulation. The authors report improved forecasting accuracy over deterministic CPM methods on PSPLIB benchmarks.

Reasoning · Benchmarks
SIG 72 · HYP 15
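
The update-then-simulate loop in the summary above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a conjugate normal prior on the mean of log-durations with a known log-scale noise variance, and a purely serial chain of activities rather than a full CPM network. The function names `update_log_mean` and `simulate_finish` are hypothetical.

```python
import numpy as np

def update_log_mean(prior_mu, prior_var, log_obs, noise_var):
    """Conjugate normal update for the mean of log-durations.

    Assumes the log-scale noise variance is known (an illustrative simplification).
    """
    n = len(log_obs)
    if n == 0:
        return prior_mu, prior_var
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mu = post_var * (prior_mu / prior_var + np.sum(log_obs) / noise_var)
    return post_mu, post_var

def simulate_finish(activities, n_sims=20000, seed=0):
    """Monte Carlo total duration for a serial chain of activities.

    activities: list of (mu, var_mu, noise_var) tuples in log space.
    """
    rng = np.random.default_rng(seed)
    total = np.zeros(n_sims)
    for mu, var_mu, noise_var in activities:
        # Posterior predictive in log space is N(mu, var_mu + noise_var).
        sigma = np.sqrt(var_mu + noise_var)
        total += rng.lognormal(mean=mu, sigma=sigma, size=n_sims)
    return total

if __name__ == "__main__":
    # Hypothetical data: prior median of 10 days, three observed durations.
    observed = np.log([12.0, 13.0, 12.5])
    mu, var = update_log_mean(np.log(10.0), 0.25, observed, 0.04)
    totals = simulate_finish([(mu, var, 0.04), (np.log(5.0), 0.1, 0.04)])
    print("P50 / P80 finish:", np.percentile(totals, [50, 80]))
```

Reading a percentile off the simulated totals, rather than a single deterministic sum, is what gives the probabilistic forecast.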
arXiv cs.AI·

Fre-Res: Frequency-Residual Video Token Compression for Efficient Video MLLMs

Fre-Res introduces adaptive video-token compression for video MLLMs. The framework separates spatial details (high-fidelity anchors) from temporal evolution (residual-frequency tokens obtained via a 1D-DCT). A Spatial-Guided Absorber aligns frequency dynamics with visual embeddings. The authors report near full-token performance with a substantial reduction in token length across short- and long-video benchmarks.

Vision · Video generation · Evals
SIG 72 · HYP 18
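
The anchor-plus-temporal-frequency idea in the summary above can be illustrated with a small sketch. It is not the Fre-Res architecture: it assumes the first frame's tokens serve as the anchor, applies an orthonormal 1D DCT-II along the time axis of the residuals, and keeps only the lowest-frequency coefficients. The function names and the `(frames, dim)` token layout are assumptions made for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (rows index frequency)."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)
    return c

def compress(tokens, keep):
    """tokens: array of shape (frames, dim). Returns (anchor, low-frequency coeffs)."""
    anchor = tokens[0]                    # assumed anchor: first frame
    residual = tokens - anchor            # temporal evolution relative to the anchor
    coeffs = dct_matrix(tokens.shape[0]) @ residual
    return anchor, coeffs[:keep]          # drop high temporal frequencies

def reconstruct(anchor, coeffs, n_frames):
    """Invert the truncated DCT and add the anchor back."""
    c = dct_matrix(n_frames)
    return anchor + c[: coeffs.shape[0]].T @ coeffs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical slowly varying video: 16 frames, 8-dimensional tokens.
    base = rng.standard_normal(8)
    drift = np.linspace(0.0, 1.0, 16)[:, None] * rng.standard_normal(8)
    video = base + drift
    anchor, coeffs = compress(video, keep=4)
    approx = reconstruct(anchor, coeffs, 16)
    print("max abs error with 4 of 16 coefficients:", np.abs(video - approx).max())
```

Because the transform is orthonormal, keeping more coefficients never increases the reconstruction error, so `keep` acts as a direct quality-versus-length knob.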
arXiv cs.AI·

Beyond the Cartesian Illusion: Testing Two-Stage Multi-Modal Theory of Mind under Perceptual Bottlenecks

An arXiv paper on the spatial limitations of MLLMs in multi-agent environments. Models suffer from a "Cartesian Illusion": they lack grounded 3D topological understanding. The authors propose an Epistemic Sensory Bottleneck module with Anchor-Based Embodied Spatial Decomposition CoT to improve second-order spatial inference (Theory of Mind). The zero-shot baseline reaches 42% accuracy.

Vision · Multi-agent · Reasoning
SIG 72 · HYP 28
arXiv cs.AI·

ReTAMamba: Reliability-Aware Temporal Aggregation with Mamba for Irregular Clinical Time Series Prediction

ReTAMamba is a Mamba-based model for predicting irregular clinical time series. It estimates observation reliability from missingness and elapsed time, integrates multi-resolution information via Chronological Weaving, and uses a budgeted token router. On MIMIC-IV, eICU, and PhysioNet 2012, it improves AUPRC by 7.51%, 7.80%, and 10.15% respectively.

Benchmarks · Papers · Reasoning
SIG 72 · HYP 18
arXiv cs.AI·

Can LLMs Think Like Consumers? Benchmarking Crowd-Level Reaction Reconstruction with ConsumerSimBench

ConsumerSimBench, a benchmark built from 1,553 Chinese social-media topics and 23,122 reaction criteria, evaluates whether LLMs can reconstruct real consumer reaction patterns. Gemini-3.1-Pro covers only 47.8% of criteria, revealing a major gap between technical performance and consumer intuition. A multi-agent pipeline improves MiMo-V2.5-Pro from 32.9% to 37.6%.

Benchmarks · Evals · Multi-agent
SIG 72 · HYP 25
arXiv cs.CL·

QQJ: Quantifying Qualitative Judgment for Scalable and Human-Aligned Evaluation of Generative AI

QQJ is an evaluation framework for generative AI that combines human judgment and LLMs. It uses expert-designed multi-dimensional rubrics and calibrates LLM evaluators on a small high-quality annotation set. Experiments on text and image generation show stronger alignment with human judgment than traditional automatic metrics and unconstrained LLM evaluators.

Evals · Llama · Vision
SIG 72 · HYP 28
arXiv cs.AI·

New Insight of Variance reduce in Zero-Order Hard-Thresholding: Mitigating Gradient Error and Expansivity Contradictions

A new zeroth-order hard-thresholding algorithm with variance reduction for ℓ0-constrained optimization. It addresses SZOHT's limitation on the number of random directions by mitigating the conflict between ZO gradient deviation and hard-thresholding expansivity. The improved convergence rates are validated on ridge regression and black-box adversarial attacks.

Reinforcement learning
SIG 72 · HYP 15
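
For readers unfamiliar with the setting, the two ingredients named in the summary above (a zeroth-order gradient estimate from random directions, followed by hard thresholding to the top-k entries) can be sketched as below. This is a plain baseline without the paper's variance reduction, and all names, step sizes, and direction counts are illustrative assumptions.

```python
import numpy as np

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def zo_gradient(f, x, rng, n_dirs=20, mu=1e-4):
    """Forward-difference gradient estimate from random unit directions."""
    d = x.size
    fx = f(x)
    g = np.zeros(d)
    for _ in range(n_dirs):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        g += (f(x + mu * u) - fx) / mu * u
    # Scaling by d makes the estimate approximately unbiased for the gradient.
    return d * g / n_dirs

def zo_iht(f, x0, k, step=0.1, iters=200, seed=0):
    """Zeroth-order iterative hard thresholding (no variance reduction)."""
    rng = np.random.default_rng(seed)
    x = hard_threshold(np.asarray(x0, dtype=float), k)
    for _ in range(iters):
        x = hard_threshold(x - step * zo_gradient(f, x, rng), k)
    return x

if __name__ == "__main__":
    # Hypothetical black-box objective with a 2-sparse minimizer.
    target = np.zeros(10)
    target[[1, 6]] = [5.0, -4.0]
    objective = lambda x: float(np.sum((x - target) ** 2))
    print(zo_iht(objective, np.zeros(10), k=2))
```

The tension the summary refers to is visible here: the thresholding step can move the iterate away from the unconstrained update, while the gradient estimate is itself noisy, so the number of random directions matters.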