arXiv cs.LG·

PROWL: Prioritized Regret-Driven Optimization for World Model Learning

PROWL introduces a KL-constrained adversarial curriculum to improve robustness of action-conditioned video world models. A policy exposes high-error trajectories of a diffusion-based model while a Prioritized Adversarial Trajectory (PAT) buffer re-ranks data by prediction error and learning progress. Evaluation on MineRL demonstrates improved robustness on out-of-distribution trajectories.

Reasoning · Reinforcement learning · Papers
SIG 75 · HYP 15
GitHub Trending·

NVlabs / Sana

NVIDIA Labs releases Sana, a linear diffusion transformer for efficient high-resolution image synthesis. The architecture reduces computational complexity while maintaining visual quality.

Image generation · Open source · Papers
SIG 75 · HYP 25
arXiv cs.AI·

CheckSupport: A Local LLM-Powered Tool for Automated Manuscript Submission Checklist Selection and Completion

CheckSupport is an open-source system that uses locally deployed LLMs to automate the recommendation and completion of reporting checklists for scientific manuscripts. Evaluated on peer-reviewed manuscripts, it achieves 90% accuracy for checklist recommendations and 88% for item-level completion, processing each manuscript in 12.5 seconds on CPU-only hardware.

Llama · Prompt engineering · Evals
SIG 75 · HYP 15
arXiv cs.AI·

UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities

UniversalRAG is a multi-modal RAG framework that retrieves and integrates knowledge from heterogeneous sources (text, images, videos) at variable granularities. It introduces modality-aware routing to avoid intra-modal bias and organizes each modality into granularity levels. Validated on 10 benchmarks, it outperforms single-modality and unified baselines.

RAG · Vision · Video generation
SIG 75 · HYP 25
arXiv cs.CL·

Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models

Guided Topology Diffusion (GTD) uses graph diffusion models to dynamically generate optimal communication topologies for multi-agent LLM systems. The iterative framework, guided by a proxy model predicting multi-objective rewards (accuracy, utility, cost), adapts topologies to tasks without gradient-based optimization, outperforming static approaches.

Multi-agent · AI Agents · Benchmarks
SIG 75 · HYP 25
arXiv cs.LG·

When Actions Disappear: Adversarial Action Removal in Self-Play Reinforcement Learning

This study examines adversarial action-removal attacks in self-play reinforcement learning, in which an attacker selectively masks legal actions from the victim's action set. In experiments on poker (6 to 5,531 states) and two non-poker domains, learned masking causes substantially more damage than random masking, persists across Q-learning, PPO, NFSP, and DQN, transfers between agents, and is amplified by self-play.

Reinforcement learning · AI safety · Benchmarks
SIG 75 · HYP 15
arXiv cs.CL·

Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models

TABOM, a post-training method for Diffusion Language Models, aligns optimization with the multi-step easy-to-hard decoding trajectory observed at inference. Via Boltzmann modeling of unmasking preferences, it derives a tractable pairwise ranking objective that reduces training-inference discrepancy and improves performance on new domains.

Fine-tuning · Reasoning · Papers
SIG 75 · HYP 15
arXiv cs.CL·

Reducing Credit Assignment Variance via Counterfactual Reasoning Paths

Researchers introduce IBPO (Implicit Behavior Policy Optimization), a credit assignment method for reinforcement learning with LLMs. By comparing multiple reasoning trajectories, the framework transforms sparse terminal rewards into step-sensitive learning signals, reducing gradient variance and improving stability on mathematical and code reasoning benchmarks.

Reinforcement learning · Reasoning · Code generation
SIG 75 · HYP 25
arXiv cs.AI·

The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models

An arXiv study challenges the assumption that Mixture-of-Experts models achieve domain specialization through sparse routing. The COMMITTEEAUDIT framework reveals a domain-invariant "Standing Committee": a compact coalition of experts that captures most of the routing mass across domains, layers, and budgets. Domain-specific knowledge is left to the peripheral experts.

Benchmarks · Papers
SIG 75 · HYP 15
arXiv cs.AI·

Herding CATs: ALARA for Agent Harness Engineering in Portable Composable Multi-Agent Teams

The paper introduces CAT (Context-Agent-Tool), a data layer for managing multi-agent teams, and applies the ALARA principle (as low as reasonably achievable) to context exposure. It evaluates 22 models (0.6B–35B parameters) on 115 practical tasks via npcsh, a CLI shell. Roughly 2,500 executions test file operations, web search, multi-step scripting, tool chaining, and inter-agent delegation.

Multi-agent · AI Agents · Tools
SIG 75 · HYP 15
arXiv cs.AI·

CounterRefine: Answer-Conditioned Counterevidence Retrieval for Inference-Time Knowledge Repair in Factual Question Answering

CounterRefine is a lightweight repair layer for RAG that treats the first answer as a hypothesis to test. The system issues answer-conditioned expansion queries to retrieve candidate-specific evidence, then applies a deterministically validated KEEP/REVISE refinement step. On SimpleQA, it improves on a matched one-pass RAG baseline by up to 5.8 correct-rate points.

RAG · Reasoning · Evals
SIG 75 · HYP 15