Page 47 of 192


7679 articles

A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAM$\Delta$ Integration into Upcycled MoE

A method for extending LLMs to new languages without a costly alignment phase. It converts a dense model into a Mixture-of-Experts architecture with dedicated per-language experts, then transfers alignment capabilities by merging post-training deltas. It improves performance on the new languages while preserving the original capabilities.

Fine-tuning




Barriers for Learning in an Evolving World: Mathematical Understanding of Loss of Plasticity

Are Sparse Autoencoder Benchmarks Reliable?

OPERA: A Reinforcement Learning-Enhanced Orchestrated Planner-Executor Architecture for Reasoning-Oriented Multi-Hop Retrieval

UniER: A Unified Benchmark for Item-level and Path-level Exercise Recommendation

Generative AI and the Productivity Divide: Human-AI Complementarities in Education

Global Automation Atlas

Reducing Credit Assignment Variance via Counterfactual Reasoning Paths

TinySAM 2: Extreme Memory Compression for Efficient Track Anything Model

D$^2$Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning

LLM-Safety Evaluations Lack Robustness

Scheduling That Speaks: An Interpretable Programmatic Reinforcement Learning Framework

Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap

QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi

The Unlearnability Phenomenon in RLVR for Language Models

EmoMind: Decoding Affective Captions from Human Brain fMRI

OCCAM: Open-set Causal Concept explAnation and Ontology induction for black-box vision Models

When Efficiency Backfires: Cascading LLMs Trigger Cascade Failure under Adversarial Attack

HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

Universal Dynamics of Punctuated Progress

Augmenting Human Evaluation with LLM Judges: How Many Human Reviews Do You Need?

Machine Unlearning for Masked Diffusion Language Models

Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs

AI for Auto-Research: Roadmap & User Guide

How Many Visual Tokens Do Multimodal Language Models Need? Scaling Visual Token Pruning with F^3A

Infini-News: Efficiently Queryable Access to 1.3 Billion Processed Common Crawl News Articles

The Token Games: Evaluating Language Model Reasoning with Puzzle Duels

HTSC-2025: A Benchmark Dataset of Ambient-Pressure High-Temperature Superconductors for AI-Driven Critical Temperature Prediction

Training Infinitely Deep and Wide Transformers

A More Word-like Image Tokenization for MLLMs

Training-Free Cultural Alignment of Large Language Models via Persona Disagreement

When Marginals Match but Structure Fails: Covariance Fidelity in Generative Models

An Amortized Efficiency Threshold for Comparing Neural and Heuristic Solvers in Combinatorial Optimization

StreamPro: From Reactive Perception to Proactive Decision-Making in Streaming Video

Sustainability via LLM Right-sizing

Harnessing LLM Agents with Skill Programs

Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning

Algorithmic Cultivation: How Social Media Feeds Shape User Language

AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning

Focused Forcing: Content-Aware Per-Frame KV Selection for Efficient Autoregressive Video Diffusion