arXiv cs.LG

How Faithful Is Trajectory-Based Data Attribution? Error Sources, Remedies, and Practical Guidelines

Signal: 78
Hype: 15
In three lines
A systematic error analysis of trajectory-based data attribution methods. It identifies optimizer mismatch (SGD vs. AdamW) as the dominant configuration-level source of error. The paper proposes AdamW-influence, reporting 10-300% improvements in Spearman correlation across MLP, CNN, GPT-2, and Llama 3.2-1B models. It also provides practical guidelines for data selection via a K-step look-ahead framework.
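For readers unfamiliar with the metric named in the summary: Spearman correlation measures how well two score lists agree in rank order, which is a common way to compare predicted attribution scores against measured effects. The sketch below is a minimal pure-Python illustration under stated assumptions. The function names `average_ranks` and `spearman` and the sample numbers are hypothetical, and the paper's actual evaluation protocol is not described in this summary.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical numbers for illustration only (not taken from the paper).
predicted_scores = [0.9, 0.1, 0.4, 0.7, 0.2]  # attribution scores per training example
measured_effects = [0.8, 0.0, 0.5, 0.3, 0.1]  # e.g. measured effect of removing each example
rho = spearman(predicted_scores, measured_effects)
print(f"Spearman correlation: {rho:.2f}")
```

A value near 1 means the attribution method ranks training examples in nearly the same order as the measured effects; a value near 0 means the ranking carries little information.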

Summary generated by Claude — human-verified