How Faithful Is Trajectory-Based Data Attribution? Error Sources, Remedies, and Practical Guidelines
Signal: 78
Hype: 15
In three lines
A systematic error analysis of trajectory-based data attribution methods. It identifies optimizer mismatch (SGD vs. AdamW) as the dominant configuration-level source of error and proposes AdamW-influence, reporting 10-300% improvements in Spearman correlation across MLP, CNN, GPT-2, and Llama 3.2-1B. It also provides practical guidelines for data selection via a K-step look-ahead framework.
Summary generated by Claude — human-verified