Reddit r/MachineLearning

Graph spectral analysis (Fiedler value + Scheffer CSD indicators) predicts grokking 21k steps before loss convergence - five reproducible experiments [R]

Signal: 72 · Hype: 28
In three lines:
Graph spectral analysis (Fiedler value + Scheffer critical slowing down indicators) predicts grokking 21k steps before loss convergence.
Five reproducible CPU experiments: early detection, distinct structural fingerprints for grokking vs catastrophic forgetting, guided intervention that preserves 91.7% vs 2.6%, and 48× acceleration across sequential tasks.
Scope is limited to 2-layer MLPs and 1-layer transformers.
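A minimal sketch of the two signal families named above, under loud assumptions: the summary does not say how the post builds its graph or computes its indicators. Here a dense layer's weight matrix W is assumed to form a bipartite graph with edge weights |W|, the Fiedler value is taken as the second-smallest eigenvalue of its Laplacian L = D - A, and the Scheffer-style critical slowing down indicators are assumed to be rolling variance and lag-1 autocorrelation of a series across checkpoints, with a Kendall-tau trend. All function names (layer_graph, fiedler_value, rolling_csd, kendall_trend) are hypothetical, and NumPy is assumed.

```python
import numpy as np


def layer_graph(weights: np.ndarray) -> np.ndarray:
    """Hypothetical: bipartite adjacency matrix for one dense layer.

    Rows of `weights` are output units, columns are input units; edge
    weights are absolute values of the connection weights (an assumption).
    """
    a = np.abs(np.asarray(weights, dtype=float))
    n_out, n_in = a.shape
    adj = np.zeros((n_out + n_in, n_out + n_in))
    adj[:n_out, n_out:] = a      # output -> input edges
    adj[n_out:, :n_out] = a.T    # symmetric counterpart (undirected graph)
    return adj


def fiedler_value(adj: np.ndarray) -> float:
    """Second-smallest eigenvalue of the graph Laplacian L = D - A.

    It is 0 when the graph is disconnected and grows as the graph
    becomes more tightly connected (algebraic connectivity).
    """
    degree = np.diag(adj.sum(axis=1))
    laplacian = degree - adj
    eigvals = np.linalg.eigvalsh(laplacian)  # symmetric solver, sorted ascending
    return float(eigvals[1])


def rolling_csd(series, window: int):
    """Rolling variance and lag-1 autocorrelation (classic CSD indicators)."""
    x = np.asarray(series, dtype=float)
    variances, autocorrs = [], []
    for end in range(window, len(x) + 1):
        seg = x[end - window:end]
        seg = seg - seg.mean()                 # remove the window mean
        variances.append(float(seg.var()))
        denom = float(np.dot(seg, seg))
        lag1 = float(np.dot(seg[:-1], seg[1:])) / denom if denom > 0 else 0.0
        autocorrs.append(lag1)
    return np.array(variances), np.array(autocorrs)


def kendall_trend(values) -> float:
    """Kendall tau (tau-a) of `values` against time; > 0 means rising."""
    n = len(values)
    if n < 2:
        return 0.0
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            if diff > 0:
                score += 1      # concordant pair with time order
            elif diff < 0:
                score -= 1      # discordant pair
    return score / (n * (n - 1) / 2)
```

In use, one would call fiedler_value(layer_graph(W)) on each saved checkpoint, pass the resulting series to rolling_csd, and check kendall_trend on both outputs; rising variance and autocorrelation together are the textbook early-warning pattern. Whether this particular construction reproduces the post's 21k-step lead is not something this sketch can confirm.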
Papers · Evals · Reasoning · Open source

Summary generated by Claude — human-verified