arXiv cs.CL

Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning

Signal: 78 · Hype: 15
In three lines: A study of 16 language models (1.5B–72B parameters) finds that representational convergence does not extend to reasoning processes. Models align more closely on problems they collectively fail (CKA = 0.897) than on problems they solve (CKA = 0.830). Post-decision representations diverge sharply (CKA = 0.274), and shared information has minimal causal influence (1.5–5.5% flip rate).
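
The CKA values quoted here are similarity scores between two sets of activations computed on the same inputs, where 1 means identical up to rotation and uniform scaling. This summary does not say which CKA variant the study uses, so the following is only a minimal sketch assuming linear CKA over activation matrices of shape (examples, features). The function name `linear_cka` and the random stand-in data are hypothetical and not taken from the paper.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices with the same number of rows.

    x: array of shape (n_examples, d1), e.g. hidden states from model A
    y: array of shape (n_examples, d2), e.g. hidden states from model B
    Assumption: the linear-kernel variant; the summarized study may differ.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must describe the same examples (same row count)")

    # Center each feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts_a = rng.normal(size=(200, 64))   # stand-in for model A activations
    acts_b = rng.normal(size=(200, 32))   # stand-in for model B activations
    print("self similarity:", linear_cka(acts_a, acts_a))
    print("unrelated random features:", linear_cka(acts_a, acts_b))
```

Under this assumption, comparing per-problem subsets (for example, collectively failed versus solved problems) would amount to calling the function on the corresponding rows of each model's activations.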
Papers · Reasoning · Evals · Alignment

Summary generated by Claude — human-verified