arXiv cs.CL · 25 May 2026

Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning

In three lines

A study of 16 language models (1.5B–72B parameters) finds that representational convergence does not extend to reasoning processes. The models' representations align more on problems they all fail (CKA = 0.897) than on problems they solve (CKA = 0.830). Post-decision representations diverge sharply (CKA = 0.274), and shared information has minimal causal influence (1.5–5.5% flip rate).
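For readers unfamiliar with the metric, CKA (centered kernel alignment) scores how similar two sets of representations are on a scale from 0 to 1. The summary does not say which CKA variant the paper uses, so the sketch below assumes the common linear form computed on activation matrices with one row per input. The function name `linear_cka` and the synthetic data are illustrative only and are not taken from the paper.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices.

    x has shape (n_examples, d1) and y has shape (n_examples, d2);
    row i of each matrix must correspond to the same input example.
    Assumption: linear CKA, not necessarily the variant used in the paper.
    """
    # Center every feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # Squared Frobenius norm of the cross-covariance, normalised by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Synthetic stand-ins for two models' hidden states on the same 200 inputs.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(200, 16))

# A rotated, rescaled copy of acts_a: the same geometry in a different basis.
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
acts_b = 2.5 * acts_a @ rotation

# Unrelated activations, drawn independently of acts_a.
acts_c = rng.normal(size=(200, 16))

cka_same_geometry = linear_cka(acts_a, acts_b)
cka_unrelated = linear_cka(acts_a, acts_c)
```

Linear CKA is unchanged by rotations and uniform rescaling of either representation, so `cka_same_geometry` comes out at essentially 1, while `cka_unrelated` is much lower. That invariance is why the metric suits comparisons between models whose hidden dimensions have no shared coordinate system.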

Papers Reasoning Evals Alignment

Summary generated by Claude — human-verified
