arXiv cs.CL·21 May 2026

Self-Training Doesn't Flatten Language -- It Restructures It: Surface Markers Amplify While Deep Syntax Dies

In three lines

A study of 11 generations of self-training on 5 models (GPT-2, Pythia, OPT). Contrary to uniform 'flattening', language restructures: surface markers (connectives, em-dashes) rise while deep syntactic structures (questions, passives, subjunctives) collapse. The Structural Depth Hypothesis predicts this decay (ρ=0.540, p<10⁻⁶).

Summary generated by Claude — human-verified
