Reddit r/MachineLearning·18 May 2026

Scaling LLMs horizontally: hidden-state coupling without weight modification [R]

In three lines: Residual Coupling (RC) connects frozen language models in parallel through lightweight learned linear projections, leaving the model weights unmodified. Linear bridges read hidden states from one model and inject additive updates into another's residual stream. On medical data, RC reports a perplexity of 11.02 versus 56.80 for MoE (an 80.7% reduction, as reported) and a 9.1 percentage point improvement on TruthfulQA.
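
The bridge mechanism in the summary can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `coupled_residual`, the scale factor `alpha`, the toy dimensions, and the zero initialisation are all assumptions made for illustration. It shows only the core idea of adding a linear projection of one model's hidden states to another model's residual stream, while both models' own weights stay untouched.

```python
import numpy as np


def coupled_residual(h_target, h_source, W, alpha=1.0):
    """Add a linear projection of the source hidden states to the target stream.

    h_target: (seq_len, d_target) residual-stream activations of the receiving model
    h_source: (seq_len, d_source) hidden states read from the other (frozen) model
    W:        (d_source, d_target) bridge matrix, the only trainable parameter here
    alpha:    hypothetical scale factor on the injected update
    """
    return h_target + alpha * (h_source @ W)


# Toy activations standing in for two frozen models (sizes are arbitrary).
rng = np.random.default_rng(0)
seq_len, d_source, d_target = 4, 8, 6
h_source = rng.standard_normal((seq_len, d_source))
h_target = rng.standard_normal((seq_len, d_target))

# Assumption: a zero-initialised bridge leaves the target model unchanged.
W_zero = np.zeros((d_source, d_target))
out_zero = coupled_residual(h_target, h_source, W_zero)

# A small non-zero bridge injects a purely additive update.
W = 0.01 * rng.standard_normal((d_source, d_target))
out = coupled_residual(h_target, h_source, W)
```

In this sketch the source and target widths differ, so the bridge also handles the dimension change between models. Which layers are bridged and how the projections are trained are details of the original post that this example does not attempt to reproduce.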

Llama, Multi-agent, Fine-tuning, Benchmarks

Summary generated by Claude — human-verified
