Reddit r/MachineLearning

Scaling LLMs horizontally: hidden-state coupling without weight modification [R]

Signal 72 · Hype 28
In three lines

Residual Coupling (RC) connects frozen language models in parallel through lightweight, learned linear projections, so the base models' weights are never modified. Linear bridges read hidden states from one model and inject additive updates into another model's residual stream. On medical data, RC reaches a perplexity of 11.02 versus 56.80 for a mixture-of-experts (MoE) baseline (an 80.7% relative reduction), and it improves TruthfulQA by 9.1 percentage points.
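The summary does not give the exact bridge parameterization, so the following is only a minimal pure-Python sketch of the coupling idea as described, not the authors' code. It assumes one-directional bridges (model A is read, model B is written to), uses a hypothetical toy residual network (`FrozenResidualModel`) in place of a real frozen LM, zero-initializes the bridge matrix, and, to keep the gradient exact, places the trainable bridge after B's last layer. All names (`coupled_forward`, `train_last_layer_bridge`, `demo`) are hypothetical.

```python
import copy
import math
import random
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]
Bridge = Tuple[int, int, Matrix]  # (layer read in A, layer written in B, W of shape d_b x d_a)


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(u: Vector, v: Vector) -> Vector:
    return [a + b for a, b in zip(u, v)]


class FrozenResidualModel:
    """Hypothetical stand-in for a frozen LM: each layer does h <- h + tanh(M h)."""

    def __init__(self, dim: int, n_layers: int, seed: int):
        rng = random.Random(seed)
        self.layers = [[[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
                       for _ in range(n_layers)]

    def hidden_states(self, x: Vector, injections: Optional[Dict[int, Vector]] = None) -> List[Vector]:
        """Residual-stream state after each layer; `injections` maps a layer index
        to an additive update written into the stream right after that layer."""
        injections = injections or {}
        h, states = list(x), []
        for i, m in enumerate(self.layers):
            h = add(h, [math.tanh(z) for z in matvec(m, h)])
            if i in injections:
                h = add(h, injections[i])
            states.append(h)
        return states


def coupled_forward(model_a, model_b, x_a: Vector, x_b: Vector, bridges: List[Bridge]) -> Vector:
    """Run A unmodified, read its hidden states, and add W @ h_a into B's residual stream."""
    states_a = model_a.hidden_states(x_a)
    injections: Dict[int, Vector] = {}
    for src, dst, w in bridges:
        update = matvec(w, states_a[src])
        injections[dst] = add(injections[dst], update) if dst in injections else update
    return model_b.hidden_states(x_b, injections)[-1]


def squared_error(out: Vector, target: Vector) -> float:
    return sum((o - t) ** 2 for o, t in zip(out, target))


def train_last_layer_bridge(model_a, model_b, data, src: int, lr: float = 0.01, epochs: int = 300) -> Matrix:
    """SGD on the bridge W only; A's and B's weights are never touched.
    Simplification: writing after B's last layer makes out = base + W h_a,
    so dL/dW[i][j] = 2 * err[i] * h_a[j] exactly."""
    d_a, d_b = len(model_a.layers[0]), len(model_b.layers[0])
    dst = len(model_b.layers) - 1
    w = [[0.0] * d_a for _ in range(d_b)]  # zero init: the coupling starts as a no-op (assumption)
    for _ in range(epochs):
        for x_a, x_b, target in data:
            h_a = model_a.hidden_states(x_a)[src]
            out = coupled_forward(model_a, model_b, x_a, x_b, [(src, dst, w)])
            err = [o - t for o, t in zip(out, target)]
            for i in range(d_b):
                for j in range(d_a):
                    w[i][j] -= lr * 2 * err[i] * h_a[j]
    return w


def demo(seed: int = 0) -> Tuple[float, float, bool]:
    """Fit a bridge to synthetic, realizable targets; report losses and whether weights stayed frozen."""
    rng = random.Random(seed)
    model_a = FrozenResidualModel(dim=3, n_layers=2, seed=1)
    model_b = FrozenResidualModel(dim=4, n_layers=2, seed=2)
    frozen_before = copy.deepcopy((model_a.layers, model_b.layers))
    w_true = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # illustrative only
    data = []
    for _ in range(4):
        x_a = [rng.uniform(-1, 1) for _ in range(3)]
        x_b = [rng.uniform(-1, 1) for _ in range(4)]
        data.append((x_a, x_b, coupled_forward(model_a, model_b, x_a, x_b, [(1, 1, w_true)])))

    def total_loss(w: Matrix) -> float:
        return sum(squared_error(coupled_forward(model_a, model_b, xa, xb, [(1, 1, w)]), t)
                   for xa, xb, t in data)

    initial = total_loss([[0.0] * 3 for _ in range(4)])
    final = total_loss(train_last_layer_bridge(model_a, model_b, data, src=1))
    unchanged = (model_a.layers, model_b.layers) == frozen_before
    return initial, final, unchanged
```

Because the bridge starts at zero, the coupled pass initially reproduces B's output exactly, and training updates only `w`. `demo()` checks both that the loss drops and that the toy base weights are unchanged. A real setup would read and write transformer hidden states at whichever layer pairs the method specifies; the single layer pair here is an arbitrary choice for illustration.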
Llama · Multi-agent · Fine-tuning · Benchmarks

Summary generated by Claude — human-verified