Weird to get near linear scaling by adding another GPU?
Signal
45
Hype
15
In three linesUser reports near-linear scaling by adding second GPU (2x RTX 3090) for Qwen 3.6-27B: decode throughput increases from 53-62 TPS to 94-120 TPS without NVLink, using tensor parallelism=2. Notes parsing errors in VSCode Agent mode but overall performance improvement.Read source
Your take?
Summary generated by Claude — human-verified