Reddit r/LocalLLaMA·8 June 2026

Weird to get near linear scaling by adding another GPU?

In three lines

A user reports near-linear scaling after adding a second GPU (2x RTX 3090) to run Qwen 3.6-27B.

Decode throughput rose from 53-62 tokens per second (TPS) to 94-120 TPS with tensor parallelism set to 2, without NVLink.

They note parsing errors in VS Code's Agent mode, but an overall performance improvement.
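The post gives throughput ranges rather than a scaling figure, but "near-linear" can be checked with simple arithmetic. Below is a minimal sketch, assuming scaling efficiency is defined as speedup divided by GPU count, and assuming the low and high ends of each range correspond to comparable runs. The post reports ranges, not matched measurements, so this pairing is an assumption. The function name `scaling_efficiency` is hypothetical.

```python
def scaling_efficiency(single_gpu_tps: float, multi_gpu_tps: float, num_gpus: int) -> float:
    """Speedup divided by GPU count; 1.0 would mean perfectly linear scaling."""
    return (multi_gpu_tps / single_gpu_tps) / num_gpus


# Decode throughput ranges (tokens/s) as reported in the post.
one_gpu = (53, 62)
two_gpu = (94, 120)

# Assumption: pair low-with-low and high-with-high, since the post
# gives ranges rather than matched before/after runs.
low = scaling_efficiency(one_gpu[0], two_gpu[0], 2)
high = scaling_efficiency(one_gpu[1], two_gpu[1], 2)

print(f"speedup: {two_gpu[0] / one_gpu[0]:.2f}x to {two_gpu[1] / one_gpu[1]:.2f}x")
print(f"efficiency: {low:.0%} to {high:.0%}")
# prints:
# speedup: 1.77x to 1.94x
# efficiency: 89% to 97%
```

Under this pairing, the second GPU yields roughly 89-97% of ideal linear scaling, which is consistent with the "near-linear" description.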

Qwen · AI Agents · Code generation · Infrastructure

Summary generated by Claude — human-verified
