Reddit r/LocalLLaMA·28 May 2026

Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild


In three lines

Zai replaced the network architecture of a 1000-GPU cluster running GLM-5.1 inference, moving from ROFT to ZCube (developed with Tsinghua and HarnetsAI). Reported results: switch and optical costs down 33%, GPU throughput up 15%, and P99 first-token latency down 40.6%. ZCube removes the Spine layer in favor of a full bipartite interconnect, which eliminates the asymmetric traffic hotspots inherent to Prefill-Decode disaggregated inference.
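To make "full bipartite interconnect" concrete, here is a toy Python sketch of the general graph idea. It is not ZCube's actual topology, which the summary does not describe in detail; the node names, group sizes, and helper functions are all hypothetical. It only contrasts a complete bipartite graph, where every node in one group links directly to every node in the other, with a generic two-tier fabric, where leaf-to-leaf traffic must transit a spine switch.

```python
from itertools import product


def complete_bipartite(left, right):
    """Return the link set of a complete bipartite graph K_{m,n}.

    Every node in `left` gets a direct link to every node in `right`.
    """
    return {(l, r) for l, r in product(left, right)}


def leaf_spine_path(src_leaf, dst_leaf, spine):
    """Leaf-to-leaf path in a generic two-tier fabric: it always transits a spine."""
    return [src_leaf, spine, dst_leaf]


# Hypothetical node names and sizes, for illustration only.
group_a = [f"a{i}" for i in range(4)]
group_b = [f"b{i}" for i in range(4)]

links = complete_bipartite(group_a, group_b)

# Total links in K_{4,4} is 4 * 4.
print(len(links))  # 16

# Every cross-group pair has its own direct link (one hop, no shared middle tier).
print(all((a, b) in links for a in group_a for b in group_b))  # True

# In the two-tier model the same pair is two links apart, via a shared spine.
print(len(leaf_spine_path("a0", "b0", "spine0")) - 1)  # 2
```

The point of the contrast is only that a shared middle tier is where skewed traffic can pile up; how ZCube actually wires and balances its links is not something this sketch claims to show.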


Infrastructure Reasoning

Summary generated by Claude — human-verified
