Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild
Signal: 75 · Hype: 25
In three lines
Zai moved a 1,000-GPU cluster running GLM-5.1 inference from a ROFT network to ZCube, an architecture developed with Tsinghua and HarnetsAI.
Results: switch and optical costs down 33%, GPU throughput up 15%, and P99 time-to-first-token latency down 40.6%.
ZCube drops the spine layer in favor of a full bipartite interconnect, which eliminates the asymmetric traffic hotspots inherent to prefill-decode (PD) disaggregated inference.
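To make the topology change concrete, here is a toy Python sketch. It is not ZCube's actual design. It builds a hypothetical two-tier leaf-spine fabric and a hypothetical spine-free fabric in which every switch on one side links to every switch on the other. It then counts switches and switch-to-switch hops with breadth-first search. The functions `leaf_spine`, `bipartite` and `hops`, the node names, and the sizes are all illustrative assumptions.

```python
from collections import deque


def leaf_spine(num_leaves, num_spines):
    """Hypothetical two-tier fabric: every leaf switch links to every spine switch."""
    adj = {f"L{i}": set() for i in range(num_leaves)}
    for s in range(num_spines):
        spine = f"S{s}"
        adj[spine] = set()
        for i in range(num_leaves):
            adj[f"L{i}"].add(spine)
            adj[spine].add(f"L{i}")
    return adj


def bipartite(num_a, num_b):
    """Hypothetical spine-free fabric: every switch in group A links directly
    to every switch in group B (e.g. A = prefill side, B = decode side)."""
    adj = {f"A{i}": set() for i in range(num_a)}
    adj.update({f"B{j}": set() for j in range(num_b)})
    for i in range(num_a):
        for j in range(num_b):
            adj[f"A{i}"].add(f"B{j}")
            adj[f"B{j}"].add(f"A{i}")
    return adj


def hops(adj, src, dst):
    """Breadth-first search: number of switch-to-switch links from src to dst,
    or None if dst is unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


if __name__ == "__main__":
    ls = leaf_spine(8, 4)  # 8 leaves + 4 spines
    bp = bipartite(4, 4)   # the same 8 leaves split 4/4, no spines
    print("leaf-spine switches:", len(ls), "leaf-to-leaf hops:", hops(ls, "L0", "L7"))
    print("bipartite switches:", len(bp), "cross-side hops:", hops(bp, "A0", "B3"))
    print("bipartite same-side hops:", hops(bp, "A0", "A1"))
```

In this toy model, removing the spine tier lowers the switch count and turns every prefill-to-decode path into a single direct link. Paths within one side now take two hops, which suits traffic that flows mostly from one group to the other. The toy counts are not meant to reproduce the cost, throughput, or latency figures reported above.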
Summary generated by Claude — human-verified