Reddit r/LocalLLaMA

Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild

Signal: 75
Hype: 25
In three lines:
Zai replaced the network architecture on a 1000-GPU cluster running GLM-5.1, moving from ROFT to ZCube (developed with Tsinghua and HarnetsAI).
Reported results: switch/optical costs down 33%, GPU throughput up 15%, and P99 first-token latency down 40.6%.
ZCube removes the Spine layer in favor of a full bipartite interconnect, which eliminates the asymmetric traffic hotspots inherent to Prefill-Decode disaggregated inference.
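
To make the reported percentages concrete, here is a minimal Python sketch that applies them to a set of baselines. The three relative changes come from the summary above. The baseline values, the dictionary keys, and the `apply_change` helper are hypothetical, chosen only to illustrate the arithmetic, and are not figures from the post.

```python
# Relative changes as reported in the summary (ROFT -> ZCube).
REPORTED = {
    "network_cost": -0.33,    # switch/optical costs down 33%
    "gpu_throughput": 0.15,   # GPU throughput up 15%
    "p99_ttft": -0.406,       # P99 first-token latency down 40.6%
}

# Hypothetical baselines (NOT from the post), in arbitrary units.
BASELINE = {
    "network_cost": 100.0,     # cost index
    "gpu_throughput": 1000.0,  # e.g. tokens/s per GPU
    "p99_ttft": 2.0,           # e.g. seconds
}


def apply_change(baseline: float, relative_change: float) -> float:
    """Return the value after a relative change (-0.406 means 40.6% lower)."""
    return baseline * (1.0 + relative_change)


after = {name: apply_change(BASELINE[name], REPORTED[name]) for name in REPORTED}

for name, value in after.items():
    print(f"{name}: {BASELINE[name]:.3f} -> {value:.3f}")
```

Note that a 40.6% drop in latency means the new P99 is 0.594 times the old one, while a 15% throughput gain means 1.15 times the old rate. The two are not directly comparable as "percent better".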
Infrastructure · Reasoning

Summary generated by Claude — human-verified