*Lower* generation speed with H100 and H200 than with RTX 5090?
Signal: 35
Hype: 15
In three lines
A user reports slower generation on an H100 (42 tok/sec) than on an RTX 5090 (57 tok/sec) when running a 31B Q6 model with llama.cpp. The H100 allows a larger context (128k vs. 26k) and has higher memory bandwidth, yet it generates more slowly.
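One way to put the reported numbers in context is a rough bandwidth ceiling: if single-stream decoding reads every weight once per token, tokens per second cannot exceed memory bandwidth divided by model size. The sketch below is a back-of-the-envelope estimate, not something taken from the post. The bandwidth figures and the bits-per-weight value are assumptions (the post does not say which H100 variant or which Q6 quantization was used), and the function name is hypothetical.

```python
def decode_ceiling_tok_per_sec(bandwidth_gb_s, params_billion, bits_per_weight):
    """Upper bound on decode speed if all weights are read once per token."""
    # billions of parameters * bytes per parameter = model size in GB
    model_gb = params_billion * bits_per_weight / 8.0
    return bandwidth_gb_s / model_gb


# Speeds reported in the post (tok/sec).
REPORTED_TOK_PER_SEC = {"H100": 42.0, "RTX 5090": 57.0}

# ASSUMPTION: nominal spec-sheet memory bandwidth in GB/s. The H100 value is
# for the SXM variant; the post does not state which variant was used.
ASSUMED_BANDWIDTH_GB_S = {"H100": 3350.0, "RTX 5090": 1792.0}

# ASSUMPTION: roughly 6.56 bits per weight, typical of a Q6_K-style quant.
ASSUMED_BITS_PER_WEIGHT = 6.56
PARAMS_BILLION = 31.0  # "31B" from the post

for gpu, measured in REPORTED_TOK_PER_SEC.items():
    ceiling = decode_ceiling_tok_per_sec(
        ASSUMED_BANDWIDTH_GB_S[gpu], PARAMS_BILLION, ASSUMED_BITS_PER_WEIGHT
    )
    print(
        f"{gpu}: measured {measured:.0f} tok/sec, "
        f"ceiling ~{ceiling:.0f} tok/sec, "
        f"{measured / ceiling:.0%} of ceiling"
    )
```

If one card lands much further below its ceiling than the other, memory bandwidth is probably not what limits it. The estimate cannot say what does; it only shows how much headroom is left unused under these assumptions.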
Summary generated by Claude — human-verified