vLLM PR adding native HIP W4A16 kernel was merged
Signal
78
Hype
15
In three lines
vLLM merged a PR adding a native HIP W4A16 kernel for ROCm. Benchmarks show significant gains: 270.2 tk/s in fp16 (max-num-seqs=8) and 445.7 tk/s (max-num-seqs=32), outperforming the previous Triton implementations.
Summary generated by Claude — human-verified