Reddit r/LocalLLaMA

vLLM PR adding native HIP W4A16 kernel was merged

Signal: 78
Hype: 15
In three lines: vLLM merged a PR adding a native HIP W4A16 kernel (4-bit weights, 16-bit activations) for ROCm. Benchmarks show significant gains over the previous Triton implementations: 270.2 tokens/s in fp16 at max-num-seqs=8 and 445.7 tokens/s at max-num-seqs=32.
Tags: Open source, Infrastructure, Benchmarks

Summary generated by Claude — human-verified