Reddit r/LocalLLaMA

We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro

Signal: 72, Hype: 25
In three lines: Mininglamp AI added W8A8 quantization (8-bit weights and 8-bit activations) to MLX via Cider, a custom SDK with Metal kernels. On an M5 Pro, prefill time for a 4B VLM dropped from 2.84s to 2.52s, a reduction of about 11%. It works with any MLX model, but INT8 TensorOps requires M5 or later.
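For readers unfamiliar with the term, here is a minimal sketch of what W8A8 means in general: both the weights and the activations are mapped to int8 with a scale factor, the dot products are accumulated in integers, and the result is rescaled to floating point. This is plain Python for illustration only. It is not Cider's or MLX's API, and the helper names `quantize_int8` and `w8a8_matvec` are hypothetical. The symmetric per-row scaling scheme is an assumption, since the summary does not say which scheme Cider uses.

```python
# Illustrative W8A8 sketch in plain Python. Not Cider's or MLX's API;
# function names and the symmetric scaling scheme are assumptions.

def quantize_int8(values):
    """Symmetric quantization of a list of floats to int8 with one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def w8a8_matvec(weight_rows, activation):
    """Approximate W @ x using int8 weights (one scale per row) and int8 activations."""
    # Activations are quantized at run time, which is what the "A8" refers to.
    x_q, x_scale = quantize_int8(activation)
    out = []
    for row in weight_rows:
        # A real system would quantize weights once, ahead of time.
        w_q, w_scale = quantize_int8(row)
        acc = sum(w * x for w, x in zip(w_q, x_q))  # integer multiply-accumulate
        out.append(acc * w_scale * x_scale)         # rescale back to float
    return out

W = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.3]]
x = [1.0, 0.5, -2.0]
exact = [sum(w * v for w, v in zip(row, x)) for row in W]
approx = w8a8_matvec(W, x)
print("exact: ", exact)
print("approx:", approx)
```

The two printed vectors should agree closely but not exactly. The small gap is the quantization error traded for integer arithmetic, which is the kind of work the INT8 TensorOps path mentioned above would accelerate.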
Open source, Infrastructure, Tools, Benchmarks

Summary generated by Claude — human-verified