We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro
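In three lines:
Mininglamp AI added W8A8 (8-bit weight, 8-bit activation) quantization to MLX via Cider, a custom SDK with its own Metal kernels.
On an M5 Pro, prefill time for a 4B-parameter vision-language model (VLM) dropped from 2.84 s to 2.52 s, about 11% less time.
The approach works with any MLX model, but the INT8 TensorOps path requires an M5-series chip or newer.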
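To make "W8A8" concrete: both the weights and the activations of a linear layer are stored as INT8, the matrix multiply runs entirely in integers, and a single float rescale recovers the output. Quantizing activations is what lets the multiply use INT8 hardware units such as the M5's INT8 TensorOps. The sketch below is a generic, pure-Python illustration of the technique, not Cider's implementation or an MLX API. The function names are hypothetical, and the symmetric per-tensor scaling is an assumption; production kernels often use per-channel weight scales and per-token activation scales.

```python
# Minimal W8A8 sketch: symmetric per-tensor INT8 quantization of both
# weights and activations, integer matmul (exact int accumulation, standing
# in for the int32 accumulators a GPU kernel would use), then one float rescale.
# Illustrative only; hypothetical names, not Cider's or MLX's code.

def quantize_int8(values):
    """Symmetric per-tensor quantization of a 2D list to integers in [-127, 127]."""
    max_abs = max(abs(v) for row in values for v in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in values]
    return q, scale

def int8_matmul(a_q, w_q):
    """Integer matmul: a_q is (M x K), w_q is (K x N); sums stay Python ints."""
    k = len(w_q)
    n = len(w_q[0])
    return [[sum(row[i] * w_q[i][j] for i in range(k)) for j in range(n)] for row in a_q]

def w8a8_linear(activations, weights):
    """Quantize both operands, multiply in integers, dequantize once."""
    a_q, a_scale = quantize_int8(activations)  # activations: quantized at runtime
    w_q, w_scale = quantize_int8(weights)      # weights: typically quantized offline
    acc = int8_matmul(a_q, w_q)
    # The product of the two scales maps integer results back to floats.
    return [[v * a_scale * w_scale for v in row] for row in acc]

def float_matmul(a, w):
    """Full-precision reference for comparison."""
    return [[sum(a_row[i] * w[i][j] for i in range(len(w))) for j in range(len(w[0]))]
            for a_row in a]

if __name__ == "__main__":
    x = [[0.5, -1.2, 3.0], [2.2, 0.1, -0.7]]
    w = [[0.3, -0.8], [1.1, 0.4], [-0.2, 0.9]]
    print(w8a8_linear(x, w))
    print(float_matmul(x, w))
```

Running it shows the quantized result tracking the float reference to within a few hundredths. The accuracy cost of quantizing activations is the trade-off for moving the prefill matmuls onto faster integer paths.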
Summary generated by Claude — human-verified