Qwen3.6-35B-A3B-APEX / 128K ctx on RTX 3060 12GB — 37 t/s gen with 72k ctx filled, PPL 3.25, offloading 17GB model
In three lines:
Qwen3.6-35B-A3B-APEX, quantized by mudler, generates 37 t/s with 72K tokens of context filled on an RTX 3060 12GB, partially offloading the 17.3GB model, which exceeds the card's 12GB of VRAM.
Spiritbuun's CUDA optimizations (fused MMA, TurboQuant, fattn), combined with APEX I-Compact quantization, yield a perplexity (PPL) of 3.25.
The full 128K context is supported; generation slows to 28 t/s with 129K tokens filled.
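As a rough sanity check on these figures, the sketch below does the back-of-envelope arithmetic in Python. It assumes that "35B" and "A3B" in the model name are nominal total and active-per-token parameter counts, that "GB" means 10^9 bytes, and it ignores the KV cache and CUDA buffers, so the offload figure is only a lower bound. The small share of weights active per token is plausibly what keeps generation usable when part of the model sits outside VRAM.

```python
# Back-of-envelope arithmetic for the figures in the summary.
# Assumptions (not stated in the source): "35B"/"3B" are nominal parameter
# counts, "GB" means 10**9 bytes, and KV cache / CUDA buffers are ignored.

TOTAL_PARAMS = 35e9   # nominal total parameters ("35B")
ACTIVE_PARAMS = 3e9   # nominal active parameters per token ("A3B")
MODEL_BYTES = 17.3e9  # quantized model size
VRAM_BYTES = 12e9     # RTX 3060 12GB


def bits_per_weight(model_bytes: float, params: float) -> float:
    """Average storage cost per parameter after quantization."""
    return model_bytes * 8 / params


def min_offloaded_bytes(model_bytes: float, vram_bytes: float) -> float:
    """Lower bound on weight bytes that cannot fit in VRAM."""
    return max(0.0, model_bytes - vram_bytes)


def slowdown_pct(fast_tps: float, slow_tps: float) -> float:
    """Relative drop in generation speed, in percent."""
    return (fast_tps - slow_tps) / fast_tps * 100


bpw = bits_per_weight(MODEL_BYTES, TOTAL_PARAMS)
offloaded = min_offloaded_bytes(MODEL_BYTES, VRAM_BYTES)
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
drop = slowdown_pct(37, 28)

print(f"average bits per weight: {bpw:.2f}")
print(f"weights outside VRAM (lower bound): {offloaded / 1e9:.1f} GB")
print(f"active share of weights per token: {active_fraction:.1%}")
print(f"speed drop from 72K to 129K filled: {drop:.0f}%")
```

Under these assumptions the quantized weights average roughly 3.95 bits each, at least 5.3 GB of them cannot fit on the card, and the drop from 37 to 28 t/s is about 24%.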
Summary generated by Claude — human-verified