Reddit r/LocalLLaMA · 28 May 2026

Qwen3.6-35B-A3B-APEX / 128K ctx on RTX 3060 12GB — 37 t/s gen with 72k ctx filled, PPL 3.25, offloading 17GB model

In three lines: Qwen3.6-35B-A3B-APEX, quantized by mudler, reaches 37 t/s generation with 72K tokens of context filled on an RTX 3060 12GB by partially offloading the 17.3GB model, which exceeds the card's VRAM. Spiritbuun's CUDA optimizations (fused MMA, TurboQuant, fattn) combined with APEX I-Compact quantization yield a perplexity (PPL) of 3.25. The full 128K context is supported, with generation slowing to 28 t/s at ~129K filled.
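A back-of-envelope check of the headline numbers, as a rough sketch only. It assumes the full 12GB of VRAM could hold weights (in practice the KV cache and CUDA buffers also use VRAM, so more than this lower bound is offloaded) and treats the reported sizes as directly comparable units. The perplexity helper uses the standard definition, exp of the mean per-token negative log-likelihood; the helper names are illustrative, not from the post.

```python
import math

def min_offloaded_gb(model_gb: float, vram_gb: float) -> float:
    """Lower bound on weights that cannot fit in VRAM (ignores KV cache and buffers)."""
    return max(0.0, model_gb - vram_gb)

def relative_slowdown(fast_tps: float, slow_tps: float) -> float:
    """Fractional drop in generation speed between two measurements."""
    return (fast_tps - slow_tps) / fast_tps

def perplexity(token_logprobs: list[float]) -> float:
    """Standard perplexity: exp of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Figures as reported: 17.3GB model, 12GB card, 37 t/s at 72K vs 28 t/s at ~129K.
offloaded = min_offloaded_gb(17.3, 12.0)
slowdown = relative_slowdown(37.0, 28.0)
print(f"at least {offloaded:.1f} GB offloaded")  # at least 5.3 GB offloaded
print(f"{slowdown:.0%} slower at ~129K vs 72K")  # 24% slower at ~129K vs 72K
```

By this definition, a PPL of 3.25 corresponds to a mean loss of ln(3.25), about 1.18 nats per token.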

Qwen · Code generation · Open source

Summary generated by Claude — human-verified