Reddit r/LocalLLaMA

Qwen3.6-35B-A3B Q4 with 262k context on an 8GB 3070 Ti = 30+ tps

Signal: 72
Hype: 15
In three lines
A user reports 30+ tokens/sec running Qwen3.6-35B-A3B at Q4 quantization on an 8GB RTX 3070 Ti with a 262k context. The key is that this is a MoE model, so only about 3.5B active parameters need to sit in VRAM. Running on Linux Server yields about 25% more tps than Windows 11. Contexts up to 1M are possible, but generation slows down beyond 150k.
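As a rough illustration of why the MoE point matters on an 8GB card, here is a back-of-the-envelope sketch of weight storage at a 4-bit quantization. The 4.5 bits-per-weight figure is an assumption (a typical ballpark for 4-bit formats once scales are included), not a number from the post, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def quantized_size_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Assumption: ~4.5 bits per weight on average for a Q4-style quant.
BITS_PER_WEIGHT = 4.5

full = quantized_size_gib(35, BITS_PER_WEIGHT)     # all 35B parameters
active = quantized_size_gib(3.5, BITS_PER_WEIGHT)  # the ~3.5B active figure from the summary

print(f"all weights: {full:.1f} GiB, active per token: {active:.1f} GiB")
```

Under these assumptions the full set of weights does not fit in 8GB, while the active portion does, which is consistent with the summary's explanation.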
Qwen, Open source

Summary generated by Claude — human-verified