Reddit r/LocalLLaMA

Get you some GPUs, it's not worth the hacks around lack of RAM

Signal: 35 · Hype: 25
In three lines: An r/LocalLLaMA user recommends buying GPUs rather than relying on workarounds for limited VRAM. The user reports running Qwen 3.6-27B in Q8 with an f16 K/V cache at 128k context on two used RTX 3090s, with reported figures of 1399 pp and 104 tg (presumably prompt-processing and token-generation throughput in tokens per second).
Tags: Qwen · Open source · Infrastructure

Summary generated by Claude — human-verified