Get you some GPUs, it's not worth the hacks around lack of RAM
Signal: 35, Hype: 25
In three lines
An r/LocalLLaMA user recommends buying GPUs rather than relying on workarounds for limited VRAM. He reports running Qwen 3.6-27B at Q8 with an f16 K/V cache on two used RTX 3090s, reaching a 128k context length (reported figures: 1399 pp for prompt processing, 104 tg for token generation).
Summary generated by Claude — human-verified