Back to feed
Reddit r/LocalLLaMA·

Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s!

Signal
72
Hype
35
In three linesLocal inference of DeepSeek-V4-Flash (284B, 13B active) on 4x RTX 2080 Ti (~$2.5k). Custom Turing CUDA kernels + W8A8 quantization + heterogeneous offloading achieve 255 tok/s prefill. Open-sourced on GitHub.
Read source
Your take?
DeepSeekCode generationOpen sourceInfrastructure

Summary generated by Claude — human-verified