Reddit r/LocalLLaMA·20 May 2026

Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s!

Signal

Hype

In three linesLocal inference of DeepSeek-V4-Flash (284B, 13B active) on 4x RTX 2080 Ti (~$2.5k). Custom Turing CUDA kernels + W8A8 quantization + heterogeneous offloading achieve 255 tok/s prefill. Open-sourced on GitHub.

Read source

Your take?

DeepSeek Code generation Open source Infrastructure

Summary generated by Claude — human-verified

Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s!

Other angles on this story