Reddit r/LocalLLaMA·20 mai 2026

Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s!

Signal

Hype

En 3 lignesExécution locale de DeepSeek-V4-Flash (284B, 13B actifs) sur 4x RTX 2080 Ti (~2500€). Kernels CUDA Turing custom + quantization W8A8 + offloading hétérogène atteignent 255 tok/s en prefill. Code open-source sur GitHub.

Lire la source

Ton avis ?

DeepSeek Génération de code Open source Infrastructure

Résumé généré par Claude — vérifié par l'humain

Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s!

Autres angles sur ce sujet