Reddit r/LocalLLaMA

DeepSeek V4 Flash performance on DGX Spark

Signal: 65
Hype: 15
In three lines: A user deploys DeepSeek V4 Flash on DGX Spark (2x ASUS GX10) via vLLM. Maximum context is 256k tokens; prefill throughput is 1680-2150 T/s and decode throughput is 37-49 T/s across window sizes. Performance is consistent, with low degradation across window sizes. The model outperforms M2.7 and Stepfun 3.7 on high-context reasoning benchmarks.
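
To put the reported throughput in wall-clock terms, here is a minimal back-of-the-envelope sketch. It assumes "256k" means 256,000 tokens, treats the two ends of each reported range as steady rates, and uses a hypothetical 1,000-token reply. The summary does not say which rate applies at full context, so the results are rough bounds rather than measurements.

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Return the time in seconds to process `tokens` at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


# Figures taken from the summary above.
PREFILL_TPS = (1680, 2150)  # reported prefill range, tokens per second
DECODE_TPS = (37, 49)       # reported decode range, tokens per second

# Assumptions made for this illustration only.
CONTEXT_TOKENS = 256_000    # "256k" read as 256,000 tokens
REPLY_TOKENS = 1_000        # hypothetical reply length

prefill_slow = seconds_for(CONTEXT_TOKENS, PREFILL_TPS[0])
prefill_fast = seconds_for(CONTEXT_TOKENS, PREFILL_TPS[1])
decode_slow = seconds_for(REPLY_TOKENS, DECODE_TPS[0])
decode_fast = seconds_for(REPLY_TOKENS, DECODE_TPS[1])

print(f"Prefill of {CONTEXT_TOKENS} tokens: {prefill_fast:.0f}-{prefill_slow:.0f} s")
print(f"Decode of {REPLY_TOKENS} tokens: {decode_fast:.1f}-{decode_slow:.1f} s")
```

Under these assumptions, filling the whole context window takes roughly two to two and a half minutes, and a 1,000-token reply takes about 20 to 27 seconds.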
Tags: DeepSeek, Infrastructure, Benchmarks, Reasoning

Summary generated by Claude — human-verified