Topic

#DeepSeek

DeepSeek is a Chinese AI company known for building high-performance, open-source language models at low training cost. Its model DeepSeek-R1 demonstrated reasoning capabilities on par with leading Western models.

40 Articles
12 Sources
62 Avg. signal
Reddit r/LocalLLaMA

GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding)

A developer is looking for the best infrastructure (budget ~$100-150k) to self-host Kimi K2.6 and DeepSeek V4 for a 5-person team doing agentic coding. The post compares dual GH200 NVL2 (1.2TB unified memory, $95k) with 8x RTX 6000 Blackwell (768GB VRAM, $140k). A test on a single GH200 reached 23 tok/s decode at 2-bit quantization, but prefill was slow and the models overflowed into slower unified memory.

DeepSeek, Kimi, AI Agents
SIG 45
HYP 00
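As a rough way to read the hardware comparison in the post above, the quoted prices and memory sizes can be turned into a cost per GB. This is a minimal sketch, not the poster's method: the `usd_per_gb` helper and the `options` table are hypothetical names, 1.2TB is assumed to mean 1200 GB, and the figures mix unified memory (GH200) with VRAM (RTX 6000), which do not perform alike.

```python
# Figures come from the post summary. "memory_gb" mixes unified memory
# (GH200) and VRAM (RTX 6000), so treat the result as a rough guide only.
# Assumption: 1.2TB is taken as 1200 GB.
options = {
    "Dual GH200 NVL2": {"memory_gb": 1200, "price_usd": 95_000},
    "8x RTX 6000 Blackwell": {"memory_gb": 768, "price_usd": 140_000},
}


def usd_per_gb(memory_gb: float, price_usd: float) -> float:
    """Return the purchase price per GB of memory."""
    return price_usd / memory_gb


for name, spec in options.items():
    print(f"{name}: ${usd_per_gb(**spec):.0f}/GB")
```

Cost per GB ignores the bandwidth gap the post itself reports (slow prefill and overflow into slower unified memory on the GH200), so it says nothing about throughput.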
GitHub Trending

Hmbown / CodeWhale

CodeWhale is an agentic coding terminal that prioritizes DeepSeek and offers multi-provider support, cache optimization, a 5-locale UI, and CN-region endpoints.

AI Agents, Code generation, DeepSeek
SIG 45
HYP 00
arXiv cs.AI

DBES: A Systematic Benchmark and Metric Suite for Evaluating Expert Specialization in Large-Scale MoEs

DBES is a diagnostic framework for evaluating expert specialization in Mixture-of-Experts models. Five theoretically grounded metrics measure domain isolation and routing specialization. Testing on Qwen, DeepSeek, and GLM reveals distinct specialization paradigms. Targeted post-training on specialized expert paths improves performance by 66–94% using only 15% of original training resources.

Benchmarks, Qwen, DeepSeek
SIG 82
HYP 00
DeepSeek — AI news · Signal IA