Topic: #DeepSeek

DeepSeek is a Chinese AI company known for building high-performance, open-source language models at low training cost. Its model DeepSeek-R1 demonstrated reasoning capabilities on par with leading Western models.

40 articles · 11 sources · 63 avg. signal

arXiv cs.LG·Jun 18

Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression

Structural pruning framework for Mixture-of-Experts models that operates at the channel level rather than the expert level. An attribution-based method reformulates pruning as channel-score coverage maximization. Experiments on DeepSeek and Qwen models combine 50% structured pruning with 4-bit quantization, reaching a 5.27× memory reduction on Qwen3-30B-A3B.

DeepSeek Qwen Benchmarks
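
The coverage objective can be illustrated with a minimal sketch. It assumes each channel already carries a non-negative attribution score and treats "coverage" as the share of total score that survives pruning; `select_channels_by_coverage` is a hypothetical helper, not the paper's algorithm, which may use a richer objective.

```python
def select_channels_by_coverage(scores: list[float], keep_ratio: float) -> tuple[list[int], float]:
    """Keep the highest-scoring channels and report the share of total score they cover.

    For an additive coverage objective with a fixed channel budget, picking
    the top-k scores is optimal; the paper's objective may differ.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, round(len(scores) * keep_ratio))
    # Rank channel indices by attribution score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])
    total = sum(scores)
    coverage = sum(scores[i] for i in kept) / total if total > 0 else 0.0
    return kept, coverage

# Toy example: 8 channels of one expert, 50% structured pruning.
scores = [0.9, 0.1, 0.4, 0.05, 0.7, 0.2, 0.3, 0.15]
kept, coverage = select_channels_by_coverage(scores, keep_ratio=0.5)
print(kept, round(coverage, 3))  # [0, 2, 4, 6] 0.821
```

A non-additive coverage objective (for example, one that rewards diversity among kept channels) would need a greedy or combinatorial solver instead of a plain sort.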

Reddit r/LocalLLaMA·Jun 17

US holds off blacklisting China's DeepSeek, more than 100 firms deemed security risks, sources say

According to sources, the US is holding off on blacklisting DeepSeek while designating more than 100 Chinese firms as security risks. The decision comes amid US-China tech and trade tensions.

DeepSeek Regulation Business

Reddit r/LocalLLaMA·Jun 17

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

GameCraft-Bench evaluates whether AI agents can build playable games end-to-end in a real game engine. Benchmark tests Opus-4.7, GPT-5.5, Kimi-K2.6, DeepSeek-V4-Pro and others. No results reported for medium-sized models (27B-31B).

AI Agents Benchmarks Code generation

Le Big Data·Jun 17

DeepSeek closes a giant funding round of more than $7 billion

DeepSeek closes a funding round exceeding $7 billion, among the largest in the AI sector. Record amount for the Chinese startup specializing in language models.

DeepSeek Funding Business

Hacker News (AI)·Jun 16

DeepSeek V4 Pro at 5% the cost of Claude – what it takes to close the gap

DeepSeek V4 Pro delivers Claude-comparable performance at 5% of the cost. The article examines the technological and economic gaps between the models but gives no precise benchmark figures or exact pricing.

DeepSeek Claude Benchmarks

The Decoder·Jun 16

Microsoft's Copilot Cowork moves to usage-based billing and may tap DeepSeek

Microsoft is considering a fine-tuned version of DeepSeek V4 as a cheaper model option for Copilot Cowork. The company is also switching to usage-based billing, with Copilot head Charles Lamanna stating flat-rate pricing is unsustainable.

DeepSeek Business AI Agents

The Decoder·Jun 16

DeepSeek takes outside money for the first time at a $50 billion valuation

DeepSeek raises 50 billion yuan ($7.4 billion) in its first external funding round, reaching a $50 billion valuation.

DeepSeek Funding Business

arXiv cs.CL·Jun 16

Stop When Further Reasoning Won't Help: Attention-State Adaptive Generation in Reasoning Models

ASAG, a training-free method analyzing attention distributions, detects overthinking in reasoning models and adaptively stops generation. Tested on DeepSeek-R1-Distill and Qwen3, it improves accuracy by 3.2% while reducing generated tokens by 40% on Qwen3-8B.

Reasoning DeepSeek Qwen

Reddit r/LocalLLaMA·Jun 14

You can run Deepseek 4 flash on mac (M3 Max, 96gb)

DeepSeek V4 Flash runs on a Mac with an M3 Max and 96 GB of memory using Antirez's ds4 engine with SSD streaming. Performance: 11-13 tokens/s decoding, 10 s cold boot, 3-5 s time to first token (TTFT). A 36k-token prefill takes 2 min 30 s. Setup requires setting iogpu.wired_limit_mb=86016 and passing the --ssd-streaming flag.

DeepSeek Open source Tools
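
The memory setting and prefill time in the post can be sanity-checked with a little arithmetic. This sketch assumes `iogpu.wired_limit_mb` is expressed in MiB, that "96gb" means 96 GiB of unified memory, and treats "36k" as exactly 36,000 tokens.

```python
def wired_limit_headroom_mib(total_gib: int, wired_limit_mb: int) -> int:
    """MiB left for macOS after the GPU wired limit (assumes 1 GiB = 1024 MiB)."""
    return total_gib * 1024 - wired_limit_mb

def prefill_rate(tokens: int, seconds: float) -> float:
    """Average prefill throughput in tokens per second."""
    return tokens / seconds

print(86016 / 1024)                          # 84.0 GiB wired for the GPU
print(wired_limit_headroom_mib(96, 86016))   # 12288 MiB (12 GiB) left for macOS
print(prefill_rate(36_000, 2 * 60 + 30))     # 240.0 tokens/s for the 36k prefill
```

Under these assumptions the setting hands 84 GiB to the GPU, leaves 12 GiB for the operating system, and the quoted prefill averages about 240 tokens/s.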

Reddit r/LocalLLaMA·Jun 14

Dual DGX Sparks- 40tk/s single 1M ; 350 tk/s agg. - Deepseek V4 Flash (vs RTX Pro 6000 vs Mac M2 Ultra 192)

DeepSeek V4 Flash benchmarks on dual DGX Sparks: 40 tok/s in FP8 for a single request and 350 tok/s aggregate across 32 concurrent requests. For comparison, an RTX Pro 6000 reaches 46 tok/s (Q2) and an M2 Ultra 192GB reaches 29 tok/s (Q2). Linking the two Sparks requires a ConnectX-7 200 Gb/s cable ($180) for inter-GPU synchronization.

DeepSeek Benchmarks Code generation
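
The aggregate and single-stream figures can be related with simple arithmetic, assuming the 350 tok/s aggregate is shared evenly across the 32 concurrent requests.

```python
def per_request_rate(aggregate_tps: float, concurrent: int) -> float:
    """Tokens/s each request sees if aggregate throughput is shared evenly."""
    return aggregate_tps / concurrent

def batching_gain(aggregate_tps: float, single_tps: float) -> float:
    """How many times more total tokens/s batching delivers than one stream."""
    return aggregate_tps / single_tps

print(per_request_rate(350, 32))  # 10.9375 tok/s per request
print(batching_gain(350, 40))     # 8.75x the single-stream throughput
```

Each request therefore sees roughly a quarter of the single-stream speed while total throughput rises 8.75×, the usual batching trade-off.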

Reddit r/LocalLLaMA·Jun 13

DeepSeek v4 Pro is too big for such a "midrange" performance, or am I missing something?

A user questions the relevance of DeepSeek V4 Pro (1.6T parameters) given its mediocre performance relative to smaller models: GLM 5.2 (750B), Kimi K2.7 (1T), MiniMax M3 (450B), and MiMo v2.5 Pro (1T) all outperform it on benchmarks. The user asks whether the model's value lies mainly in its Huawei-based inference infrastructure rather than in model quality.

DeepSeek Benchmarks Open source

Reddit r/LocalLLaMA·Jun 11

How can Deepseek v4 top the coding leaderboards and still sit 8 months behind the frontier?

DeepSeek V4 Pro scores 80.6 on SWE-bench and 93.5 on LiveCodeBench, yet CAISI rates it 8 months behind the US frontier (versus 2 months by DeepSeek's own estimate). Coding benchmarks are narrow and heavily optimized for; gaps emerge in cybersecurity and abstract reasoning. Quantized local versions drift even further from the headline scores.

DeepSeek Benchmarks Code generation

Vercel AI Blog·Jun 11

DeepSeek models now available via Azure on AI Gateway

Azure now serves DeepSeek V4 Pro and V4 Flash on Vercel AI Gateway. Requests route through Azure automatically, with fallback to other providers and no code changes required. BYOK (bring your own key) is supported, and there are no platform fees on inference.

DeepSeek Infrastructure Tools
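
Gateway-side fallback can be pictured with a generic sketch. The provider names and callables below are hypothetical and do not reflect Vercel's AI Gateway API; the point is only the try-in-order pattern that makes fallback invisible to the caller.

```python
from typing import Callable, Sequence

class AllProvidersFailed(Exception):
    """Raised when every provider in the fallback chain has failed."""

Provider = tuple[str, Callable[[str], str]]

def route_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # sketch only: falls through on any error
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))

# Hypothetical stand-ins for upstream providers; not real SDK calls.
def azure_down(prompt: str) -> str:
    raise TimeoutError("upstream timeout")

def other_provider(prompt: str) -> str:
    return f"echo: {prompt}"

print(route_with_fallback("hello", [("azure", azure_down), ("other", other_provider)]))
# ('other', 'echo: hello')
```

Catching every exception keeps the sketch short; a production router would typically fall through only on timeouts, rate limits and server-side errors, and surface client errors immediately.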

Reddit r/LocalLLaMA·Jun 10

FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

FlashMemory-DeepSeek-V4 introduces Lookahead Sparse Attention (LSA), an inference paradigm reducing KV cache footprint to 13.5% of baseline on ultra-long contexts (500K tokens). A Neural Memory Indexer predicts future context demands and preserves only query-critical chunks in GPU memory, without loading the full backbone model. Results: +0.6% average accuracy on LongBench-v2, LongMemEval, RULER.

DeepSeek Reasoning Benchmarks
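
The effect of keeping only query-critical chunks can be illustrated with a generic top-k retention sketch. It assumes equal-sized chunks and already-computed relevance scores; `retain_chunks`, the chunk size and the stand-in scores are hypothetical, and the paper's Neural Memory Indexer, which predicts those scores, is not modeled.

```python
import math

def retain_chunks(scores: list[float], budget_fraction: float) -> list[int]:
    """Indices of the highest-scoring chunks that fit in budget_fraction of the KV cache.

    Assumes equal-sized chunks and precomputed relevance scores.
    """
    if not 0.0 < budget_fraction <= 1.0:
        raise ValueError("budget_fraction must be in (0, 1]")
    n_keep = max(1, math.floor(len(scores) * budget_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Return kept chunks in their original positional order.
    return sorted(ranked[:n_keep])

# Hypothetical setup: a 500K-token context split into 1,000 chunks of 500 tokens.
CHUNK_TOKENS = 500
scores = [((i * 37) % 1000) / 1000 for i in range(1000)]  # stand-in relevance scores
kept = retain_chunks(scores, budget_fraction=0.135)
print(len(kept), len(kept) * CHUNK_TOKENS)  # 135 67500
```

With 135 of 1,000 equal chunks resident, the cache holds 13.5% of the context, the footprint ratio reported for LSA; the hard part, predicting which chunks future queries will need, is replaced here by fixed scores.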

arXiv cs.AI·Jun 10

Moonshine: An Autonomous Mathematical Research Agent Centered on Conjecture Generation

Moonshine is an autonomous agent that generates significant mathematical conjectures by extracting structure from classical problems. Applied to the Jacobian conjecture, it transfers the underlying logic to affine-ridge sigmoid networks and formulates the Neural Jacobian Conjecture (NJC). GPT-5.5-pro and DeepSeek-V4-pro produced complete proofs for the case N=n+1.

AI Agents Reasoning Papers
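
For reference, the classical Jacobian conjecture that Moonshine starts from can be written as below. The neural variant (NJC) and the meaning of the case N=n+1 are defined in the paper and are not reproduced here.

```latex
\begin{gathered}
F = (F_1,\dots,F_n)\colon \mathbb{C}^n \to \mathbb{C}^n,\quad F_i \in \mathbb{C}[x_1,\dots,x_n],\quad
J_F = \Bigl(\tfrac{\partial F_i}{\partial x_j}\Bigr)_{i,j=1}^{n},\\
\det J_F \equiv c \neq 0 \;\Longrightarrow\; \exists\, G \text{ polynomial with } G \circ F = F \circ G = \mathrm{id}_{\mathbb{C}^n}.
\end{gathered}
```

The converse direction is easy (a polynomial inverse forces a nonzero constant Jacobian determinant by the chain rule); the conjecture is the implication shown.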

arXiv cs.AI·Jun 10

Instruction Finetuning DeepSeek-R1-8B Model Using LoRA and NEFTune

DeepSeek-R1-8B fine-tuned with LoRA and NEFTune for financial named-entity recognition. On 1693 annotated samples, the model achieves micro-F1 of 0.912 across 7 entity types, outperforming Llama3-8B, Qwen3-8B, and Baichuan2-7B.

DeepSeek Fine-tuning RAG
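
The two fine-tuning ingredients can be sketched in pure Python following their standard published formulations: LoRA adds a low-rank update (alpha / r) · B A to a frozen weight W, and NEFTune adds Uniform(-1, 1) noise scaled by alpha / sqrt(L · d) to the input embeddings during training only (each method has its own alpha). The sizes and alpha values below are illustrative; the paper's ranks and noise settings are not given in the summary.

```python
import math
import random

def lora_forward(W, A, B, x, alpha: float):
    """h = W x + (alpha / r) * B (A x), with W: d_out x d_in, A: r x d_in, B: d_out x r."""
    r = len(A)

    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    base = matvec(W, x)                  # frozen pretrained path
    delta = matvec(B, matvec(A, x))      # trainable low-rank path
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

def neftune_noise(embeddings, alpha: float, rng: random.Random):
    """Add Uniform(-1, 1) noise scaled by alpha / sqrt(L * d) to an L x d embedding matrix.

    Applied only while fine-tuning, never at inference.
    """
    L, d = len(embeddings), len(embeddings[0])
    scale = alpha / math.sqrt(L * d)
    return [[e + scale * rng.uniform(-1.0, 1.0) for e in row] for row in embeddings]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (d_out = d_in = 2)
A = [[1.0, 1.0]]               # rank r = 1
B = [[0.0], [0.0]]             # LoRA initializes B to zero, so training starts from W
print(lora_forward(W, A, B, [1.0, 2.0], alpha=2.0))  # [1.0, 2.0]
noisy = neftune_noise([[0.0] * 8 for _ in range(4)], alpha=5.0, rng=random.Random(0))
```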

arXiv cs.LG·Jun 10

TENP: Trapezoidal Expert Neuron Pruning For Mixture-of-Experts

TENP proposes a structured pruning framework for Mixture-of-Experts LLMs. The method identifies important experts and applies neuron-level pruning to the less important ones in a trapezoidal pattern across layers. On DeepSeek, with 40% routing sparsity and 63.76% of expert parameters activated, the accuracy drop stays within 1 point, and code generation improves by 10%.

DeepSeek Qwen Benchmarks
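
The summary does not give TENP's exact schedule, so the sketch below assumes a hypothetical symmetric trapezoid: the per-layer pruning ratio rises linearly from `low` at the outermost layers to `high`, plateaus in the middle, and falls back. The real shape, its direction and its parameters may differ; `trapezoidal_schedule` and `neurons_to_keep` are illustrative names.

```python
def trapezoidal_schedule(num_layers: int, low: float, high: float, ramp: int) -> list[float]:
    """Hypothetical per-layer pruning ratios shaped like a trapezoid.

    Ratios rise linearly from `low` at the outermost layers to `high` over
    `ramp` layers, stay at `high` in the middle, and fall back symmetrically.
    """
    if ramp < 1:
        raise ValueError("ramp must be at least 1")
    ratios = []
    for layer in range(num_layers):
        dist_to_edge = min(layer, num_layers - 1 - layer)
        if dist_to_edge >= ramp:
            ratios.append(high)
        else:
            ratios.append(low + (high - low) * dist_to_edge / ramp)
    return ratios

def neurons_to_keep(ffn_width: int, layer_ratio: float, is_important: bool) -> int:
    """Important experts stay intact; the others lose `layer_ratio` of their FFN neurons."""
    return ffn_width if is_important else round(ffn_width * (1 - layer_ratio))

schedule = trapezoidal_schedule(num_layers=8, low=0.1, high=0.5, ramp=2)
print([round(r, 2) for r in schedule])  # [0.1, 0.3, 0.5, 0.5, 0.5, 0.5, 0.3, 0.1]
print(neurons_to_keep(1024, schedule[3], is_important=False))  # 512
```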

Reddit r/LocalLLaMA·Jun 8

Here are some tips on hitting nearly 200 tok/s for DeepSeek v4 Flash on Hopper

DeepSeek V4 Flash optimization on Hopper GPUs: the author reaches 193 tok/s using Canada-Quant quantization and a patched vLLM multi-token prediction (MTP) path, and documents the gains as a way to make local inference cheaper than API pricing ($0.1966 per million tokens).

DeepSeek Code generation AI Agents
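
The comparison with API pricing can be framed as a break-even calculation. This sketch assumes a single stream running flat out at 193 tok/s and that the quoted $0.1966 per million tokens applies to every generated token (the summary does not split input and output pricing).

```python
def tokens_per_hour(tok_per_s: float) -> float:
    """Tokens generated in one hour at a constant rate."""
    return tok_per_s * 3600

def api_equivalent_cost_per_hour(tok_per_s: float, usd_per_million: float) -> float:
    """Hourly cost of buying the same number of tokens at a per-million API price."""
    return tokens_per_hour(tok_per_s) / 1_000_000 * usd_per_million

print(tokens_per_hour(193))                                  # 694800
print(round(api_equivalent_cost_per_hour(193, 0.1966), 4))   # 0.1366 (USD per hour)
```

Under these assumptions one full-speed stream produces about 0.69M tokens per hour, worth roughly $0.14 at the quoted API price; serving several concurrent streams raises that figure proportionally.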

Vercel AI Blog·Jun 8

DeepSeek enters the fight for token volume, Anthropic continues to dominate spend

DeepSeek V4 captured 17% of token volume on AI Gateway in May 2026, up from under 1% in April, thanks to pricing 20–50× lower than Claude's. Despite this volume growth, DeepSeek accounts for only 1% of spend, while Anthropic still dominates spending.

DeepSeek Anthropic OpenAI
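
The gap between volume share and spend share implies a price ratio. Assuming spend equals tokens times price and taking the rounded 17% and 1% shares at face value, the sketch computes how much more the rest of the traffic pays per token on average.

```python
def implied_price_ratio(volume_share: float, spend_share: float) -> float:
    """Average per-token price of all other traffic divided by this provider's.

    Assumes spend = tokens x price, so price is proportional to spend_share / volume_share.
    """
    own_price = spend_share / volume_share
    others_price = (1 - spend_share) / (1 - volume_share)
    return others_price / own_price

print(round(implied_price_ratio(volume_share=0.17, spend_share=0.01), 1))  # 20.3
```

A ratio of about 20 sits at the low end of the 20–50× price gap quoted above; the estimate is rough because both shares are rounded and the mix of other providers is not broken down.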

arXiv cs.CL·Jun 8

Explain Like I'm 5 or Whatever I Choose: Evaluating the Interactive Potential of Language Model Responses

Evaluation study of LLMs (GPT-5.1, GPT-5 mini, Claude Sonnet 4.5 + Thinking, DeepSeek-V3.1) on their ability to generate multiple responses to the same scientific query while varying language complexity. On 98 queries, Claude Sonnet 4.5 maintains consistent complexity only 46% of the time. The evaluation framework is based on a formative study with 16 participants.

Evals Claude GPT

The Decoder·Jun 7

Deepseek topped Ramp's trending software vendors in June 2026 as US companies chase cheaper AI

DeepSeek topped Ramp's trending software vendors in June 2026 as a paid service. US companies are adopting it to cut AI costs, but Ramp's chief economist warns of security risks associated with Chinese models.

DeepSeek Business AI safety

Reddit r/LocalLLaMA·Jun 6

DeepSeek V4 Flash is amazing! (WIP llama.cpp PR #24162)

DeepSeek V4 Flash is gaining llama.cpp support through PR #24162, which is still at an early stage. The model combines frontier-level intelligence, robustness to quantization (native FP4-FP8 hybrid), and efficient KV cache scaling. It currently runs at 5-6 tokens/s, and GPU and FlashAttention support are still in progress, but correctness has been validated.

DeepSeek Open source Infrastructure

Reddit r/MachineLearning·Jun 4

On-policy distillation: one of the hottest terms on PapersWithCode [R]

On-policy distillation (OPD) is a key post-training technique used by Qwen 3.6/3.7, GLM-5.1, and DeepSeek-V4. The method uses an auxiliary model to identify errors in trajectories and inject correction tokens, allowing the main model to learn without regenerating new rollouts.

Fine-tuning Reinforcement learning Qwen

Le Big Data·Jun 4

DeepSeek reportedly eyeing a $7 billion funding round with Tencent and CATL

DeepSeek is reportedly preparing a $7 billion funding round with Tencent and CATL, which would be one of the largest recent AI funding rounds in China.

DeepSeek Funding Business

arXiv cs.CL·Jun 3

G²C-MT: Graph-Guided Context Selection for Document-Level Machine Translation

G²C-MT proposes graph-guided context selection for document-level machine translation. The system models discourse dependencies between paragraphs via a lightweight graph and uses depth-biased random walks to extract context paths. Tested on DeepSeek-V3, Gemini-2.5-Flash-lite, and Qwen-2.5/3, the approach outperforms baselines across multiple domains.

Papers Benchmarks DeepSeek

ActuIA·Jun 2

Qwen and DeepSeek: Beijing seals off their training data, while the AI Act demands it

Since June 2026, European digital authorities using Qwen or DeepSeek must comply with the AI Act, which requires disclosure of training data. Beijing refuses to share that data, creating a major regulatory conflict between the EU and Chinese providers.

Qwen DeepSeek Regulation

Reddit r/LocalLLaMA·Jun 1

Deepseek V4 flash performance on DGX Spark

A user deploys DeepSeek V4 Flash on DGX Spark (2x ASUS GX10) via vLLM. Maximum context is 256k tokens; prefill throughput is 1680-2150 tokens/s and decode is 37-49 tokens/s across window sizes. Performance is consistent, with little degradation. The model outperforms M2.7 and Stepfun 3.7 on high-context reasoning benchmarks.

DeepSeek Infrastructure Benchmarks

Reddit r/LocalLLaMA·Jun 1

100 Trillion+ Pretraining data??? This is the largest data I've see a model being trained on.

A Reddit user reports a model (likely MiniMax M3) trained on more than 100 trillion tokens, two to four times current norms (27-50T for Kimi, MiMo, and DeepSeek). The author doubts the model exceeds 500B parameters despite this massive data scaling.

DeepSeek Benchmarks

Reddit r/LocalLLaMA·May 31

DeepSWE benchmarks indicate that DeepSeek v4 Pro only passes 8% of tasks

A Reddit user reports that DeepSeek V4 Pro passes only 8% of tasks on the DeepSWE benchmark, which contrasts with their experience of near-parity with Claude Sonnet 4.6 in practice. The post links to the DeepSWE benchmark.

DeepSeek Benchmarks Code generation

The Decoder·May 29

New review paper argues code is how AI agents think and act, not just what they produce

A review paper argues that the real bottleneck for autonomous AI agents is not the language model but the software layer around it: tools, memory, testing, and permission boundaries turn a stateless model into a working agent. DeepSeek is building a dedicated "Harness" team in Beijing, which the article presents as support for this thesis.

AI Agents DeepSeek Code generation

Hacker News (AI)·May 29

DeepSeek Slashes AI Costs to Cents

DeepSeek drastically reduces AI inference costs to cents. The Chinese company optimizes its models to lower computational resource consumption and usage fees.

DeepSeek Business

Le Big Data·May 29

DeepSeek V4: China's emancipation and the urgent need for a European AI strategy

DeepSeek V4 represents a major breakthrough in Chinese AI and challenges the effectiveness of Western strategies. The article highlights Europe's urgent need to develop a competitive AI strategy in response to this technological independence.

DeepSeek Regulation

arXiv cs.AI·May 28

TCP-MCP: Landscape-Guided Co-Evolution of Prompts and Communication Topologies for Multi-Agent Systems

TCP-MCP co-evolves agent prompts and communication topologies as a unified genome. On MMLU-Pro, MMLU, and GSM8K with DeepSeek-V3.2 backbone, the system achieves 82.66%, 89.96%, and 96.61% accuracy while consuming 5.69× fewer tokens than debate-style systems.

Multi-agent Prompt engineering Benchmarks

Reddit r/LocalLLaMA·May 28

GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding)

Developer seeking optimal infrastructure (~$100-150k) to self-host Kimi K2.6 and DeepSeek V4 locally for 5-person team (agentic coding). Compares dual GH200 NVL2 (1.2TB unified memory, $95k) vs 8x RTX 6000 Blackwell (768GB VRAM, $140k). Single GH200 test: 23 tok/s decode at 2-bit quant, but slow prefill and models overflow into slower unified memory.

DeepSeek Kimi AI Agents
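
The hardware choice largely comes down to whether the quantized weights fit in fast memory. Taking the 1.6T-parameter figure quoted for DeepSeek V4 Pro elsewhere in this feed, using the 768 GB and 1.2 TB capacities from the post, and ignoring KV cache, activations and runtime overhead, a rough weight-size check looks like this.

```python
def weight_gb(params: float, bits: float) -> float:
    """Approximate weight size in GB (1e9 bytes); ignores KV cache and runtime overhead."""
    return params * bits / 8 / 1e9

V4_PRO_PARAMS = 1.6e12   # 1.6T parameters, as quoted for DeepSeek V4 Pro in this feed
CAPACITIES_GB = {"8x RTX 6000 Blackwell": 768, "dual GH200 NVL2": 1200}

for bits in (2, 4, 8):
    size = weight_gb(V4_PRO_PARAMS, bits)
    fits = {name: size <= cap for name, cap in CAPACITIES_GB.items()}
    print(f"{bits}-bit: {size:.0f} GB -> {fits}")
```

At 4-bit the weights alone (about 800 GB) would exceed the RTX 6000 option's 768 GB of VRAM but fit in the GH200 pair's 1.2 TB, before any KV cache for long agentic sessions is counted.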

Reddit r/MachineLearning·May 27

UK GDPR Small Business Q&A — 5,000 synthetic pairs with article-level citations [D]

Dataset of 5,000 synthetic QA pairs for fine-tuning UK GDPR compliance assistants. Each pair includes practical SME questions and answers with specific article references, ICO guidance, and actionable steps. Generated via Qwen 14B and DeepSeek API. MIT license, 1K sample on Hugging Face.

Fine-tuning RAG DeepSeek

arXiv cs.AI·May 27

Beyond a Single Direction: Chain-of-Thought Disrupts Simple Steering of Refusal

Reasoning models (LRMs) jointly encode refusal in residual stream activations and chain-of-thought (CoT). On DeepSeek-R1-Distill-LLaMA-8B, activation steering reverses refusal in 39% of cases with fixed CoT, but 70% without CoT. Regenerating CoT under steering achieves 94% success, revealing refusal is distributed across activations and CoT.

Reasoning AI safety Alignment
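
Activation steering of this kind is commonly implemented as an edit to a residual-stream vector along a "refusal direction". The sketch shows two standard edits on plain Python lists, projecting the direction out and adding a scaled copy of it; how the direction is estimated, which layers are edited, and the scaling used in the paper are not given in the summary and are not modeled.

```python
import math

def normalize(v: list[float]) -> list[float]:
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]

def ablate_direction(h: list[float], r: list[float]) -> list[float]:
    """Project direction r out of activation h: h - (h . r_hat) r_hat."""
    r_hat = normalize(r)
    proj = sum(a * b for a, b in zip(h, r_hat))
    return [a - proj * b for a, b in zip(h, r_hat)]

def add_steering(h: list[float], r: list[float], coeff: float) -> list[float]:
    """Shift h along r_hat by coeff; a negative coeff pushes away from the direction."""
    r_hat = normalize(r)
    return [a + coeff * b for a, b in zip(h, r_hat)]

# Toy 2-D activation with a hypothetical refusal direction along the first axis.
print(ablate_direction([3.0, 4.0], [1.0, 0.0]))   # [0.0, 4.0]
print(add_steering([3.0, 4.0], [2.0, 0.0], -1.0))  # [2.0, 4.0]
```

Both edits act on activations only, which fits the summary's finding that they are less effective when the chain of thought is held fixed.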

arXiv cs.AI·May 27

A Dataset of Robot-Patient and Doctor-Patient Medical Dialogues for Spoken Language Processing Tasks

MeDial-Speech: dataset of 111+ hours of spoken medical dialogues (robot-patient and doctor-patient) covering 4 health conditions. Benchmark of 3 LLMs (GPT-4 mini, DeepSeek-V3, Claude Sonnet 4) via sentence selection: Claude Sonnet 4 achieves 71.1% accuracy. Reveals systematic overconfidence in model predictions.

Benchmarks Claude DeepSeek
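
The summary does not say how overconfidence was measured. One standard way to quantify it is binned expected calibration error (ECE) together with the signed gap between mean confidence and accuracy; the sketch implements both on toy data and is not the paper's evaluation code.

```python
def expected_calibration_error(confidences: list[float], correct: list[int], n_bins: int = 10) -> float:
    """Standard binned ECE: weighted mean |confidence - accuracy| over equal-width bins."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally long, non-empty inputs")
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; the first bin also takes confidence 0.0.
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(avg_conf - accuracy)
    return ece

def overconfidence_gap(confidences: list[float], correct: list[int]) -> float:
    """Mean confidence minus accuracy; positive values indicate overconfidence."""
    return sum(confidences) / len(confidences) - sum(correct) / len(correct)

# Toy case: the model claims 90% confidence but is right half the time.
conf, hits = [0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0]
print(round(expected_calibration_error(conf, hits), 3))  # 0.4
print(round(overconfidence_gap(conf, hits), 3))          # 0.4
```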

The Decoder·May 26

China reportedly now requires top AI researchers to get permission before leaving the country

China now requires top AI researchers at Alibaba and DeepSeek to obtain official approval before leaving the country. Beijing fears data leaks, technology theft, and talent poaching.

Regulation DeepSeek Business

GitHub Trending·May 26

Hmbown / CodeWhale

CodeWhale is an agentic coding terminal prioritizing DeepSeek with multi-provider support, cache optimization, 5-locale UI, and CN-region endpoints.

AI Agents Code generation DeepSeek

Reddit r/LocalLLaMA·May 25

The reason small-model agent stacks aren't the default has nothing to do with whether they work

The post argues that small specialized models (Gemma 4 31B at 86.4% on tau2-bench, Qwen 27B outperforming 397B models) now dominate agentic benchmarks, yet the industry keeps deploying expensive frontier models because frontier labs profit from per-token billing, creating a misalignment between technical performance and market adoption.

AI Agents Benchmarks Qwen
