Edition of 2026-06-07

LLM emergent capabilities explained by task frequency, not scale, and that reshapes training strategy

By the editorial team

Today's 5 picks

Researchers pinpoint why larger language models pick up skills that small ones miss

A study comparing models from 4M to 4B parameters finds that small models fail at rare tasks because frequent tasks constantly overwrite the skills they have learned. The practical remedy: increase the target task's frequency in the training data rather than scaling up the model.

Benchmarks Reasoning
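
To make the "increase target task frequency" idea concrete, here is a minimal sketch of reweighting a training mixture so a rare task is sampled more often. The function name, task names, counts, and boost factor are all hypothetical; the summary does not describe the study's actual data pipeline, so treat this as one plausible way to do it rather than the authors' method.

```python
def boosted_mixture(task_counts, target_task, boost):
    """Return per-task sampling probabilities after multiplying
    the target task's weight by `boost`.

    Hypothetical helper for illustration only.
    """
    if target_task not in task_counts:
        raise KeyError(f"unknown task: {target_task}")
    # Start from raw example counts as unnormalized weights.
    weights = {task: float(count) for task, count in task_counts.items()}
    # Upweight only the task we want the model to see more often.
    weights[target_task] *= boost
    total = sum(weights.values())
    return {task: weight / total for task, weight in weights.items()}


# Made-up counts: one frequent task and one rare target task.
counts = {"frequent_task": 990, "rare_task": 10}
before = boosted_mixture(counts, "rare_task", boost=1)
after = boosted_mixture(counts, "rare_task", boost=10)
print(f"rare_task share: {before['rare_task']:.3f} -> {after['rare_task']:.3f}")
```

With these made-up numbers, a 10x boost moves the rare task from 1% of sampled examples to roughly 9%, without changing model size.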

Reddit r/LocalLLaMA·SIG 45

GraphKV, kv cache optimization based on graph embedding models

GraphKV is a KV cache compression project built on graph embedding models. It reports 7.76x compression on GPT-2 (cosine similarity 0.999949) and 3.36x on Qwen2.5-7B at 32k tokens (cosine similarity 0.990316). Inspired by TurboQuant, it uses int2/int4/NF4 quantization.

Qwen Code generation Open source
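
For readers unfamiliar with how quantization quality is scored by cosine similarity, the sketch below quantizes a vector to 4-bit integer codes and compares the reconstruction with the original. This is generic symmetric quantization, not GraphKV's algorithm: the function names and sample values are hypothetical, and the graph embedding step the project describes is not modeled here.

```python
import math


def quantize_int4(values):
    """Symmetric per-vector quantization to integer codes in [-7, 7]."""
    max_abs = max(abs(v) for v in values)
    # One shared scale per vector; fall back to 1.0 for an all-zero vector.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point values."""
    return [c * scale for c in codes]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up stand-in for one row of a cached key or value tensor.
original = [0.1, -0.5, 0.9, 0.3, -0.7]
codes, scale = quantize_int4(original)
restored = dequantize(codes, scale)
print(codes, round(cosine(original, restored), 4))
```

Storing 4-bit codes in place of 16-bit floats gives at most 4x compression before the per-vector scale is counted, which is why the reported ratios depend on both the bit width and the metadata overhead.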

Reddit r/LocalLLaMA·SIG 45

5 Months Later: open-deepthink Now Has Full Knowledge Distillation Mode

open-deepthink adds a knowledge distillation mode using Qualitative Neural Networks (QNN). Agents arranged in layers evolve via Mirror Descent and mutation, generating structured JSON datasets with developmental traces, agent reasoning, and evolutionary history for fine-tuning local LLMs.

AI Agents Multi-agent Fine-tuning

Reddit r/MachineLearning·SIG 45

Got told my open-source model experiments are too scattered. I'm organizing a journal to provide clarity before structuring the first git release. Is this readable for ML folks who aren’t in mech interp? Open to ANY feedback [D]

A mechanistic interpretability experiment on Qwen3.5-35B-A3B: a routed expert (E114, layer 14) correlates with a first-person self-examination register during generation. The author is documenting results ahead of a git release, using a W/S/Q decomposition of MoE routing.

Qwen Open source

Hacker News (AI)·SIG 45

Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering

A study quantifying token distribution in agentic AI systems for software engineering. It analyzes where and how tokens are consumed across autonomous agent workflows.

AI Agents Code generation Benchmarks