Edition of 2026-05-31

MTP baked into GGUF, Apple Silicon inference finally benchmarked properly, and search agents that mostly confirm what they already know.

By the editorial team

Today's 5 picks

mudler/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-APEX-MTP-GGUF just released!

Mudler has released APEX GGUF quantizations of Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled with a bundled MTP (multi-token prediction) head. The files enable self-speculative decoding in llama.cpp without a separate draft model. They are about 2.5% larger than the non-MTP version, and the MTP head is quantized at Q8_0 to keep draft accuracy high.
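To illustrate the idea behind self-speculative decoding, here is a minimal sketch of a greedy draft-and-verify loop. It assumes two hypothetical callables: `target` (the full model's greedy next token) and `draft` (the cheaper MTP head's guess). It is not llama.cpp's implementation; a real engine would score all drafted positions in one batched forward pass of the main model instead of calling it once per position.

```python
from typing import Callable, List

Token = int
# Hypothetical interface: given a token prefix, return the greedy next token.
NextToken = Callable[[List[Token]], Token]


def draft_tokens(draft: NextToken, prefix: List[Token], k: int) -> List[Token]:
    """Let the cheap draft head propose k tokens autoregressively."""
    out: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft(ctx)
        out.append(t)
        ctx.append(t)
    return out


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[Token], k: int) -> List[Token]:
    """One draft-then-verify round with greedy acceptance.

    Returns the tokens appended this round: the accepted draft prefix plus
    one token from the target (a correction, or a bonus token if all match).
    """
    proposal = draft_tokens(draft, prefix, k)
    accepted: List[Token] = []
    ctx = list(prefix)
    for t in proposal:
        expected = target(ctx)  # a real engine verifies all k positions in one pass
        if expected != t:
            accepted.append(expected)  # first mismatch: keep the target's token, stop
            return accepted
        accepted.append(t)
        ctx.append(t)
    accepted.append(target(ctx))  # every draft accepted: target adds one bonus token
    return accepted


def generate(target: NextToken, draft: NextToken,
             prompt: List[Token], n_new: int, k: int = 4) -> List[Token]:
    """Generate n_new tokens; each round appends at least one token."""
    out = list(prompt)
    while len(out) - len(prompt) < n_new:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prompt) + n_new]
```

Under greedy acceptance the output is identical to what `target` alone would produce; the speedup comes from accepting several drafted tokens per verification pass, which is why a high-accuracy draft head (here, Q8_0) matters.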

Qwen Code generation Open source

Reddit r/MachineLearning·SIG 72

I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama) [P]

mlx-Chronos is an open-source CLI tool and community leaderboard for benchmarking local LLM inference engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama). It measures time to first token (TTFT), throughput, RAM usage, and thermal state using a standardized methodology. At the time of posting, the leaderboard held only M2 8GB results.
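For readers unfamiliar with the first two metrics, here is a minimal sketch of how TTFT and decode throughput are commonly derived from per-token timestamps. The helper names (`compute_metrics`, `summarize`) are hypothetical, and mlx-Chronos's exact definitions may differ; this assumes timestamps in seconds from a single monotonic clock such as `time.perf_counter()`.

```python
from dataclasses import dataclass
from statistics import median
from typing import Sequence


@dataclass
class RunMetrics:
    ttft_s: float            # time to first token, in seconds
    decode_tok_per_s: float  # generation throughput after the first token


def compute_metrics(request_start: float, token_times: Sequence[float]) -> RunMetrics:
    """Derive TTFT and decode throughput from per-token arrival timestamps.

    Hypothetical helper: one timestamp per generated token, all on the same
    monotonic clock as request_start.
    """
    if not token_times:
        raise ValueError("no tokens generated")
    ttft = token_times[0] - request_start
    if len(token_times) < 2:
        return RunMetrics(ttft, 0.0)
    span = token_times[-1] - token_times[0]
    # n tokens after the first one arrive over `span` seconds.
    tps = (len(token_times) - 1) / span if span > 0 else float("inf")
    return RunMetrics(ttft, tps)


def summarize(runs: Sequence[RunMetrics]) -> RunMetrics:
    """Take the median across repeated runs to damp one-off thermal or scheduling noise."""
    return RunMetrics(
        median([r.ttft_s for r in runs]),
        median([r.decode_tok_per_s for r in runs]),
    )
```

Reporting decode throughput separately from TTFT keeps prompt-processing cost from being folded into the tokens-per-second figure, which makes engines easier to compare.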

Open source Benchmarks Infrastructure

The Decoder·SIG 72

AI search agents often confirm what they already know instead of actually researching the web

AI search agents such as GPT-5.4 and Kimi K2.6 mostly confirm what they already learned in training rather than genuinely researching the web. Researchers at Harbin Institute of Technology demonstrated this with LiveBrowseComp, a benchmark built on events from the last 90 days. When the agents cannot fall back on training memory, their performance collapses.

Benchmarks AI Agents GPT

Reddit r/LocalLLaMA·SIG 65

Benchmarked inference engines for M1 Max 64GB: results & analysis

A benchmark of inference engines on an M1 Max (64GB) comparing Rapid-MLX, oMLX, mlx-lm, and Ollama with Qwen 3.5-4B. Rapid-MLX leads on both speed and memory efficiency. The results were submitted to the mlx-Chronos community leaderboard.

Qwen Benchmarks Open source

Hacker News (AI)·SIG 45

Show HN: Komi-learn – continuous memory and self-improvement for coding agents

Komi-learn is a framework that gives coding agents continuous memory and self-improvement, letting them learn from past experience and perform better over time.

AI Agents Code generation Open source