May 2026

3149 articles

How small can the orchestration model in an agent be? (separating it from code-gen — that obviously wants a big model)

A developer tests the minimum model size for orchestrating a local ReAct loop. Qwen3.6-35B-A3B (MoE, ~3B active) is his threshold: below it, the model invents tool parameters or overgeneralizes calls. He improves accuracy by exposing exact signatures in the system prompt.
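As a rough illustration of "exposing exact signatures in the system prompt", the sketch below renders tool signatures from Python function definitions and rejects calls that use invented parameters. The tool names (`search_files`, `read_file`) and the prompt wording are hypothetical and are not taken from the original post; this is one possible way to do it, not the developer's actual setup.

```python
import inspect

# Hypothetical tools, defined only so their signatures can be rendered.
def search_files(pattern: str, root: str = ".", max_results: int = 20) -> list[str]:
    """Find files under root whose names match a glob pattern."""
    return []

def read_file(path: str, start_line: int = 1, end_line: int = -1) -> str:
    """Return the text of a file, optionally limited to a line range."""
    return ""

def render_tool_block(tools) -> str:
    """Build the part of the system prompt that lists exact tool signatures."""
    lines = ["You may call ONLY these tools, with exactly these parameters:"]
    for fn in tools:
        sig = inspect.signature(fn)
        summary = (inspect.getdoc(fn) or "").splitlines()[0]
        lines.append(f"- {fn.__name__}{sig}: {summary}")
    return "\n".join(lines)

def is_valid_call(fn, arguments: dict) -> bool:
    """Return False when the model supplies unknown or missing parameters."""
    try:
        inspect.signature(fn).bind(**arguments)
        return True
    except TypeError:
        return False

SYSTEM_PROMPT = render_tool_block([search_files, read_file])
```

Checking each proposed call with `is_valid_call` before executing it gives the loop a cheap way to catch invented parameters and ask the model to retry.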

AI Agents Qwen Prompt engineering

Reddit r/LocalLLaMA·May 22

BeeLlama v0.2.0 – major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline.

BeeLlama v0.2.0 delivers major performance gains with DFlash optimization. On RTX 3090: Qwen 3.6 27B reaches 164 tps (4.40x speedup), Gemma 4 31B 177.8 tps (4.93x). Full Gemma 4 31B support, reduced DFlash overhead, improved prefill handling, stricter draft/target validation.

Qwen Open source Code generation

Hacker News (AI)·May 22

Microsoft Drops Claude Code After Budget Overrun

Microsoft discontinues Claude Code following budget overrun. The service, integrated into Copilot, failed to meet profitability targets set by the company.

Claude Code generation Business

The Decoder·May 22

DeepSeek reportedly prioritizes AGI research over quick profits despite billions in funding

DeepSeek raises approximately $10 billion, valuing the Chinese AI startup at $45 billion. Founder Liang Wenfeng tells investors AGI research takes priority over short-term profits.

DeepSeek Funding Reasoning

The Decoder·May 22

OpenAI Appshots turn any Mac window into context for Codex

OpenAI launches Appshots, a macOS feature enabling users to send any app window's contents to Codex with a single click. Codex receives the context needed to complete coding tasks.

OpenAI Code generation Tools

Reddit r/LocalLLaMA·May 22

trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser

Trained prompt injection detector using ml-intern and DeepSeek v4 Flash. DistilBERT achieves F1 99%, compressed to ONNX int8 (~65 MB), runs in browser via Transformers.js v3. Total API cost under $5 with DeepSeek.

DeepSeek AI Agents AI safety

Le Big Data·May 22

Meta launches Forum, its new Reddit with, of course, a layer of AI

Meta launches Forum, a community discussion app competing with Reddit, featuring AI-generated responses to reinvigorate Facebook groups.

Meta AI Tools

Reddit r/LocalLLaMA·May 22

ByteShape Qwen3.6-35B-A3B: 30% faster than Unsloth IQ on 6GB VRAM laptop

ByteShape's CPU-5 quant for Qwen3.6-35B-A3B achieves 30% faster token generation than Unsloth UD-IQ4_XS on 6GB VRAM laptop GPU, with slightly slower prefill speed. Tested on RTX 3060 with 65536 token context.

Qwen Open source Tools

Reddit r/LocalLLaMA·May 22

Experts first llama.cpp

Experimental llama.cpp fork optimizing MoE for 12GB VRAM GPUs. Author selectively loads experts to VRAM instead of full layers, reaching 26 tk/s on RTX 2060 (vs 19 tk/s default) with 62% hit rate. Seeking testers on 3060/4060.

Llama Open source Infrastructure

Reddit r/LocalLLaMA·May 22

I ran a quantization shootout on Qwen3-Coder and the results are... interesting

Quantization benchmark on Qwen3-Coder-Next using 3× R9700 PRO. UD-Q5_K_M outperforms MXFP4_MOE on all quality metrics (94% vs 89.4% top-1 accuracy, KL divergence 0.0217 vs 0.0746) with negligible speed penalty (~10% decode). Unsloth's dynamic precision approach exponentially reduces cumulative errors on long outputs.

Qwen Code generation Fine-tuning

Reddit r/LocalLLaMA·May 22

Qwen-27B-IQ4_KS for ik_llama.cpp, especially for NVIDIA with 16GB VRAM

New Qwen-27B-IQ4_KS quantization optimized for 16GB NVIDIA GPUs via ik_llama.cpp. 14.1GB model delivers performance comparable to previous IQ4_XS, 1.5-1.75x faster, 105k token context window. Tests: Needle In Haystack 100k passed, perplexity 71.10.

Qwen Open source Tools

Hugging Face Blog·May 22

Specialization Beats Scale: A Strategic Variable Most AI Procurement Decisions Overlook

Hugging Face argues that AI model specialization outperforms raw scale in procurement decisions. Organizations typically favor large generalist models, overlooking that smaller specialized models deliver better performance and lower costs for specific tasks.

Open source Business Benchmarks

Reddit r/LocalLLaMA·May 22

Some tests with qwen3.6 27b + 35b a3b about MTP vs ngram-mod

User benchmarks Qwen 3.6 27B and 35B with MTP vs ngram-mod optimization techniques. Finding: MTP degrades performance on React code generation task; ngram-mod preserves quality. Setup: Qwen 27B Q6_K + Qwen 35B Q8 on dual GPU 16GB+12GB.

Qwen Code generation Benchmarks

Reddit r/LocalLLaMA·May 22

Open source: cloned Rocky's voice from Project Hail Mary in two days, full pipeline + 2:10 of training audio + trained RVC v2 model

Rocky's voice (Project Hail Mary) cloned in two days via open-source pipeline. Audio extraction (ffmpeg + demucs), transcription (Whisper), diarization (pyannote), then RVC v2 training on 2:10 min audio. Trained .pth model (55MB) and code public. Tested XTTS v2 / YourTTS / RVC v2 / OpenVoice v2.

Voice Open source Code generation

The Decoder·May 22

OpenAI burned through $1.22 per dollar earned even after stripping out stock-based compensation

OpenAI generated $5.7 billion in Q1 2026 revenue but lost $1.22 per dollar earned, with an adjusted operating margin of -122%.

OpenAI Business

Reddit r/LocalLLaMA·May 22

OpenBMB presents the model BitCPM-CANN 1.58 bit

OpenBMB presents BitCPM-CANN, a model quantized to 1.58 bits. Testing underway on Huawei Ascend 910B accelerators.

Open source Benchmarks

The Decoder·May 22

California governor signs first US executive order to protect workers from AI job loss

California's governor signed the first executive order by a US governor aimed at protecting workers from AI-driven job loss.

Regulation

Reddit r/LocalLLaMA·May 22

[llama.cpp] Asymmetric KV q8/q4 cache: current caveats and discussion in GGML repo

llama.cpp supports asymmetric KV caches (q8/q4), but with CUDA some type combinations currently fall back to CPU processing instead of running on the GPU. A user evaluation shows q8_0/q4_0 costs only 1.3% precision while reducing memory by over 50% versus f16/f16.
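The "over 50%" memory figure can be checked with simple arithmetic. The sketch below assumes the standard GGML block layouts (q8_0 stores 32 values in 34 bytes, q4_0 stores 32 values in 18 bytes) and assumes the K and V caches hold the same number of elements; the function names are illustrative only.

```python
# Effective bits per cached element for each cache type.
# q8_0: 32 int8 values + one fp16 scale = 34 bytes per block of 32.
# q4_0: 32 4-bit values (16 bytes) + one fp16 scale = 18 bytes per block of 32.
BITS_PER_ELEMENT = {
    "f16": 16.0,
    "q8_0": 34 * 8 / 32,  # 8.5 bits
    "q4_0": 18 * 8 / 32,  # 4.5 bits
}

def kv_bits(k_type: str, v_type: str) -> float:
    """Bits needed to store one K element plus one V element."""
    return BITS_PER_ELEMENT[k_type] + BITS_PER_ELEMENT[v_type]

def memory_saving(k_type: str, v_type: str, baseline=("f16", "f16")) -> float:
    """Fraction of KV-cache memory saved relative to the baseline types."""
    return 1.0 - kv_bits(k_type, v_type) / kv_bits(*baseline)

saving = memory_saving("q8_0", "q4_0")  # 1 - 13/32 = 0.59375
```

Under these assumptions the asymmetric q8_0/q4_0 cache needs 13 bits per K/V pair instead of 32, a saving of roughly 59%, which is consistent with the reported figure.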

Llama Open source Infrastructure

Le Big Data·May 22

Spotify and Universal Music Group prepare official AI remixes

Spotify and Universal Music Group are preparing official AI-generated remixes and covers for the music streaming platform.

Business Tools

Hacker News (AI)·May 22

Valve removes free game from Steam after players discover it contains malware

Valve removed a free game from Steam after players discovered it contained malware. The incident raises questions about the platform's security controls.

AI safety

Reddit r/LocalLLaMA·May 22

[NEW] Supra-50M Released!

SupraLabs releases Supra-50M, a 50M-parameter model trained on 20B tokens of high-quality educational text. Llama-style architecture with 32k vocab. Outperforms GPT-2 (124M) and SmolLM-135M on multiple benchmarks (BLiMP 76.3%, SciQ 77.2%, ARC-Easy 52.2%). Roadmap includes Supra-124M and Supra-350M.

Open source Benchmarks Code generation

Le Big Data·May 22

Microsoft and EY invest $1 billion to accelerate the industrialization of AI

Microsoft and EY announce a $1 billion investment over 5 years to accelerate industrial AI deployment in enterprises.

Business Infrastructure

The Decoder·May 22

Trump pulls AI safety order after last-minute calls from Musk, Zuckerberg, and Sacks

Trump cancels an AI safety executive order after last-minute calls from Musk, Zuckerberg, and Sacks. The order would have established a voluntary review system for frontier models with a 90-day pre-release window.

Regulation AI safety Business

Reddit r/LocalLLaMA·May 22

DeepSeek is pushing forward with $10.29 billion financing round, with Liang Wenfeng committing to continue developing open-source AI models rather than pursuing short-term commercialization goals

DeepSeek raises $10.29 billion. Founder Liang Wenfeng commits to continuing open-source AI model development over short-term commercialization. Company targets AGI.

DeepSeek Open source Funding

GitHub Trending·May 22

Lum1104 / Understand-Anything

Open-source tool converting code into interactive, explorable knowledge graphs. Compatible with Claude Code, Cursor, Copilot, Gemini CLI, and other editors.

Code generation Tools Open source

GitHub Trending·May 22

can1357 / oh-my-pi

Oh-my-pi is an AI coding agent for the terminal featuring hash-anchored edits, LSP integration, Python support, browser capabilities, and subagents.

AI Agents Code generation Tools

GitHub Trending·May 22

github / copilot-sdk

GitHub releases a multi-platform SDK for integrating Copilot Agent into third-party apps and services. Enables developers to access Copilot's AI capabilities through a standardized API.

AI Agents Code generation Tools

GitHub Trending·May 22

raine / workmux

Workmux combines git worktrees and tmux windows for frictionless parallel development. Open-source tool enabling simultaneous management of multiple work branches with native tmux integration.

Tools Open source

GitHub Trending·May 22

yamadashy / repomix

Repomix is a tool that compresses an entire repository into a single file optimized for LLMs. Compatible with Claude, ChatGPT, DeepSeek, Perplexity, Gemini, and other AI models.

Code generation Tools Open source

GitHub Trending·May 22

abhigyanpatwari / GitNexus

GitNexus is a client-side code intelligence engine running entirely in the browser. It creates a knowledge graph from a GitHub repo or ZIP file, with a built-in Graph RAG Agent for code exploration.

RAG AI Agents Code generation

GitHub Trending·May 22

MemTensor / MemOS

MemOS is a self-evolving memory OS for LLMs and AI agents. Features ultra-persistent memory, hybrid retrieval, and cross-task skill reuse with 35.24% token savings.

AI Agents RAG Open source

GitHub Trending·May 22

anomalyco / opencode

OpenCode is an open-source coding agent available on GitHub. The project provides an automated solution for code generation and assistance.

Code generation AI Agents Open source

GitHub Trending·May 22

phodal / routa

Routa is a workspace-first multi-agent coordination platform for AI development. It features shared Specs, Kanban orchestration, and supports MCP/ACP/A2A across web and desktop.

Multi-agent MCP AI Agents

GitHub Trending·May 22

awslabs / aidlc-workflows

AWS Labs releases aidlc-workflows, a framework of adaptive steering rules for directing AI coding agents. The project provides AI-driven lifecycle workflows with rule-based steering to improve code agent quality and reliability.

AI Agents Code generation Open source

GitHub Trending·May 22

langchain-ai / langchain

LangChain trending on GitHub. Agent engineering platform enabling construction of LLM-powered applications with multi-component orchestration.

AI Agents Tools Open source

GitHub Trending·May 22

google-research / timesfm

TimesFM is a pretrained foundation model developed by Google Research for time-series forecasting. The GitHub repository provides an open-source implementation of this specialized model.

DeepMind Open source Benchmarks

GitHub Trending·May 22

Tracer-Cloud / opensre

Tracer-Cloud/opensre is an open-source toolkit for building AI SRE (Site Reliability Engineering) agents. Enables automation of infrastructure and reliability tasks through intelligent agents.

AI Agents Open source Tools

GitHub Trending·May 22

facebookresearch / sam3

Meta releases code and checkpoints for SAM 3 (Segment Anything Model 3). Repository includes inference, fine-tuning, and example notebooks for image segmentation.

Meta AI Vision Open source

GitHub Trending·May 22

microsoft / agent-governance-toolkit

Microsoft releases a governance toolkit for autonomous AI agents. Includes policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering. Covers all ten risks in the OWASP Agentic Top 10.

AI Agents AI safety Tools

GitHub Trending·May 22

plastic-labs / honcho

Honcho is a memory library for building stateful agents. It enables persistence and interaction history management in multi-agent systems.

AI Agents Open source Tools

Reddit r/LocalLLaMA·May 22

Quick note on sudden performance loss when running GGUFs

User reports sudden performance drop on GGUFs (Qwen3.5-35B and Unsloth model): from 20+ tg/s to 5 tg/s. Root cause: file corruption during manual MTP layer modifications. Solution: verify sha256sum integrity of downloaded models.
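The integrity check suggested above can also be scripted. This is a minimal sketch using Python's standard `hashlib`; the function names are illustrative, and the expected digest is assumed to come from the model's download page.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-gigabyte GGUFs never load fully into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare the local file's digest with the published SHA-256 value."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A mismatch means the local file differs from the published one, whether from a bad download or from later manual edits such as the layer modifications described in the post.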

Qwen Open source Tools

Hacker News (AI)·May 22

Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark

Antigravity 2.0 tops the OpenSCAD Architectural 3D LLM benchmark, which measures models' ability to generate 3D code for architectural design.

Benchmarks Code generation

Reddit r/MachineLearning·May 22

NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable) [P]

Numind releases NuExtract3, a 4B open-weight VLM based on Qwen3.5-4B under Apache-2.0 license. The model extracts structured data from complex documents (PDFs, forms, tables, invoices) to Markdown or JSON. Trained for 3 days on 8xH100, it supports multiple quantizations (GPTQ, W8A8, FP8, Q4, Q6) and runs on 4GB VRAM minimum.

Vision Open source Code generation

ActuIA·May 22

France, 2nd in declared AI maturity: 49% at pilot stage, 80% without measured ROI

France ranks 2nd in declared AI maturity in Europe, yet 49% of projects remain in pilot stage and 80% have no measured ROI. Gap reveals disconnect between stated adoption and actual impact.

Benchmarks

Le Big Data·May 22

Hark secures $700 million for its universal AI assistant project

Hark raises $700 million to develop a universal AI assistant, reaching a $6 billion valuation.

AI Agents

Reddit r/LocalLLaMA·May 22

ztok — a fast multithreaded tokenizer in Zig that loads tiktoken / HF / SentencePiece and is 2–5× faster

ztok is a multithreaded tokenizer library in Zig, 2–5× faster than tiktoken/HF/SentencePiece. Loads tiktoken, HF tokenizer.json, SentencePiece, TokenMonster, Mistral Tekken formats. Bit-identical to reference implementations, 8 language bindings, optimized for RAG and dataset tokenization.

Tools RAG Open source

Le Big Data·May 22

Spotify launches "Reserved": the app will (finally) snag you concert tickets

Spotify launches "Reserved", a concert ticket reservation feature integrated into the app. The system lets users get tickets directly from the music platform.

Business

Hacker News (AI)·May 22

Moss: Self-Evolution Through Source-Level Rewriting in Autonomous Agent Systems

Moss is an autonomous agent system capable of self-evolution through source-level code rewriting. The system modifies its own code to improve performance without external intervention.

AI Agents Code generation Reasoning

Reddit r/LocalLLaMA·May 22

New Release of ROCm based MLX LLM Engine - lemon-mlx-engine

Lemon-mlx-engine integrates ROCm 7.13 to run LLMs locally on AMD GPUs. Update includes bug fixes and kernel fixes for Qwen3, 3.5, and 3.6 MoE.

Open source Infrastructure Qwen

Le Big Data·May 22

Predictive AI: tracking the invisible in data streams to stay ahead of cybercriminals

Predictive AI analyzes data streams in real-time to detect behavioral anomalies and anticipate cyberattacks before they occur.

AI safety Business

Reddit r/MachineLearning·May 22

One thing that's been bothering me lately: benchmark performance often tells me almost nothing about whether a workflow will survive production usage. [D]

Discussion on the gap between benchmark performance and production robustness. High-scoring systems fail under user ambiguity, messy real-world context, and contradictory instructions. Call for evaluation methods beyond standard pipelines.

Evals Benchmarks

Latent Space·May 22

[AINews] New AI Infra unicorns: Exa, Modal, TurboPuffer

Three AI infrastructure startups reach unicorn status: Exa (vector search), Modal (cloud platform), and TurboPuffer (vector and full-text search database). Major funding rounds confirm consolidation in the AI infrastructure market.

Infrastructure Funding Vector search

Reddit r/LocalLLaMA·May 22

Low-level coding dataset

Community-sourced coding dataset project for LLM fine-tuning, focused on C++ and systems programming. Author plans to fine-tune Qwen 3.6-27b to improve understanding of memory ownership, thread safety, and optimization concepts. Dataset structured in JSONL categories: generation, optimization, debugging, organization, tool-calling.

Fine-tuning Qwen Code generation

Hacker News (AI)·May 22

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

CODA rewrites transformer blocks as GEMM-Epilogue programs to optimize inference. The technique fuses matrix operations and post-processing into a single GPU primitive, reducing latency and memory bandwidth.

Reasoning Infrastructure Benchmarks

Simon Willison·May 22

FTC to Require Cox Media Group, Two Other Firms to Pay Nearly $1 Million to Settle Charges They Deceived Customers About “Active Listening” AI-Powered Marketing Service

FTC requires Cox Media Group and two other firms to pay nearly $1 million to settle charges they deceived customers about an "Active Listening" AI marketing service. The service claimed to listen to conversations via smart devices for ad targeting, but actually used no voice data at all.

Regulation AI safety Business

Reddit r/MachineLearning·May 22

Live Human Detector on Outbound Phone Calls [R]

ML project to detect whether an outbound call has reached a live agent (vs queue/RVA). Audio classification in 1-2s window on G711a 8kHz stream. Challenges: distinguish professional RVA from human speech, transition silence, voicemail, sophisticated TTS.

Code generation Evals

Unified Data Selection for LLM Reasoning

Evaluation of Chunking Strategies for Effective Text Embedding in Low-Resource Language on Agricultural Documents

A Comparative Study of Language Models for Khmer Retrieval-Augmented Question Answering

Ishigaki-IDS-Bench: A Benchmark for Generating Information Delivery Specification from BIM Information Requirements

FlyRoute: Self-Evolving Agent Profiling via Data Flywheel for Adaptive Task Routing

SpecHop: Continuous Speculation for Accelerating Multi-Hop Retrieval Agents

Hypergraph as Language

Residual Skill Optimization for Text-to-SQL Ensembles

Broadening Access to Transportation Safety Data with Generative AI: A Schema-Grounded Framework for Spatial Natural Language Queries

Harder to Defend: Towards Chinese Toxicity Attacks via Implicit Enhancement and Obfuscation Rewriting

Pseudo-Siamese Network for Planning in Target-Oriented Proactive Dialogues

Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification

Neural Estimation of Pairwise Mutual Information in Masked Discrete Sequence Models

AiraXiv: An AI-Driven Open-Access Platform for Human and AI Scientists

PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

AutoRPA: Efficient GUI Automation through LLM-Driven Code Synthesis from Interactions

Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy

VBFDD-Agent for Electric Vehicle Battery Fault Detection and Diagnosis: Descriptive Text Modeling of Battery Digital Signals

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX

AgentAtlas: Beyond Outcome Leaderboards for LLM Agents

$ECUAS_n$: A family of metrics for principled evaluation of uncertainty-augmented systems

AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows

Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration

Quantitative coronary calcification analysis for prediction of myocardial ischemia using non-contrast CT calcium scoring

Leveraging Self-Paced Curriculum Learning for Enhanced Modality Balance in Multimodal Conversational Emotion Recognition

TBP-mHC: full expressivity for manifold-constrained hyper connections through transportation polytopes

Embedding-Based Federated Learning with Runtime Governance for Iron Deficiency Prediction

Calibration, Uncertainty Communication, and Deployment Readiness in CKD Risk Prediction: A Framework Evaluation Study

TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation

IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions

Multi-Stage Training for Abusive Comment Detection in Indic Languages

Representation Gap: Explaining the Unreasonable Effectiveness of Neural Networks from a Geometric Perspective

Audience Engagement with Arabic Women's Social Empowerment and Wellbeing: A Decadal Corpus

GHI: Graphormer over Conditioned Hypergraph Incidence for Aspect-Based Sentiment Analysis

Pattern-and-root inflectional morphology: the Arabic broken plural

Cross-Lingual Consensus: Aligning Multilingual Cultural Knowledge via Multilingual Self-Consistency

Psy-Chronicle: A Structured Pipeline for Synthesizing Long-Horizon Campus Psychological Counseling Dialogues