May 2026

3149 articles

A successful Japanese trial of a ramjet engine designed for Mach‑5 aircraft

Japan has successfully tested a ramjet engine designed for Mach-5 aircraft. The trial validates the propulsion technology and marks a key milestone toward next-generation hypersonic aircraft.

Infrastructure

Reddit r/MachineLearning·May 25

The famous METR AI time horizons graph contains numerous severe errors [D]

Nathan Witkin (NYU Stern) sharply criticizes METR's AI time horizons graph. The errors he lists include human baselines that were estimated rather than measured, hourly-paid benchmarkers with an incentive to work slowly, a sample biased toward the authors' peers, and no correction for the familiarity advantage (5-18x faster). Witkin concludes that the graph contains too many compounding errors to be salvaged.

Benchmarks Evals AI safety

Reddit r/MachineLearning·May 25

DCGAN inference on a microcontroller: 12.6M parameters, 512KB SRAM, 26-second generation, pure C [P]

A DCGAN with 12.6M parameters runs on a RISC-V CH32H417 microcontroller (512KB SRAM). It generates 64×64 cat faces in 26 seconds using a pure C inference engine with int8 per-channel quantization. Weights are streamed from an SD card via double buffering. The Z vector is seeded with 200 bytes of quantum random data (ANU QRNG). No existing frameworks (TFLite, CMSIS NN) are used; the engine was built from scratch.

Code generation Benchmarks Open source
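For readers unfamiliar with the int8 per-channel quantization mentioned in the DCGAN entry above, here is a minimal Python sketch. It is not the project's C engine: the function names, the symmetric [-127, 127] range, and the sample weights are all assumptions chosen for illustration. Each output channel (row) gets its own scale, so a row of small weights keeps its resolution instead of sharing a scale with a large-magnitude row.

```python
def quantize_per_channel(weights):
    """Symmetric int8 quantization with one scale per output channel (row).

    Hypothetical helper for illustration; not taken from the project.
    """
    quantized, scales = [], []
    for row in weights:
        peak = max((abs(w) for w in row), default=0.0)
        # An all-zero row gets a dummy scale so the division below is safe.
        scale = peak / 127.0 if peak > 0 else 1.0
        q_row = [max(-127, min(127, round(w / scale))) for w in row]
        quantized.append(q_row)
        scales.append(scale)
    return quantized, scales


def dequantize_per_channel(quantized, scales):
    """Map int8 codes back to approximate float weights."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]


# Made-up weights: the second row is roughly 25x smaller than the first.
weights = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
q, scales = quantize_per_channel(weights)
restored = dequantize_per_channel(q, scales)
```

With a single shared scale of 1/127, the second row would collapse to codes of magnitude 5 or less; with per-row scales, both rows use the full int8 range.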

Reddit r/MachineLearning·May 25

We gave an LLM a structural graph of a codebase before exploring. It used 54% MORE context than without one. Paper + explanation inside [R]

Controlled study on a TypeScript codebase (25 sections, 3,250 files): an LLM (Kimi K2.6) equipped with a structural graph (Blueprint: Universal Ctags + ast-grep + BM25) consumed 54% more input tokens (63,541 vs 41,327) but explored deeper (6 turns vs 5). The graph itself costs ~6,500 tokens and increases the model's navigational confidence.

Code generation RAG Benchmarks

Reddit r/LocalLLaMA·May 25

AI content detector based on Qwen 0.8b fine-tuned on Pangram dataset

Fine-tuned Qwen 3.5 0.8B on Pangram's EditLens dataset to detect AI-generated content. Chrome extension 'Slop Hammer' released for local inference (~1s on M1), 400MB model. 20h training on single RTX 3090. Limitation: dataset built with older LLMs, struggles with GPT-5.5.

Qwen Fine-tuning Evals

Hacker News (AI)·May 25

I just sequenced a human genome to 30× coverage at home

A user sequenced a complete human genome to 30× coverage at home using open-source tools and accessible hardware. Demonstrates democratization of genome sequencing outside professional laboratories.

Open source Tools

Reddit r/MachineLearning·May 25

Reconstructing the agent methodology: Decoupling decision-making and execution - open source [P]

Spice is an open-source project adding an explicit decision layer above AI agents. It documents observations, considered options, reasoning for choices, and rejected trade-offs before execution, making agent behavior less of a black box. Compatible with Claude Code, Codex, and other agents.

AI Agents MCP Open source

Reddit r/LocalLLaMA·May 25

CUDA: add fast walsh-hadamard transform by am17an · Pull Request #23615 · ggml-org/llama.cpp

CUDA implementation of Fast Walsh-Hadamard Transform (FWHT) for llama.cpp optimizing KV-cache quantization. 1-2% speedup on prefill, 7-9% on token generation with RTX 5090 and q8_0 quantization.

Open source Infrastructure Code generation
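As background for the llama.cpp entry above, the Fast Walsh-Hadamard Transform can be sketched in a few lines of Python. This is a plain reference version of the textbook butterfly algorithm, not the CUDA kernel from the pull request; the function name `fwht` and the unnormalized output convention are choices made here for illustration.

```python
def fwht(values):
    """Return the unnormalized Walsh-Hadamard transform of a sequence.

    The length must be a power of two. Runs in O(n log n) additions,
    versus O(n^2) for multiplying by the Hadamard matrix directly.
    """
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Combine pairs of elements that are h apart (one butterfly stage).
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = data[i], data[i + h]
                data[i], data[i + h] = a + b, a - b
        h *= 2
    return data


print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))  # → [4, 2, 0, -2, 0, 2, 0, 2]
```

Applying the transform twice returns the input scaled by its length, so the same routine (plus a division by n) serves as its own inverse.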

Reddit r/LocalLLaMA·May 25

Can you jailbreak Llama 3.1 8B? (Red-Teaming Challenge)

A researcher has launched a red-teaming challenge on Llama 3.1 8B to stress-test SAFi, a runtime governance engine designed to enforce alignment of autonomous agents. Participants get 10 prompts to break a Socratic Tutor Agent (make it give direct answers or go off-topic from science/math). Open-source code is available.

Llama AI Agents Alignment

Hacker News (AI)·May 25

Uber's COO says it's getting harder to justify the money spent on AI tokenmaxxing

Uber's COO states it's becoming harder to justify massive AI spending. The company questions the ROI of 'token maxxing'—accumulating compute capacity without clear use cases.

Business

Reddit r/LocalLLaMA·May 25

Llama.cpp : Split Mode Tensor Fix Incoming?

Llama.cpp reportedly preparing a fix for split mode tensor crashes on multi-GPU setups. Split tensor mode delivers ~35% throughput gain over layer mode but crashes every 90-120 minutes due to VRAM exhaustion.

Open source Infrastructure

Reddit r/MachineLearning·May 25

Delta Attention Residuals [R]

Delta Attention Residuals improves residual connections by routing over layer deltas (vᵢ = hᵢ₊₁ − hᵢ) instead of cumulative hidden states. Results: −8.2% PPL at 7.6B, 1.8× sharper cross-layer routing (max weight 0.2→0.6), <0.01% parameter overhead. Code and paper released.

Papers Benchmarks Open source

Reddit r/MachineLearning·May 25

I’m building an open-source decision layer above AI agents [P]

Spice is an open-source decision layer above AI agents. It makes the decision process explicit (observations, options, justifications, trade-offs) before execution, rather than treating agents as black boxes. Compatible with Claude Code, Codex, and other executors.

AI Agents Open source Tools

Reddit r/MachineLearning·May 25

Call for Papers - Workshop on Efficient Reasoning at COLM 2026 [R]

Call for papers for the 2nd Workshop on Efficient Reasoning at COLM 2026 (October 9). Deadline: July 12, 2026. Topics: multimodal reasoning under efficiency constraints, dataset curation, algorithmic innovations, fast inference (pruning, compression, KV-cache), benchmarks, on-device deployment, safety, real-time applications (healthcare, robotics).

Reasoning Benchmarks Robotics

Reddit r/LocalLLaMA·May 25

(Yet Another) KV cache calculator - kvanta.vcerny.cz

KVANTA is an open-source (Apache 2.0), web-based KV cache calculator for LLMs and VLMs hosted on Hugging Face. The tool aims to improve on existing calculators with a better interface.

Tools Open source Infrastructure
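The entry above does not describe how KVANTA computes its numbers. As a rough guide, the usual back-of-the-envelope estimate for a standard multi-head or grouped-query attention model can be sketched as below. The function name, parameter names, and the example configuration are assumptions for illustration; models with sliding-window, latent, or hybrid attention layers need a different formula.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_element=2, batch_size=1):
    """Estimate KV cache size for plain multi-head / grouped-query attention.

    The leading factor of 2 accounts for storing one key tensor and one
    value tensor per layer. bytes_per_element=2 corresponds to FP16/BF16.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_element * batch_size)


# Hypothetical configuration: 32 layers, 8 KV heads of dimension 128,
# an 8,192-token context, 16-bit cache entries.
example = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192)
print(example / 1024**3, "GiB")  # → 1.0 GiB
```

Halving `bytes_per_element` (for example with an 8-bit cache) or the number of KV heads halves the estimate, which is why cache quantization and grouped-query attention matter for long contexts.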

Reddit r/LocalLLaMA·May 25

Is Qwen3.6 current king for local agentic use?

An r/LocalLLaMA user reports that Qwen 3.6 35B A3B outperforms other local models (Gemma 4, GLM 4.7 Flash) for agentic tasks, with fewer infinite loops and broken tool calls. Tested on Hermes Agent and Pi using IQ4_NL quantization.

Qwen AI Agents Open source

Reddit r/LocalLLaMA·May 25

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

RTPurbo transforms full-attention LLMs into sparse models in hundreds of training steps. The method exploits three observations: only certain heads require full attention, long-range retrieval uses a 16D subspace, and token selection is query-dependent. Results: 9.36x prefill speedup at 1M context, 2.01x decode speedup, accuracy preserved.

Reasoning Benchmarks Infrastructure

Le Big Data·May 25

This senator votes for Meta's data centers… and hits the jackpot

A Republican senator from Louisiana voted in favor of a Meta data center project and allegedly profited financially from the decision. The article reveals a potential conflict of interest spanning two years.

Business Regulation

Reddit r/LocalLLaMA·May 25

Sharing my 'Local-LLM-Toolkit' repo

GitHub repo 'Local-LLM-Toolkit' shared documenting optimization techniques for local LLMs on Mac Studio M4 Max 128GB. Includes C and Swift code for performance improvements.

Open source Infrastructure Tools

Reddit r/LocalLLaMA·May 25

The Financial Times has published an article about Heretic

Financial Times reports Heretic, a GitHub tool, removes guardrails from Llama 3.3 in under 10 minutes. Creator Philipp Emanuel Weidmann confirms 3,500 'decensored' models created and 13 million downloads since launch.

Llama Open source AI safety

Vercel AI Blog·May 25

Building a real-time power outage map with Next.js on Vercel

Endeavour Energy, a major Australian electricity distributor, migrated its outage map to Next.js on Vercel. Results: sub-1s page loads during peak storm traffic, 5-minute data sync cycles, 38% faster deployments. Supabase handles the real-time data layer.

Infrastructure Tools Business

Reddit r/LocalLLaMA·May 25

The reason small-model agent stacks aren't the default has nothing to do with whether they work

Small specialized models (Gemma 4 31B at 86.4% on tau2-bench, Qwen 27B outperforming 397B models) now dominate agentic benchmarks. Yet the industry keeps deploying expensive frontier models: frontier labs profit from per-token billing, creating misalignment between technical performance and market adoption.

AI Agents Benchmarks Qwen

Reddit r/LocalLLaMA·May 25

NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable)

Numind releases NuExtract3, a 4B open-weight VLM based on Qwen3.5-4B (Apache-2.0 license). The model extracts structured data and converts documents/images to Markdown. Trained for 3 days on 8xH100, it handles PDFs, forms, tables with multiple quantizations (GPTQ, W8A8, FP8, Q4, Q6) for self-hosting from 4GB VRAM.

Qwen Vision Open source

Hacker News (AI)·May 25

Pope Leo: opaque AI run by few firms risks "New Forms of Dehumanization"

Pope Leo warns that opaque AI systems controlled by a few firms risk 'new forms of dehumanization'. He calls for greater transparency and responsible AI governance.

Regulation AI safety Alignment

Reddit r/LocalLLaMA·May 25

Old Mac Pro still proving its worth

A 2013 Mac Pro with D700 GPUs (12 GB VRAM) now runs LLMs via Vulkan after recent driver support. Qwen 3.5 9B achieves 11 t/s, Qwen 2.5 Coder 22 t/s at 70k context. User reports Qwen 3.5 outperforms Claude Sonnet 4.6 on C#/.NET planning tasks.

Qwen Claude Llama

Reddit r/LocalLLaMA·May 25

llama.cpp oom issue

User reports memory leak in llama.cpp with Qwen3.6-27B-MTP-GGUF after 20-40 minutes of active use. Process gradually consumes more system RAM despite various configuration attempts (--no-mmap, --cache-ram 0, without MTP). Issue persists across multiple builds and Docker images.

Llama Open source Infrastructure

Reddit r/LocalLLaMA·May 25

OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

OSCAR RotationZoo provides precomputed rotation matrices for INT2 KV-cache quantization. The method achieves ~7× KV-cache memory compression with single-digit accuracy drop on GPQA for dense reasoning models (Qwen3-4B, Qwen3-8B, GLM-4.7). Code and rotations available on HuggingFace.

Benchmarks Open source Qwen

Le Big Data·May 25

Video: they filmed an immune system devouring a cancer

Researchers filmed immune system cells destroying melanoma cancer cells in real time. Immune checkpoint inhibitors, used in medicine for 15 years, enable this therapeutic action now visualized directly.

Vision

Reddit r/MachineLearning·May 25

Call for Papers - Workshop on Unlearning and Model Editing U&ME at ECCV 2026 [R]

Call for papers for the U&ME (Unlearning & Model Editing) workshop at ECCV 2026. Organizers seek submissions on unlearning, model editing, model merging, compression, and lifelong learning. Work-in-progress and exploratory ideas welcome.

AI safety Alignment

GitHub Trending·May 25

hardikpandya/stop-slop

Stop-slop is a skill file designed to detect and remove common AI-generated text markers from prose, such as repetitive phrases and generic formulations.

Prompt engineering Tools

GitHub Trending·May 25

garrytan/gstack

Gstack: 23 opinionated Claude Code tools configured from Garry Tan's setup, covering CEO, designer, engineering manager, release manager, doc engineer, and QA roles.

Claude Code AI Agents Tools

GitHub Trending·May 25

affaan-m/ECC

Agent harness performance optimization system. Integrates skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, and Cursor.

AI Agents Claude Code Code generation

GitHub Trending·May 25

anthropics/claude-cookbooks

Anthropic releases claude-cookbooks, a collection of notebooks and recipes demonstrating practical and creative ways to use Claude.

Claude Prompt engineering

GitHub Trending·May 25

moeru-ai/airi

Airi is a self-hosted, open-source AI companion capable of real-time voice chat, Minecraft and Factorio gameplay. Supports Web, macOS, and Windows. Inspired by Neuro-sama.

AI Agents Voice Open source

GitHub Trending·May 25

AlexsJones/llmfit

llmfit: CLI tool to test hundreds of LLM models and providers on your hardware. One command to identify what runs locally.

Tools Open source Infrastructure

GitHub Trending·May 25

Zackriya-Solutions/meetily

Meetily is an open-source, self-hosted meeting assistant built on Rust. 4x faster transcription than Whisper/Parakeet, speaker diarization, Ollama-based summarization. 100% local processing, no cloud required.

Open source Voice Tools

GitHub Trending·May 25

nearai/ironclaw

IronClaw is an Agent OS emphasizing privacy, security, and extensibility. Open-source project hosted on GitHub.

AI Agents Open source AI safety

GitHub Trending·May 25

NateBJones-Projects/OB1

OB1 (Open Brain) offers a unified infrastructure layer: one database, one AI gateway, one chat channel. Compatible with any AI model, no middleware or SaaS required.

Infrastructure AI Agents Open source

GitHub Trending·May 25

CodebuffAI/codebuff

CodebuffAI: command-line tool for code generation. Generates code directly from the terminal.

Code generation Tools

GitHub Trending·May 25

garrytan/gstack

Gstack: Garry Tan's Claude Code setup with 23 opinionated tools automating CEO, designer, engineering manager, release manager, doc engineer, and QA roles.

Claude Code AI Agents Code generation

GitHub Trending·May 25

moeru-ai/airi

Airi is a self-hosted AI companion supporting real-time voice chat, Minecraft and Factorio gameplay. Web, macOS and Windows support. Open-source project inspired by Grok and Neuro-sama.

Open source Voice AI Agents

GitHub Trending·May 25

OpenBB-finance/OpenBB

OpenBB is an open-source financial data platform for analysts, quants and AI agents. It provides unified access to market data through a GitHub-hosted repository.

Open source AI Agents Tools

GitHub Trending·May 25

sansan0/TrendRadar

TrendRadar is an AI-driven trend monitor aggregating multi-platform news via RSS with smart alerts. Filters by keywords, translates and analyzes articles via AI, supports MCP for natural language dialogue, Docker deployment with local/cloud data, integrations with WeChat/Feishu/DingTalk/Telegram/Slack.

AI Agents MCP RAG

The Decoder·May 25

Google Deepmind's AlphaProof Nexus solves decades-old math problems for a few hundred dollars

Google DeepMind's AlphaProof Nexus autonomously solved nine open Erdős problems, including two unsolved for 56 years, for a few hundred dollars per problem. The system uses the Lean compiler to automatically verify each proof step, with a 2.5% success rate.

DeepMind Reasoning Benchmarks

Reddit r/LocalLLaMA·May 25

numind/NuExtract3 · Hugging Face

NuExtract3 is a 4B vision-language model for document understanding. It combines structured extraction (text/images + JSON template → JSON output) and image-to-Markdown conversion, with multilingual support and reasoning/non-reasoning modes. Available in GGUF, NVFP4, MLX, VLLM.

Vision RAG Code generation

The Decoder·May 25

George Hotz says coding agents will be "one of the most costly mistakes" in software development

George Hotz warns that AI coding agents will be "one of the most costly mistakes" in software development. After six months of testing, he concludes LLMs deliver fast prototypes but generate bugs that become increasingly hard to spot. His stance reflects deep divisions in the AI community over LLMs' role.

Code generation AI Agents AI safety

Reddit r/LocalLLaMA·May 25

I built a computer use sandbox framework for codex on headless linux. GPU passthrough, computer use, and sudo access for codex all work. It's the perfect dev sandbox to allow full auto work while minimizing the "rm -rf /" risk

Developer builds sandbox framework for AI agents on headless Linux with GPU passthrough, sudo access, and host OS isolation. VM-based architecture enables autonomous web browsing, Docker execution, and parallel sessions. Code released on GitHub.

AI Agents Code generation Infrastructure

Reddit r/LocalLLaMA·May 25

MiMo-V2.5-coder

Release of MiMo-V2.5-coder, a coding-optimized model for 128 GB RAM systems. Positioned as alternative to Qwen 3.6 and DeepSeek-4, featuring reliable tool calling and fast performance.

Code generation Open source Tools

Reddit r/LocalLLaMA·May 25

We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro

Mininglamp AI added W8A8 activation quantization to MLX via Cider, a custom SDK with Metal kernels. On M5 Pro, prefill improved from 2.84s to 2.52s for a 4B VLM. Works with any MLX model, but INT8 TensorOps requires M5+.

Open source Infrastructure Tools

The Decoder·May 25

AI models often give the right answers but point to the wrong sources

Leading AI models like GPT and Gemini routinely cite text passages that don't support their answers, even when the answers are correct. Researchers at Peking University term this "attribution hallucination" and introduce the CiteVQA benchmark to systematically test for it.

GPT Gemini Benchmarks

Reddit r/LocalLLaMA·May 25

I made a local-first MCP tutorial repo with node-llama-cpp and a custom agent loop

A learning repo, "MCP from Scratch", teaches the Model Context Protocol in plain Node.js, from raw JSON-RPC to a working local agent loop (plan → act → observe) using node-llama-cpp and GGUF models. It is designed to expose the underlying mechanics without heavy abstractions.

MCP AI Agents Open source
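To make the "raw JSON-RPC" layer in the entry above concrete, here is a small sketch of what a tool-call request looks like on the wire. It is written in Python rather than the repo's Node.js and is not taken from the repo. The `tools/call` method and the `name`/`arguments` fields follow the published MCP specification; the helper names, the `read_file` tool, and its argument are hypothetical.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",          # protocol version tag required by JSON-RPC 2.0
        "id": request_id,          # lets the client match the response to this request
        "method": "tools/call",    # MCP method for invoking a tool
        "params": {"name": tool_name, "arguments": arguments},
    }


def encode(message):
    """Serialize a message to compact single-line JSON."""
    return json.dumps(message, separators=(",", ":"))


# Hypothetical tool name and arguments, for illustration only.
request = make_tool_call(1, "read_file", {"path": "README.md"})
wire = encode(request)
print(wire)
```

In an agent loop of the plan → act → observe kind, the "act" step would send a message like this and the "observe" step would read the response carrying the same `id`.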

Reddit r/LocalLLaMA·May 25

Qwen 3.6 benchmarks on 2x RTX PRO 6000

Qwen 3.6 benchmarks on 2x RTX PRO 6000 with vLLM. Qwen 3.6 27B BF16 reaches 1800 tps (64 concurrency, MTP-2). Qwen 3.6 35B BF16 reaches 3500 tps generation (128 concurrency, MTP-Off) with 30k tps prompt processing.

Qwen Benchmarks Infrastructure

Reddit r/LocalLLaMA·May 25

server: fix checkpoints creation by jacekpoplawski · Pull Request #22929 · ggml-org/llama.cpp

llama.cpp PR #22929 optimizes checkpoint creation to avoid full context re-processing when conversation history is edited. Use case: agentic coding with 70k tokens. Improves responsiveness by reprocessing only changed portions, tested for 2 weeks.

Llama AI Agents Code generation

Reddit r/LocalLLaMA·May 25

1000 tps generation on Qwen3.6 27B with V100s

User reports 1000 tokens/s generation on Qwen 3.6 27B with V100s at batch 128, and 80 t/s single-user (batch 1) without MTP. Processing throughput reaches 3000 t/s.

Qwen Benchmarks Infrastructure

Reddit r/LocalLLaMA·May 25

Wrote a custom C++ engine for MiniCPM-V 4.6 on Orange Pi AIPro (Ascend 310B) to bypass framework overhead

Developer builds custom C++ inference engine for MiniCPM-V 4.6 on Orange Pi AIPro (Ascend 310B NPU, $149). Bypasses heavy frameworks with optimized AscendC kernels, achieving 5.90 tokens/s vs 2.88 baseline (170ms per step). Open-source on GitHub.

Open source Code generation Infrastructure

arXiv cs.CL·May 25

A Survey of Text and Speech Resources for Hausa and Fongbe: Availability, Quality, and Gaps for NLP Development

Inventory of text and speech resources for Hausa (80-100M speakers) and Fongbe (2M speakers). Hausa has diverse parallel corpora and text collections (news, encyclopedic, educational). Fongbe lacks text data but benefits from recent speech collection initiatives. Both languages represented in Masakhane benchmarks (NER, POS tagging).

Benchmarks Papers

arXiv cs.AI·May 25

BOHM: Zero-Cost Hierarchical Attribution for Compound AI Systems

BOHM is a hierarchical attribution method for compound AI systems that extracts component contributions directly from routing weights without evaluating arbitrary subsets. It has zero marginal cost and is compatible with opaque third-party APIs. On 18 LLMs (880 LiveCodeBench problems), BOHM reaches Kendall tau=0.928, versus tau=0.980 for SHAP, which needs 9,000x more evaluations.

AI Agents Evals Reasoning
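The Kendall tau figures in the entry above measure how closely one ranking agrees with another. A minimal sketch of the statistic (the tau-a variant, which ignores tie corrections) is shown below; the function name and the sample rankings are made up for illustration and are not the paper's data or code.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total number of pairs.

    Returns 1.0 for identical orderings and -1.0 for fully reversed ones.
    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


# Made-up attribution scores for four components from two methods.
reference = [1, 2, 3, 4]
estimate = [1, 3, 2, 4]   # one adjacent pair swapped
print(kendall_tau(reference, estimate))
```

Here one of the six pairs is ordered differently, giving (5 - 1) / 6, or about 0.67; values near 1, such as those quoted in the entry, mean the two rankings disagree on very few pairs.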

arXiv cs.CL·May 25

How Far Will They Go? Red-Teaming Online Influence with Large Language Models

Red-teaming study of 30+ open-source LLMs (10 families, 5 countries) measuring capacity to generate biased political content via jailbreaks. Findings: systematic asymmetries (left-leaning bias), Overton Window contraction with model size, substantial regional differences, variable jailbreak potency across model families.

AI safety Alignment Open source

May 2026

A successful Japanese trial of a ramjet engine designed for Mach‑5 aircraft

The famous METR AI time horizons graph contains numerous severe errors [D]

DCGAN inference on a microcontroller: 12.6M parameters, 512KB SRAM, 26-second generation, pure C [P]

We gave an LLM a structural graph of a codebase before exploring. It used 54% MORE context than without one. Paper + explanation inside [R]

AI content detector based on Qwen 0.8b fine-tuned on Pangram dataset

I just sequenced a human genome to 30× coverage at home

Reconstructing the agent methodology: Decoupling decision-making and execution - open source [P]

CUDA: add fast walsh-hadamard transform by am17an · Pull Request #23615 · ggml-org/llama.cpp

Can you jailbreak Llama 3.1 8B? (Red-Teaming Challenge)

Ubers COO says its getting harder to justify the money spent on AI tokenmaxxing

Llama.cpp : Split Mode Tensor Fix Incoming?

𝐃𝐞𝐥𝐭𝐚 𝐀𝐭𝐭𝐞𝐧𝐭𝐢𝐨𝐧 𝐑𝐞𝐬𝐢𝐝𝐮𝐚𝐥𝐬 [R]

I’m building an open-source decision layer above AI agents [P]

Call for Papers - Workshop on Efficient Reasoning at COLM 2026 [R]

(Yet Another) KV cache calculator - kvanta.vcerny.cz

Is Qwen3.6 current king for local agentic use?

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

Ce sénateur vote pour les data centers de Meta… et empoche le jackpot

Sharing my 'Local-LLM-Toolkit' repo

The Financial Times has published an article about Heretic

Building a real-time power outage map with Next.js on Vercel

The reason small-model agent stacks aren't the default has nothing to do with whether they work

NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable)

Pope Leo: opaque AI run by few firms risks "New Forms of Dehumanization"

Old Mac Pro still proving its worth

llama.cpp oom issue

OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

Video: they filmed an immune system devouring a cancer

Call for Papers - Workshop on Unlearning and Model Editing U&ME at ECCV 2026 [R]

Google DeepMind's AlphaProof Nexus solves decades-old math problems for a few hundred dollars

numind/NuExtract3 · Hugging Face

George Hotz says coding agents will be "one of the most costly mistakes" in software development

I built a computer use sandbox framework for codex on headless linux. GPU passthrough, computer use, and sudo access for codex all work. It's the perfect dev sandbox to allow full auto work while minimizing the "rm -rf /" risk

MiMo-V2.5-coder

We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro

AI models often give the right answers but point to the wrong sources

I made a local-first MCP tutorial repo with node-llama-cpp and a custom agent loop

Qwen 3.6 benchmarks on 2x RTX PRO 6000

server: fix checkpoints creation by jacekpoplawski · Pull Request #22929 · ggml-org/llama.cpp

1000 tps generation on Qwen3.6 27B with V100s

Wrote a custom C++ engine for MiniCPM-V 4.6 on Orange Pi AIPro (Ascend 310B) to bypass framework overhead

A Survey of Text and Speech Resources for Hausa and Fongbe: Availability, Quality, and Gaps for NLP Development

BOHM: Zero-Cost Hierarchical Attribution for Compound AI Systems

How Far Will They Go? Red-Teaming Online Influence with Large Language Models

When Determinants Are Not Enough: Private Rare Switching

Learnability-Informed Fine-Tuning of Diffusion Language Models

Computable Fairness: Boltzmann-Softmax Control for AI Resource Allocation

LFRAG: Layout-oriented Fine-grained Retrieval-Augmented Generation on Multimodal Document Understanding

KPI2KVI: A Multi Agent Workflow for Calculating Key Value Indicators from Service Descriptions

The Cognitive Kardashev Scale: Quantifying the Material Envelope of Civilisational Computation

Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents

EDGE-OPD: Internalizing Privileged Context with Evidence Guided On-Policy Distillation

Ontological Knowledge Blocks: Executable Compliance and Profile-Based Validation for Trustworthy AI Systems

When Planning Fails Despite Correct Execution: On Epistemic Calibration for LLM-Based Multi-Agent Systems

SPACENUM: Revisiting Spatial Numerical Understanding in VLMs

GENSTRAT: Toward a Science of Strategic Reasoning in Large Language Models

ImProver 2: Iteratively Self-Improving LMs for Neurosymbolic Proof Optimization

When AI Takes Sides on Questions of Faith: Persistent Asymmetries in AI-Mediated Faith Guidance

GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs

The Implicit Bias of Depth: From Neural Collapse to Softmax Codes

Anytime Training with Schedule-Free Spectral Optimization

Robust OT-Guided Generative Residual Domain Adaptation for Bike-Sharing Demand Prediction under Temporal Domain Shift

DFKI-MLT at SemEval-2026 TASK 7: Steering Multilingual Models Towards Cultural Knowledge

Uncovering the Latent Potential of Deep Intermediate Representations

World Machine: Towards Generative World Modeling for Time-Series

Smoothed Elicitation Complexity for Approximate $\Gamma$-calibration of Discrete Classification Tasks

HawkesLLM: Semantic Uncertainty Propagation in Agentic Text Simulation

Worse than Random: The Importance of a Baseline for Unsupervised Feature Selection

Steered Generation via Gradient-Based Optimization on Sparse Query Features

Learned Relay Representations for Forward-Thinking Discrete Diffusion Models

A mathematical theory of balancing relational generalization and memorization

Building a privacy-preserving Federated Recommender system for mobile devices

Tensor Cache: Eviction-conditioned Associative Memory for Transformers

The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

Reading Calibrated Uncertainty from Language Model Trajectories

DreamerNLplus: Interpretable Modeling of Mental Health Dynamics from Social Media Timelines using Hybrid Rule-Based and RAG Methods

When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

A Reproducible Universal Dependencies-Style Pipeline for Katharevousa Greek Parliamentary Text

Cultural Adaptation in Large Language Models for Political Discourse