May 2026

3149 articles

Emergence of Frontier Superposition: Möbius attractor and Cascade Supervision

Theoretical paper on emergence of superposition in Transformers for depth reasoning. Identifies a Möbius attractor under S_n symmetry and Cascade Supervision enabling gradient descent to converge to equal-weight superposition state on Erdős-Rényi graphs. Analytical predictions validated experimentally (final cosine 0.37 vs 0.69).

Reasoning Papers Reinforcement learning

arXiv cs.LG·May 20

Quantum Adversarial Machine Learning: From Classical Adaptations to Quantum-Native Methods

Survey of vulnerabilities in quantum machine learning models to adversarial attacks. Covers existing attacks, quantum-enhanced countermeasures, theoretical foundations, and critical challenges in the emerging field of quantum adversarial machine learning.

Papers AI safety Alignment

arXiv cs.LG·May 20

Multi-Pedestrian Safety Warning at Urban Intersections Use Case of Digital Twin

Multi-pedestrian safety warning system for urban intersections, built on a tightly coupled digital twin framework that combines camera and ultra-wideband sensing with trajectory prediction. Deployed on the COSMOS testbed in New York City, it delivers real-time alerts via edge-cloud computing and significantly reduces response times for vulnerable road users.

Vision Infrastructure AI safety

arXiv cs.LG·May 20

Not All Tokens Are Worth Caching: Learning Semantic-Aware Eviction for LLM Prefix Caches

SAECache introduces a semantic-aware eviction policy for LLM prefix caches. Not all tokens are equally worth caching: different token types (system prompts, user queries, tool outputs, reasoning) show up to 756x variation in reuse rates. SAECache uses a multi-queue architecture with online learning to adapt priorities, achieving 1.4x-2.7x TTFT improvement over production baselines.

Reasoning Infrastructure Benchmarks
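
To make the multi-queue idea concrete, here is a minimal Python sketch of type-aware eviction. It assumes one LRU queue per token type and a per-type priority learned online as an exponential moving average of cache hits; the class name `TypeAwarePrefixCache`, the type labels, and the update rule are hypothetical and are not SAECache's published algorithm.

```python
from collections import OrderedDict


class TypeAwarePrefixCache:
    """Hypothetical multi-queue prefix cache: one LRU queue per token type.

    Each type carries a priority (an exponential moving average of how often
    lookups for that type hit). On overflow, the least recently used entry of
    the lowest-priority non-empty type is evicted first.
    """

    def __init__(self, capacity, types, alpha=0.1):
        self.capacity = capacity
        self.alpha = alpha  # EMA smoothing factor for the online priority update
        self.queues = {t: OrderedDict() for t in types}
        self.priority = {t: 0.5 for t in types}  # neutral starting priority

    def __len__(self):
        return sum(len(q) for q in self.queues.values())

    def _observe(self, token_type, hit):
        # Online update: pull the type's priority toward 1 on a hit, toward 0 on a miss.
        p = self.priority[token_type]
        self.priority[token_type] = (1 - self.alpha) * p + self.alpha * (1.0 if hit else 0.0)

    def lookup(self, token_type, key):
        queue = self.queues[token_type]
        hit = key in queue
        self._observe(token_type, hit)
        if hit:
            queue.move_to_end(key)  # refresh recency within its own queue
            return queue[key]
        return None

    def insert(self, token_type, key, value):
        queue = self.queues[token_type]
        if key in queue:
            queue[key] = value
            queue.move_to_end(key)
            return
        if len(self) >= self.capacity:
            self._evict()
        queue[key] = value

    def _evict(self):
        # Pick the non-empty queue whose type has the lowest learned priority.
        victim_type = min(
            (t for t, q in self.queues.items() if q),
            key=lambda t: self.priority[t],
        )
        self.queues[victim_type].popitem(last=False)  # drop that queue's LRU entry
```

In this sketch, recency only decides the victim within a queue, while the learned priority decides which queue gives up an entry, so a rarely reused type is drained before a frequently reused one even when its entries are newer.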

arXiv cs.LG·May 20

In-Context Learning Operates as Concept Subspace Learning

Mechanistic study of in-context learning (ICL) showing that structured demonstrations induce concept inference in low-dimensional subspaces. On Llama-3-8B, a 68–73-dimensional subspace of the model's 4096 hidden dimensions restores 78.8% of the clean–corrupted accuracy gap, while the complementary subspace has zero effect. Results are confirmed on Qwen2.5-7B and on cross-lingual rule tasks.

Reasoning Llama Qwen
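
A common way to test whether a k-dimensional subspace carries the task signal is to patch only that component of a corrupted run's activation with the clean run's. A minimal pure-Python sketch, assuming an orthonormal basis U and the patch h = h_corrupt + U U^T (h_clean - h_corrupt); the function names are hypothetical and the paper's exact intervention may differ.

```python
def project(vec, basis):
    """Project vec onto span(basis); the basis vectors are assumed orthonormal."""
    out = [0.0] * len(vec)
    for u in basis:
        coeff = sum(ui * vi for ui, vi in zip(u, vec))  # <u, vec>
        out = [o + coeff * ui for o, ui in zip(out, u)]
    return out


def subspace_patch(h_corrupt, h_clean, basis):
    """Swap in h_clean's component inside the subspace; keep h_corrupt elsewhere."""
    diff = [c - r for c, r in zip(h_clean, h_corrupt)]
    delta = project(diff, basis)  # U U^T (h_clean - h_corrupt)
    return [r + d for r, d in zip(h_corrupt, delta)]
```

Patching with the subspace basis and, separately, with its orthogonal complement gives the two interventions the summary contrasts; together they reconstruct the clean activation exactly.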

arXiv cs.LG·May 20

StampFormer: A Physics-Guided Material-Geometry-Coupled Multimodal Model for Rapid Prediction of Physical Fields in Sheet Metal Stamping

StampFormer is a multimodal deep learning model predicting physical fields in sheet metal stamping by fusing geometry and material properties. Tested on steel/aluminium panels, it achieves <8.5% relative error in <1 second, replacing costly FEA analyses.

Papers Vision Reasoning

arXiv cs.LG·May 20

An Integrated Forecasting Prototype for Emergency Department Boarding Time to Support Proactive Operational Decision Making

Forecasting prototype for emergency department boarding time using time series models (DLinear, NLinear) on real hospital data. Integrates weather, holidays, and local events. Prediction horizons: 6, 8, 10, 12, and 24 hours. MLOps web application developed for operational deployment.

Benchmarks Infrastructure Tools

arXiv cs.LG·May 20

VCR: Learning Valid Contextual Representation for Incomplete Wearable Signals

VCR is a self-supervised framework learning robust representations from incomplete wearable sensor signals. It uses an orthogonal tokenizer to disentangle shared semantics from modality-specific residuals, combined with a missing-aware mixture-of-experts backbone. VCR improves performance on health monitoring tasks under single and multiple missing modalities.

Papers Embeddings Reinforcement learning

arXiv cs.LG·May 20

Lying Is Just a Phase: The Hidden Alignment Transition in Language Model Scaling

Study of 63 base models reveals hidden phase transition: below ~3.5B parameters, reasoning and truthfulness anticorrelate; above, they cooperate. Architecture, data curation, and training recipe independently shift this critical threshold. Width normalization eliminates anticorrelation; frontier models reach r=+0.72. Open-source steering tool and diagnostic dashboard released.

Benchmarks Alignment Reasoning

arXiv cs.LG·May 20

The Growing Pains of Frontier Models: When Leaderboards Stop Separating and What to Measure Next

Analysis of 34 frontier models (2024-2026) showing reasoning and coding capabilities cooperate (r=+0.72) but vary by lab. DeepSeek shifted from reasoning-rich to coding-first (+11.2→-4.7); Google maintains balance; Anthropic oscillates. SWE-bench saturating while HLE and instruction-following remain discriminative. Seven falsifiable predictions for next 12 months with interactive dashboard.

Benchmarks Evals Reasoning

arXiv cs.AI·May 20

Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

Learn-by-Wire Guard (LBW-Guard) is an autonomous governance layer that supervises the AdamW optimizer during language-model training. Tested on Qwen2.5-7B with WikiText-103, LBW-Guard reduces final perplexity from 13.21 to 10.74 (−18.7%) and accelerates training by 1.10×. Under extreme learning-rate stress (LR=3e-3), AdamW fails (perplexity 1885.24) while LBW-Guard remains stable (11.57).

Qwen Reinforcement learning Benchmarks

arXiv cs.AI·May 20

AgentNLQ: A General-Purpose Agent for Natural Language to SQL

AgentNLQ, a multi-agent method, achieves 78.1% semantic accuracy on the BIRD benchmark for natural language to SQL conversion. The system uses an optimized orchestrator to plan, reflect, and self-correct queries, enriches schema with context-aware metadata, and incorporates user-provided business rules.

AI Agents Multi-agent Benchmarks

arXiv cs.AI·May 20

Interference-Aware Multi-Task Unlearning

New paper on multi-task machine unlearning: removing training data from shared models without degrading other tasks. Proposes interference-aware framework combining task-aware gradient projection and instance-level orthogonalization. Reduces interference by 30–53% on vision benchmarks.

Papers AI safety Vision

arXiv cs.AI·May 20

POLAR-Bench: A Diagnostic Benchmark for Privacy-Utility Trade-offs in LLM Agents

POLAR-Bench is a diagnostic benchmark assessing privacy-utility trade-offs in LLM agents. A trusted model with privacy policy interacts with an adversarial third-party model across 10 domains and 7,852 samples. Frontier models withhold 99% of protected attributes, but open-weight models in the 1–30B range commonly used for on-device private inference leak up to 50% of sensitive data.

AI Agents AI safety Alignment

arXiv cs.LG·May 20

From Cumulative Constraints to Adaptive Runtime Safety Control for Nonstationary Reinforcement Learning

CPSS (Constraint Projection Safety Shield) converts cumulative safety budgets into adaptive state-level control constraints for nonstationary reinforcement learning. The mechanism dynamically adjusts risk thresholds based on context, guarantees per-state threshold satisfaction, and reduces safety violations in highway merging scenarios.

Reinforcement learning AI safety Reasoning

arXiv cs.AI·May 20

Progressive Autonomy as Preference Learning: A Formalization of Trust Calibration for Agentic Tool Use

Formalizes trust calibration for autonomous agents as preference learning. A policy gateway maintains a Gaussian-process posterior over human risk tolerance from binary approve/deny feedback, escalating uncertain decisions to humans. Structured as Preferential Bayesian Optimization with uncertainty-targeted querying.

AI Agents Reasoning AI safety

arXiv cs.AI·May 20

Discoverable Agent Knowledge -- A Formal Framework for Agentic KG Affordances (Extended Version)

Formal framework describing knowledge graph capabilities for agents. Extends VoID/DCAT standards with Agentic Affordance Profile (AAP) to specify what an agent can prove, closure assumptions, and vocabulary grounding. Identifies divergence between schema and entailment regime as epistemic failure mode.

AI Agents RAG Papers

arXiv cs.AI·May 20

Not all uncertainty is alike: volatility, stochasticity, and exploration

Theoretical work on adaptive exploration under uncertainty. Distinguishes volatility (reward drift) from stochasticity (observation noise): volatility enhances optimal exploration, stochasticity suppresses it. Introduces CAUSE, a closed-form exploration bonus via control-as-inference, validated on Gaussian state-space bandits with latent dynamics.

Reinforcement learning Reasoning Papers
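
The volatility/stochasticity distinction has a classic illustration in the Kalman filter for a drifting Gaussian reward: drift variance raises the steady-state learning rate (the Kalman gain), while observation-noise variance lowers it. This sketch shows only that learning-rate effect; it is not the CAUSE exploration bonus, and the parameter names are illustrative.

```python
def steady_state_gain(volatility, noise, iters=1000):
    """Steady-state Kalman gain (learning rate) for a drifting Gaussian reward.

    volatility: variance of the latent reward's random-walk drift per step.
    noise: variance of each observation around the latent reward.
    """
    posterior_var = 1.0  # arbitrary start; the fixed point does not depend on it
    gain = 0.0
    for _ in range(iters):
        prior_var = posterior_var + volatility  # drift inflates uncertainty between observations
        gain = prior_var / (prior_var + noise)  # weight placed on the newest observation
        posterior_var = (1.0 - gain) * prior_var  # uncertainty left after the update
    return gain
```

With volatility and noise both equal to 1, the gain settles at (sqrt(5) - 1) / 2, about 0.618; increasing the volatility pushes it up, increasing the noise pulls it down.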

arXiv cs.AI·May 20

Agentic Trading: When LLM Agents Meet Financial Markets

Systematic review of 77 studies on LLM agents in financial trading. Only 19 studies meet minimum criteria (action output + closed-loop evaluation). Key finding: lack of comparable protocols, insufficient reproducibility (no R3 studies), and missing documentation on transaction costs and universe handling.

AI Agents Papers Evals

arXiv cs.AI·May 20

What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents

SERL, a selective environment-reweighted learning framework, improves multi-turn LLM agent training by leveraging granular environmental feedback (error messages, page changes, reference trajectories). On ALFWorld and WebShop, SERL achieves 90.0% and 80.1% success rates, outperforming existing RL and distillation baselines.

AI Agents Reinforcement learning Reasoning

arXiv cs.AI·May 20

Beyond Mode Collapse: Distribution Matching for Diverse Reasoning

DMPO (Distribution-Matching Policy Optimization) targets mode collapse in on-policy RL methods such as GRPO by approximating the forward KL divergence instead of the reverse KL. On the text and vision versions of NP-Bench, DMPO achieves a 43.9% and 43.1% Quality Ratio (vs 40.1% and 38.4% for GRPO), with +2.0% gains on mathematical reasoning.

Reinforcement learning Reasoning Benchmarks
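
The forward/reverse KL distinction shows up on a toy example. Against a target with two equally likely modes, reverse KL(policy || target) rates a policy collapsed onto one mode as closer than a uniform one, while forward KL(target || policy) ranks them the other way. The distributions are made up for illustration; DMPO's actual estimator is not reproduced.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Target: two equally good modes (e.g. two distinct valid answers) plus a small tail.
target = [0.49, 0.01, 0.49, 0.01]
# A policy that has collapsed onto the first mode.
collapsed = [0.97, 0.01, 0.01, 0.01]
# A policy that spreads mass uniformly, including over low-probability outcomes.
uniform = [0.25, 0.25, 0.25, 0.25]

for name, policy in (("collapsed", collapsed), ("uniform", uniform)):
    reverse = kl(policy, target)  # mode-seeking: penalizes mass where the target is small
    forward = kl(target, policy)  # mass-covering: penalizes missing any target mode
    print(f"{name:9s} reverse KL = {reverse:.3f}  forward KL = {forward:.3f}")
```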

arXiv cs.AI·May 20

Efficient Elicitation of Collective Disagreements

Theoretical paper on efficient elicitation of collective disagreement among voters. Introduces the plurality matrix, a generalization of pairwise comparisons, to identify minimal aggregated preference information needed for disagreement measures. Shows that certain measures (rank-variance, divisiveness) require subset size 3, not just pairwise comparisons.

Papers Evals

arXiv cs.AI·May 20

Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries

Self-evolving skill libraries suffer silent degradation termed 'library drift': unbounded accumulation without lifecycle management. Study isolates mechanism via ablations, provides trace-level diagnostics, and validates fix (outcome-driven retirement + bounded active-cap + meta-skill prior) lifting pass@1 from 0.258 baseline to 0.584 on MBPP+ hard-100.

AI Agents Code generation Benchmarks
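
Two parts of the reported fix, outcome-driven retirement and a bounded active cap, can be sketched as a library that records pass/fail per skill and retires the worst performer whenever the cap is exceeded. The class names, the smoothing prior, and the retirement rule are hypothetical; the meta-skill prior is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    successes: int = 0
    uses: int = 0

    def success_rate(self, prior_successes=1, prior_uses=2):
        # Smoothed rate so an untried skill starts at 0.5 rather than 0 or 1
        # (the prior values are a hypothetical choice).
        return (self.successes + prior_successes) / (self.uses + prior_uses)


class BoundedSkillLibrary:
    """Hypothetical skill library with a hard cap on the number of active skills."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = {}
        self.retired = []

    def add(self, name):
        self.active[name] = Skill(name)
        if len(self.active) > self.max_active:
            self._retire_worst()  # bounded cap: never let the library grow unchecked

    def record(self, name, passed):
        # Outcome tracking: every use of a skill updates its pass/fail record.
        skill = self.active[name]
        skill.uses += 1
        skill.successes += int(passed)

    def _retire_worst(self):
        worst = min(self.active.values(), key=lambda s: s.success_rate())
        self.retired.append(self.active.pop(worst.name))
```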

arXiv cs.AI·May 20

Formal Skill: Programmable Runtime Skills for Efficient and Accurate LLM Agents

Formal Skill is a runtime abstraction for LLM agents that structures reusable capabilities via JSON metadata, action schemas, Python executors, and hook-governed control logic. Implemented in FairyClaw (open-source event-driven runtime), it replaces natural-language procedures with executable state machines, reducing token usage while improving reliability on Harness-Bench.

AI Agents MCP Code generation

arXiv cs.LG·May 20

Safe Continual Reinforcement Learning under Nonstationarity via Adaptive Safety Constraints

LILAC+ proposes a framework for safe continual reinforcement learning in nonstationary environments. The system combines three adaptive mechanisms: context-based safety constraints, adaptation-speed constraints, and budget-to-state enforcement. Evaluated in simulated driving, it reduces safety violations under distribution shift while maintaining competitive task performance.

Reinforcement learning AI safety Alignment

arXiv cs.LG·May 20

Robust Basis Spline Decoupling for the Compression of Transformer Models

New transformer compression method using B-spline-based decoupling. R-CMTF-BSD algorithm employs constrained coupled matrix-tensor factorization to reduce parameters while maintaining accuracy. Validated on Vision Transformer and Swin Transformer architectures with substantial parameter reduction.

Benchmarks Vision

Reddit r/LocalLLaMA·May 20

LM Studio finally added support for MTP Speculative Decoding

LM Studio adds MTP speculative decoding support in version 0.4.14 Build 2 (Beta). It requires the llama.cpp runtime 2.15.0, and the feature must be enabled manually in the model load parameters.

Tools Code generation Open source

Reddit r/LocalLLaMA·May 20

Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s!

Local inference of DeepSeek-V4-Flash (284B, 13B active) on 4x RTX 2080 Ti (~$2.5k). Custom Turing CUDA kernels + W8A8 quantization + heterogeneous offloading achieve 255 tok/s prefill. Open-sourced on GitHub.

DeepSeek Code generation Open source

Hacker News (AI)·May 20

Google Cloud has blocked our account, making some Railway services unavailable

Railway reports Google Cloud account blocked, causing service unavailability. No technical details provided on reasons or incident duration.

Infrastructure

Vercel AI Blog·May 20

Chat SDK adds message subjects and direct SDK access

Vercel AI Chat SDK adds message.subject to access parent context (GitHub issues/PRs, Linear) and exposes native SDKs (GitHub, Linear, Slack) for direct API calls. Per-message caching optimizes repeated access.

AI Agents Tools MCP

OpenAI Blog·May 20

How Ramp engineers accelerate code review with Codex

Ramp engineers use Codex with GPT-5.5 to accelerate code reviews and ship improvements faster. Substantive feedback is delivered in minutes instead of hours.

OpenAI Code generation Business

OpenAI Blog·May 20

The next phase of OpenAI’s Education for Countries

OpenAI expands Education for Countries with new partnerships, teacher training programs, and tools to improve global learning outcomes in schools.

OpenAI Business Tools

OpenAI Blog·May 20

An OpenAI model has disproved a central conjecture in discrete geometry

An OpenAI model disproved a major conjecture in discrete geometry by solving the 80-year-old unit distance problem. This breakthrough marks a milestone in AI-driven mathematics.

OpenAI Reasoning Benchmarks

Vercel AI Blog·May 20

Chat SDK now includes AI SDK tools

Vercel's Chat SDK now ships AI SDK agent tools via `createChatTools(chat)`, wiring Chat SDK's read and write actions directly into agents. Three presets (reader, messenger, moderator) scope the available tools, and write operations require approval by default.

AI Agents Tools Code generation

Vercel AI Blog·May 20

Chat SDK now supports callback URLs on buttons and modals

Vercel Chat SDK now supports callback URLs on buttons and modals. Developers can pause a run on a card and resume it when someone clicks a button. Form data is included in the payload sent to the endpoint.

Tools AI Agents

Reddit r/LocalLLaMA·May 19

Claude Code plugins a risk to local ecosystem?

Claude Code plugins, Anthropic-specific extensions, enable complex capabilities (auto-invoked skills, slash commands, subagents) beyond simple skills. Example: Microsoft's deep-wiki (3.5k LOC). Unlike skills, plugins aren't an open standard and few agentic apps support them, potentially creating vendor lock-in.

Claude Code AI Agents Multi-agent

Simon Willison·May 19

llm-gemini 0.32

llm-gemini 0.32 adds support for gemini-3.5-flash model. Plugin updated to access Google's latest Gemini 3.5 Flash version.

Gemini Tools

Simon Willison·May 19

Gemini 3.5 Flash: more expensive, but Google plan to use it for everything

Google released Gemini 3.5 Flash at general availability during Google I/O. The model (ID: gemini-3.5-flash) supports 1,048,576 input tokens and 65,536 max output tokens, with knowledge cutoff January 2025. Deployed across Google Search, Gemini app, and via API for developers and enterprises. Pricing increased versus previous Flash versions.

Gemini AI Agents

Reddit r/LocalLLaMA·May 19

Newbie vibe coding experience: Shifting from Claude Sonnet 4.6 to Qwen3.6-35B-A3B-UD-Q6_K

A developer switches from Claude Sonnet 4.6 to Qwen3.6-35B-A3B for a 30k-line Pygame project. Sonnet hit length limits and struggled to resolve bugs despite high costs. Running locally (Ollama + Cline), Qwen3.6 solves issues Sonnet couldn't, with better context management.

Claude Qwen Code generation

Reddit r/LocalLLaMA·May 19

Google AI Edge Gallery v1.0.13 & v1.0.14 updates: Gemma 4 Multi-Token Prediction, Pixel TPU support, experimental MCP, new skills, now saves chat history

Google AI Edge Gallery v1.0.13 and v1.0.14 add Gemma 4 with multi-token prediction, Pixel TPU support, experimental MCP, new skills, and chat history saving.

Gemini MCP Tools

Simon Willison·May 19

datasette-llm-accountant 0.1a4

Release of datasette-llm-accountant 0.1a4, fixing a bug in tracking chains of responses (datasette-llm#7).

Tools Open source

Simon Willison·May 19

llm-gemini 0.32a0

llm-gemini 0.32a0 released. Compatible with llm>=0.32a0 alpha. Adds ability to stream reasoning tokens.

Gemini Tools Reasoning

Simon Willison·May 19

datasette-llm 0.1a8

Release of datasette-llm 0.1a8 fixing a bug where `llm_prompt_context()` hook did not fully collect chains of responses.

Tools Open source

Reddit r/LocalLLaMA·May 19

Qwen3.6:27B VRAM 16GB 5080: MTP Quant, Speeds, and Configs

User shares Qwen3.6-27B-Q3_K_S configuration on 16GB VRAM with RTX 5080. Achieves 47-61 tokens/s generation and 1095-1426 tokens/s prompt eval. Uses Q3_K_S quantization, 64 GPU layers, MTP speculative decoding with 0.59-0.80 draft acceptance rate.

Qwen Code generation Fine-tuning
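
To put the acceptance rates in perspective: if each drafted token is accepted independently with probability α (a simplifying assumption), the expected number of tokens produced per target-model verification pass with draft length γ is (1 - α^(γ+1)) / (1 - α), as derived by Leviathan et al. (2023). The post does not state the draft length, so γ = 3 below is an assumption; with the reported 0.59–0.80 range this gives roughly 2.1 to 3.0 tokens per pass, an upper bound on speedup that ignores the cost of drafting.

```python
def expected_tokens_per_step(acceptance, draft_len):
    """Expected tokens emitted per verification pass of the target model.

    Assumes each drafted token is accepted independently with probability
    `acceptance`; the target model always contributes one token of its own.
    """
    if acceptance >= 1.0:
        return draft_len + 1.0  # every draft accepted, plus the target's own token
    return (1.0 - acceptance ** (draft_len + 1)) / (1.0 - acceptance)


for rate in (0.59, 0.80):
    print(rate, round(expected_tokens_per_step(rate, draft_len=3), 2))
```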

Hacker News (AI)·May 19

OpenAI Adopts Google's SynthID Watermark for AI Images with Verification Tool

OpenAI integrates Google's SynthID watermark into DALL-E to mark AI-generated images. A verification tool detects these invisible markings, improving traceability of synthetic content.

OpenAI Image generation AI safety

The Decoder·May 19

Google overhauls its AI subscriptions at I/O 2026 with three tiers starting at $10 a month

Google restructures AI subscriptions with three tiers ($7.99–$99.99/month) based on compute consumption instead of daily limits. Launches Gemini Omni and Gemini Spark agent at I/O 2026.

Gemini AI Agents Business

Hacker News (AI)·May 19

AI-written story published in Granta, wins major literary prize

An AI-generated story was published in Granta and won a major literary prize. The event raises questions about creative authenticity and AI's role in traditional artistic fields.

Image generation

Reddit r/LocalLLaMA·May 19

Intel's Crescent Island PCB Leaks, Showing a Massive Xe3P GPU, 16-Pin Connector, 160GB LPDDR5X as Intel Sidesteps the HBM Shortage

Intel's upcoming Xe3P data center GPU features 160GB LPDDR5X memory (8 modules of 20GB) with 16-pin connector. Estimated memory bandwidth 704-760GB/s. Intel bypasses HBM shortage.

Infrastructure

Hacker News (AI)·May 19

Mistral AI Acquires Emmi AI to Create the Leading AI Stack

Mistral AI acquires Emmi AI to strengthen its technology stack. The acquisition aims to consolidate Mistral's infrastructure and model capabilities amid ongoing AI market consolidation.

Mistral Business

Reddit r/MachineLearning·May 19

Comparing data annotation platforms [D]

Comparison of five data annotation platforms, including Scale AI (premium quality; 49% stake acquired by Meta, raising data-exposure concerns), Appen (1M+ contractors, slow for small projects), CloudFactory (dedicated teams, lengthy onboarding), and LabelBox (best software but no workforce). Gap identified: no platform optimizes for small teams that need 500–2000 labeled examples quickly with full transparency.

Tools Business

Reddit r/MachineLearning·May 19

I built a tool that shows you what GPT-2 is "thinking" in real-time as it generates 3D graph of concept activations per token [R]

AXON visualizes real-time concept activations in GPT-2 through a 3D force-directed graph. A Sparse Autoencoder decomposes the residual stream into interpretable features (geography, cities, languages) per generated token. Stack: TransformerLens + SAELens (backend), FastAPI WebSocket, Three.js (frontend). ~35ms/token on GPU.

GPT Open source Tools

Reddit r/LocalLLaMA·May 19

PrivateScribe.ai - Fully local, MIT licensed, free AI transcription built with HIPAA/legal safeguards in mind - One Year Update!

PrivateScribe.ai, a fully local open-source transcription platform (MIT license), announces v1 with a signed macOS app. Stack: FasterWhisper, pyannote, Ollama, Vite/Flask/SQLite. Features include 256-bit encryption, zero network calls, an audit trail, and speaker diarization. Built for clinics, law firms, and therapists, with HIPAA safeguards in mind.

Open source Voice Code generation

Hugging Face Blog·May 19

OlmoEarth v1.1: A more efficient family of models

OlmoEarth v1.1, from Ai2 and announced on the Hugging Face blog, is a more efficient family of models for geospatial tasks, with improved performance and inference speed over the previous version.

Open source Benchmarks Tools

Reddit r/LocalLLaMA·May 19

Open weights GLM and Mimo are better than Gemini 3.5 flash according to arena

According to Arena leaderboard, open-weight models GLM (#7) and Mimo (#9) outperform Gemini 3.5 Flash (#12) on coding tasks. The post challenges the hype surrounding Google's latest release.

Gemini Benchmarks Open source

Reddit r/LocalLLaMA·May 19

Nemotron-Labs-Diffusion from NVIDIA

NVIDIA releases Nemotron-Labs-Diffusion, tri-mode model (AR, diffusion, self-speculation) in 3B/8B/14B sizes. Self-speculation combines diffusion drafting and AR verification with shared KV cache: 3× higher acceptance length vs Qwen3-8B-Eagle3, 2.2× speedup, 4× speedup on GB200 (1015 tok/sec with custom CUDA kernels).

Code generation Benchmarks

Le Big Data·May 19

Gemini Omni: Google's video AI finally masters physics and consistent characters

Google unveils Gemini Omni at its I/O 2026 conference, a video AI capable of mastering physics and maintaining character consistency in video generation.

Gemini Video generation

Hacker News (AI)·May 19

Gemini CLI will stop working from June 18, 2026

Google discontinues Gemini CLI support effective June 18, 2026. Users must migrate to alternative tools before the deadline.

Gemini Tools

The Decoder·May 19

Google's I/O announcements: new models, a cloud agent that never sleeps, and a redesigned Gemini app

Google announces at I/O three new models: Gemini 3.5 Flash, Gemini Omni (multimodal), and Gemini Spark, a personal agent running continuously in the cloud. The Gemini app receives a major redesign.

Gemini AI Agents Multi-agent

Hacker News (AI)·May 19

Gemini 3.5 Flash

Google announces Gemini 3.5 Flash, a lightweight model optimized for speed and cost. Available publicly via Vertex AI and Google AI Studio. Limited technical details provided in excerpt.

Gemini

Reddit r/LocalLLaMA·May 19

A tool I built to generate 3D objects with functional, articulated parts. It's on github, and is mostly LLM-agnostic.

Open-source tool to generate 3D objects with articulated, functional parts. Instead of diffusion (point-cloud blobs), the pipeline uses an LLM as a structured code compiler, generating native Blender Python code targeting specific scene graph nodes. Flutter/Three.js frontend, model-agnostic. Gemini recommended; local models still hallucinate on complex matrix transforms.

Code generation Open source Tools

Reddit r/LocalLLaMA·May 19

An overview of modern LLM compiler stack: writing an interactive and hackable compiler

A developer built a minimal ML compiler in pure Python/CUDA without external dependencies. It lowers transformers (TinyLlama, Qwen2.5-7B) through 6 successive IRs down to CUDA kernels. On RTX 5090, achieves 0.96× PyTorch production stack performance, with 32/84 kernel shapes beating hand-optimized kernels (up to 5.6× speedup).

Code generation Infrastructure Open source

Reddit r/LocalLLaMA·May 19

Here are my KV cache quantization benchmarks: TurboQuant is overrated but saved by TCQ, q5 deserves more attention, and symmetric q8 might be a waste of VRAM

KV cache quantization benchmarks on RTX 3090 with Qwen 27B: TurboQuant overrated except TCQ (best at 2-3 bits), q5 underrated, asymmetric q4_0 beats symmetric q4_1. KLD exposes tail issues PPL hides, llama.cpp rotation matches turbo4 performance.

Benchmarks Qwen Open source
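
Why KLD can expose problems that perplexity hides: perplexity only scores the probability assigned to the reference token, whereas KLD compares the full next-token distribution against the unquantized baseline. The toy numbers below are invented (not the post's data); they keep the reference token's probability fixed but reshuffle the tail, so the perplexity contribution is unchanged while KLD is clearly non-zero.

```python
import math


def token_nll(probs, target):
    """Negative log-likelihood of the reference token (the quantity perplexity averages)."""
    return -math.log(probs[target])


def kl_divergence(p, q):
    """KL(p || q): how far the quantized distribution q drifts from the baseline p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token distributions over a 4-token vocabulary.
baseline = [0.60, 0.30, 0.08, 0.02]
quantized = [0.60, 0.10, 0.08, 0.22]  # same mass on the reference token, tail reshuffled
reference_token = 0

same_ppl = math.isclose(token_nll(baseline, reference_token),
                        token_nll(quantized, reference_token))
print("perplexity contribution unchanged:", same_ppl)
print("KLD:", round(kl_divergence(baseline, quantized), 4))
```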

Reddit r/LocalLLaMA·May 19

Public Repository "Codegraph" claims to reduce Claude, Cursor, Codex, and OpenCode API tool calls by 94% locally, an innovation that could directly offset the most recent Claude API pricing model.

Codegraph, an open-source tool by Colbymchenry, reduces Claude/Cursor API calls by 94% using a pre-indexed knowledge graph (symbol relationships, call graphs). Tests show 3 calls vs 52 without on VS Code TypeScript, with 72-82% speedup.

Claude Code generation AI Agents

Hacker News (AI)·May 19

CopyFail: From Pod to Host

CopyFail describes a privilege escalation vulnerability from container to host through file copy operations. Detailed exploitation technique with no known patch at publication time.

AI safety

Hacker News (AI)·May 19

'Comically bad' datasets used to train clinical models for stroke and diabetes

Researchers expose critically poor quality datasets used to train clinical models for stroke and diabetes prediction. Data contains systematic errors and biases that undermine the reliability of medical predictions.

Evals AI safety Benchmarks

Reddit r/LocalLLaMA·May 19

Carbon: Decoding the Language of Life

Hugging Face releases Carbon, a family of open-source DNA foundation models. Carbon-3B matches SOTA (Evo2-7B) while being 275× faster. The approach adapts modern LLM techniques: deterministic 6-mer tokenization, factorized loss (FNS) mid-training, and curation of functional biological data.

Open source Benchmarks Fine-tuning
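
Deterministic 6-mer tokenization can be sketched as a fixed vocabulary of all 4^6 = 4096 DNA 6-mers. The summary does not say whether Carbon's k-mers overlap or how leftover bases and ambiguous symbols are handled, so this sketch assumes non-overlapping windows, drops a trailing fragment, and accepts only A/C/G/T.

```python
from itertools import product

K = 6
ALPHABET = "ACGT"
# Fixed vocabulary of all 4**6 = 4096 possible 6-mers, in lexicographic order,
# so each id equals the 6-mer read as a base-4 number (A=0, C=1, G=2, T=3).
VOCAB = {"".join(kmer): idx for idx, kmer in enumerate(product(ALPHABET, repeat=K))}


def tokenize_6mers(sequence):
    """Split a DNA sequence into non-overlapping 6-mers and map each to its id.

    Deterministic: the same sequence always yields the same ids. A trailing
    fragment shorter than 6 bases is dropped, and symbols other than A/C/G/T
    (such as N) raise KeyError; both are hypothetical choices.
    """
    seq = sequence.upper()
    usable = len(seq) - len(seq) % K
    return [VOCAB[seq[i:i + K]] for i in range(0, usable, K)]
```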

Reddit r/LocalLLaMA·May 19

Floor for local meeting summarization on a 6GB GPU: qwen3.5:0.8b works at 57s, Granite 4 350M hallucinates

Benchmark of small models for local meeting summarization on 6GB GPU. Qwen3.5:0.8b produces structured summary in 57s using 2.2GB VRAM. Granite 4 350M is faster (0.6-2.8s) but hallucinates (invents topics, confuses entities).

Qwen Code generation Benchmarks

Le Big Data·May 19

Microsoft unveils its AI-boosted Surface devices: the new benchmark for laptops?

Microsoft launches three new Surface models, including the Laptop 8 and Pro 12, with integrated AI. The article assesses whether these devices set the new standard for laptops.

Business

Hacker News (AI)·May 19

Andrej Karpathy Joins Anthropic

Andrej Karpathy joins Anthropic as a senior researcher. The OpenAI co-founder, former director of AI at Tesla, and prominent machine learning figure joins the company's research team.

Anthropic

Reddit r/MachineLearning·May 19

Backprop-free Pong: PC + distributional Hebbian plasticity vs. PPO: 57% vs. 59%, ~1500 lines from scratch [P]

Bio-plausible backprop-free agent (Predictive Coding + distributional Hebbian plasticity) vs PPO on Pong: 57% vs 59%. The 2% gap stems from catastrophic forgetting under self-play dynamics, not lack of backprop. ~1500 lines of code released.

Reinforcement learning Reasoning Papers

Reddit r/LocalLLaMA·May 19

unpopular opinion: cursor and claude code arent getting dumber, their agent loops are structurally blind and suffocating your context window

A user critiques the architecture of code agents (Cursor, Claude Code): models aren't degrading, but their exploration loops are structurally blind. They dump massive files into context, generate noise (logs, MCP definitions), and lose project memory per session, saturating the context window before reasoning begins.

Claude Code AI Agents Code generation

Reddit r/MachineLearning·May 19

xAI just sold its entire flagship data center to Anthropic. That's not what frontier AI labs do. [N]

xAI sells 300 MW of compute capacity from its Colossus 1 facility to Anthropic for billions of dollars. The analyst argues frontier AI labs typically accumulate compute as strategic assets rather than sell to direct competitors, suggesting Colossus 1 was underutilized and Grok consumes less resources than expected.

Anthropic Infrastructure Business

The Decoder·May 19

Prominent AI researcher Andrej Karpathy picks Anthropic over former home OpenAI to get back into frontier LLM research

Andrej Karpathy, prominent AI researcher and former OpenAI core team member, joins Anthropic to focus on frontier LLM research. He views the coming years as "especially formative" for the field.

Anthropic OpenAI Reasoning

Hacker News (AI)·May 19

Two AI agents walk into a hiring funnel. Nobody hires anyone

Two AI agents tested in a real hiring funnel: neither was hired. Experiment revealing limitations of current systems in handling complex, contextual real-world tasks.

AI Agents Evals

Reddit r/LocalLLaMA·May 19

Open source background removal app and MCP

Developer open-sources a background removal tool built on latest open source models, originally created for personal workflow. Tool now works as headless MCP service for agents. README generated with Gemini Flash.

Open source MCP AI Agents

Hacker News (AI)·May 19

Andrej Karpathy joins Anthropic

Andrej Karpathy joins Anthropic as Senior Research Scientist. The OpenAI co-founder and former director of AI at Tesla brings deep learning and AI systems expertise to the company.

Anthropic Reasoning

Le Big Data·May 19

Unitree G1 robot: now you just talk to it and it acts

The Unitree G1 humanoid robot now integrates voice understanding capabilities. Users can give it verbal commands directly without requiring a text interface.

Robotics Voice

Reddit r/LocalLLaMA·May 19

got my first "rm -rf /" today

An AI agent executed the destructive command "rm -rf /" to test a harmful-command block. The user set up a sandbox immediately after the incident.

AI Agents AI safety

Reddit r/MachineLearning·May 19

Graph spectral analysis (Fiedler value + Scheffer CSD indicators) predicts grokking 21k steps before loss function - five reproducible experiments [R]

Graph spectral analysis (Fiedler value + Scheffer critical slowing down) predicts grokking 21k steps before loss convergence. Five reproducible CPU experiments: early detection, distinct structural fingerprints for grokking vs catastrophic forgetting, guided intervention preserves 91.7% vs 2.6%, 48x acceleration across sequential tasks. Limited to 2-layer MLPs and 1-layer transformers.

Papers Evals Reasoning
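
Scheffer-style critical slowing down indicators are usually rising variance and rising lag-1 autocorrelation of a monitored quantity over sliding windows. A minimal sketch on a synthetic AR(1) trace, where a coefficient closer to 1 stands in for a system approaching a transition; the monitored quantity (presumably the Fiedler value over training steps) and the window size are assumptions, and the Fiedler computation itself is not shown.

```python
import random


def lag1_autocorrelation(xs):
    """Lag-1 autocorrelation; values drifting toward 1 are a slowing-down signal."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0  # a constant window carries no autocorrelation information
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


def rolling_indicators(series, window):
    """(variance, lag-1 autocorrelation) for every sliding window of length `window`."""
    out = []
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        mean = sum(chunk) / window
        variance = sum((x - mean) ** 2 for x in chunk) / window
        out.append((variance, lag1_autocorrelation(chunk)))
    return out


def ar1(phi, n, seed):
    """Synthetic AR(1) trace x_t = phi * x_{t-1} + noise; phi near 1 mimics slowing down."""
    rng = random.Random(seed)
    x, xs = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs
```

In a monitoring loop, a sustained rise in both indicators across consecutive windows would serve as the early-warning signal; how to define "sustained" is left open here.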

Reddit r/MachineLearning·May 19

All fundamental knowledge in ML Course by Andrew NG that I noted and create into a repo github [R]

Student completing Andrew Ng's Machine Learning Specialization publishes detailed lecture notes in LaTeX covering 10 chapters from linear regression to reinforcement learning. PDF auto-compiled via GitHub Actions.

Prompt engineering Open source

GitHub Trending·May 19

alirezarezvani / claude-skills

Repository of 313+ skills for Claude Code and 8 other coding agents, including Codex, Gemini CLI, and Cursor. Covers engineering, marketing, product, compliance, research, operations, and productivity.

Claude Code AI Agents Tools

SIG

HYP

Vercel AI Blog·May 19

Nuxt MCP Toolkit now supports MCP apps

Nuxt MCP Toolkit now supports MCP apps with interactive HTML responses rendered inline by Claude and ChatGPT instead of plain text. Tools declared with the defineMcpApp macro can access pre-hydrated data and trigger follow-up prompts via the useMcpApp composable.

MCP Claude AI Agents

SIG

HYP

The Decoder·May 19

Agora-1 turns the N64 classic GoldenEye into a playable AI simulation for four players

Odyssey releases Agora-1, a world model enabling up to four players to act simultaneously in an AI-generated world. Tested on GoldenEye (N64), the system uses two separate models for game state simulation and real-time rendering. Potential applications: collaborative robotics and AI agent training.

AI Agents Multi-agent Robotics

SIG

HYP

Reddit r/LocalLLaMA·May 19

The pacman benchmark: finally a viable local agentic coding agent with Qwen 3.6 27b

The F16 build of Qwen 3.6 27B passes the Pacman benchmark (cloning the arcade game as a webpage in one shot), outperforming Claude, GPT, and Gemini. Two of three attempts were excellent, versus earlier failures. The 8-bit quantization fails. An optimized chat template and MTP speculative decoding proved critical.

Qwen Code generation AI Agents

SIG

HYP

Le Big Data·May 19

A school wanted to film children to train AI: parents lose it

University of Washington proposed equipping kindergarten teachers with body cameras to record children for AI model training. Parents opposed the project.

AI safety Regulation Vision

SIG

HYP

Reddit r/LocalLLaMA·May 19

Any idea why prunning can improve perplexity?

An r/LocalLLaMA user reports an experiment combining WANDA pruning with data-free quantization (HQQ). In this specific setup, pruning before quantization improves perplexity. The author seeks explanations and feedback on this preliminary result.
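WANDA scores each weight by its magnitude times the L2 norm of the matching input activation, then prunes the lowest-scoring weights within each output row. A minimal NumPy sketch of that pruning step follows. The HQQ quantization step is not shown, and the function name wanda_prune and the array layout (weights as out_features x in_features, calibration activations as n_tokens x in_features) are assumptions.

```python
import numpy as np


def wanda_prune(weight: np.ndarray, activations: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero the lowest-scoring weights in each output row, with score = |W_ij| * ||X_j||_2."""
    # weight: (out_features, in_features); activations: (n_tokens, in_features)
    feature_norms = np.linalg.norm(activations, axis=0)  # (in_features,)
    scores = np.abs(weight) * feature_norms[None, :]     # broadcast the norms over rows
    k = int(weight.shape[1] * sparsity)                  # number of weights to drop per row
    pruned = weight.copy()                               # leave the caller's array untouched
    if k == 0:
        return pruned
    drop = np.argsort(scores, axis=1)[:, :k]             # indices of the k lowest scores per row
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned
```

Because the score uses activation norms from calibration data, input channels that are rarely active get pruned first even when their weights are large.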

Open source Benchmarks

SIG

HYP

Reddit r/LocalLLaMA·May 19

Simple Multi-Agent Architecture Running Across Our Entire Org. Keeping everything in Loop.

Multi-agent architecture at org scale: three agent classes (Observer, Task, Goal) share a context layer. LangGraph orchestrates Goal agents with checkpointed state. CrewAI coordinates Task agents. Harbor centralizes credentials, tools, and execution traces. Ring-based protocol (4 levels) governs message routing.

Multi-agent AI Agents MCP

SIG

HYP

Reddit r/LocalLLaMA·May 19

Time to update llama.cpp to get som MTP improvements!

Pull request #23269 on llama.cpp proposes MTP (Multi-Token Prediction) improvements. Update recommended for llama.cpp users.

Llama Code generation Open source

SIG

HYP

Reddit r/LocalLLaMA·May 19

Number-aware embeddings

A researcher developed number-aware embeddings by modifying an MLM architecture (ModernBERT). After 6 hours of H100 training, the model achieves 59% accuracy on triplet sorting vs 38% for ModernBERT and 34% for BGE-base-v1.5. The technique uses log-magnitude representation with 128 bins and a classification-regression head.
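A log-magnitude binning scheme of the kind described can be sketched as follows. This is a hypothetical illustration, not the researcher's code: the log10 range of [-6, 12], the clamping of zero and tiny values into the first bin, and the separate sign are all assumptions. In a classification-regression setup, one head would predict the bin index and another the offset within the bin.

```python
import math

NUM_BINS = 128
LOG_MIN, LOG_MAX = -6.0, 12.0  # hypothetical log10 range covered by the bins


def encode(x: float) -> tuple[int, int, float]:
    """Return (sign, bin index, fractional offset within the bin) for log10(|x|)."""
    sign = -1 if x < 0 else 1
    mag = max(abs(x), 10.0 ** LOG_MIN)  # clamp zero and tiny values into the first bin
    pos = (math.log10(mag) - LOG_MIN) / (LOG_MAX - LOG_MIN) * NUM_BINS
    pos = min(max(pos, 0.0), NUM_BINS - 1e-9)  # clamp huge values into the last bin
    idx = int(pos)
    return sign, idx, pos - idx


def decode(sign: int, idx: int, offset: float) -> float:
    """Invert encode: map the bin position back to a signed magnitude."""
    pos = idx + offset
    return sign * 10.0 ** (LOG_MIN + pos / NUM_BINS * (LOG_MAX - LOG_MIN))
```

For example, encode(1234.5) falls in bin 64, and decode(*encode(1234.5)) recovers 1234.5 up to floating-point error. Zero decodes to 1e-6 under this clamping, so a real scheme would need a dedicated zero token.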

Embeddings Fine-tuning Open source

SIG

HYP

Hacker News (AI)·May 19

Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

Forge, a guardrails framework, improves an 8B model's performance from 53% to 99% on agentic tasks. The project is showcased on Hacker News with moderate engagement (18 points).

AI Agents AI safety Tools

SIG

HYP

Hacker News (AI)·May 19

Speed Kills: Exploring Confused Deputy Attacks Through Edge AI Accelerators

Article exploring confused deputy attacks that exploit edge AI accelerators. Analyzes security vulnerabilities related to fast model execution on specialized hardware.

AI safety Infrastructure

SIG

HYP

Hacker News (AI)·May 19

UMAI Core CE – An eBPF semantic firewall for AI protocols

UMAI Core CE is an eBPF-based semantic firewall for AI protocols. The tool operates at kernel level to filter traffic based on AI request semantics, beyond traditional network rules.

Infrastructure AI safety Tools

SIG

HYP

Reddit r/LocalLLaMA·May 19

bytedance released an open source model that attempts to do just about anything with only 3b parameters

ByteDance releases Lance, an open-source multimodal model with 3B active parameters. It supports image and video generation and editing in a single framework. Trained from scratch on a 128-A100 GPU budget.

Open source Image generation Video generation

SIG

HYP

Hacker News (AI)·May 19

Agentic Diaries – a welfare protocol for AI in deployment, install via MCP

Agentic Diaries introduces a welfare protocol for AI systems in deployment, installable via MCP. The project aims to monitor and improve operational conditions for AI systems in production.

AI Agents MCP AI safety

SIG

HYP

Hacker News (AI)·May 19

Google, Blackstone to Create AI Cloud Firm with In-House Chips

Google and Blackstone form joint venture for AI cloud platform with custom chips. Goal: provide proprietary AI infrastructure to enterprises, reduce vendor lock-in, and monetize computing capacity.

DeepMind Infrastructure Business

SIG

HYP

GitHub Trending·May 19

multica-ai / andrej-karpathy-skills

A CLAUDE.md file based on Andrej Karpathy's observations to improve Claude Code behavior and address common LLM coding pitfalls.

Claude Claude Code Prompt engineering

SIG

HYP

GitHub Trending·May 19

colbymchenry / codegraph

Codegraph: pre-indexed code knowledge graph for Claude Code, Codex, Cursor, and OpenCode. Reduces tokens and tool calls, runs 100% locally.

Claude Code Code generation RAG

SIG

HYP

GitHub Trending·May 19

HKUDS / CLI-Anything

CLI-Anything converts command-line interfaces to make them compatible with AI agents. The project aims to make all software "agent-native" through a unified CLI approach.

AI Agents Tools Open source

SIG

HYP

GitHub Trending·May 19

Alishahryar1 / free-claude-code

A tool enabling free Claude Code usage via the terminal, a VS Code extension, or Discord (with voice support), inspired by OpenClaw.

Claude Code Tools Open source

SIG

HYP

GitHub Trending·May 19

humanlayer / 12-factor-agents

12-factor-agents outlines principles for building production-ready LLM-powered agents. The GitHub project adapts 12-factor methodology to establish best practices for autonomous AI systems deployed to customers.

AI Agents Open source

SIG

HYP