May 2026

3149 articles

Training GPT-like model on non-language series [R]

A researcher trains Transformer-decoder models (100M–500M params) on 750M tokens of non-language series. Setup: AdamW, lr=1e-3, batch=4M tokens, 16 layers. The model fails to learn basic autoregressive behavior and repeatedly generates a single token.

GPT Code generation Benchmarks

SIG

HYP

Reddit r/MachineLearning·May 28

Diffusion models for sketch-guided trajectory simulation [R]

Diffusion models applied to basketball trajectory simulation conditioned on partial sketches of player movements. The model jointly refines all player trajectories, producing more natural simulations than autoregressive generation. Code and model fully open-sourced.

Video generation Open source

SIG

HYP

Hacker News (AI)·May 28

Zig 2026: No-AI Policy, $670K Foundation, Left GitHub and Why Zig Isn't 1.0 [video]

Zig announces 2026 roadmap with no-AI policy, $670K foundation funding, and explains GitHub departure. Version 1.0 remains unreleased.

Open source

SIG

HYP

Reddit r/LocalLLaMA·May 28

Gemma-4-Harmonia-31B-Uncensored-Heretic Is Out Now, a Merge of Multiple gemma-4-31B-it Finetunes Designed for a Targeted Approach to Deep Neural Consolidation, Minimizing Regression While Amplifying Unique Capability Boundaries. With KLD 0.0047 and 9/100 Refusals!

Gemma-4-Harmonia-31B-Uncensored-Heretic, a merge of multiple Gemma-4-31B finetunes, is now available in Safetensors and GGUF formats. The model reports KLD 0.0047 and 9/100 refusals, using deep neural consolidation to minimize regression.

Gemini Fine-tuning Open source

SIG

HYP

Reddit r/LocalLLaMA·May 28

GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding)

Developer seeking optimal infrastructure (~$100-150k) to self-host Kimi K2.6 and DeepSeek V4 locally for 5-person team (agentic coding). Compares dual GH200 NVL2 (1.2TB unified memory, $95k) vs 8x RTX 6000 Blackwell (768GB VRAM, $140k). Single GH200 test: 23 tok/s decode at 2-bit quant, but slow prefill and models overflow into slower unified memory.

DeepSeek Kimi AI Agents

SIG

HYP

Hacker News (AI)·May 28

Illinois Lawmakers Just Passed America's Strongest AI Safety Bill

Illinois passed America's strongest AI safety bill. The legislation imposes transparency and accountability requirements on AI model developers. Specific legislative details not provided in excerpt.

Regulation AI safety

SIG

HYP

OpenAI Blog·May 28

MUFG aims to become AI-native with OpenAI

MUFG, Japan's banking giant, adopts ChatGPT Enterprise to become an AI-native organization. Goal: optimize internal workflows and launch AI-powered financial services at scale.

OpenAI GPT Business

SIG

HYP

Vercel AI Blog·May 28

Team-wide provider allowlist on AI Gateway

Vercel AI Gateway introduces team-wide provider allowlist. Organizations can restrict which providers serve requests across all traffic, including BYOK. Filtering applies by provider (not model) and works with all supported API formats.

Infrastructure AI safety Regulation

SIG

HYP

Vercel AI Blog·May 28

Amazon OpenSearch Serverless is now available in the Vercel Marketplace

Amazon OpenSearch Serverless now available in Vercel Marketplace with automatic setup and unified management. Supports vector, lexical, hybrid, and agentic search. $100 USD credits offered for new AWS accounts.

AI Agents Vector search Infrastructure

SIG

HYP

OpenAI Blog·May 28

OpenAI’s Frontier Governance Framework

OpenAI releases its Frontier Governance Framework, aligning its AI safety, security, and risk management practices with emerging EU and California regulations.

OpenAI AI safety Regulation

SIG

HYP

Simon Willison·May 27

sqlite AGENTS.md

SQLite added an AGENTS.md file explicitly rejecting agentic code contributions while accepting AI-generated bug reports with reproducible test cases. The project created a dedicated bug forum to handle the influx of AI-generated reports.

AI Agents Open source

SIG

HYP

Reddit r/LocalLLaMA·May 27

Running Gemma4 31b-it on vLLM 0.21.0 A100s (bad quality or what am I doing wrong)

User reports quality degradation running Gemma 4 31B-it locally on two A100s with vLLM 0.21.0 versus Google API. Same model, same parameters (tensor-parallel-size 2, max-model-len 65536, structured output), but invalid JSON outputs locally versus perfect via API.

Gemini Open source Infrastructure

SIG

HYP

Reddit r/MachineLearning·May 27

BEAM 100K memory benchmark: CSM vs Hindsight local artifact comparison [R]

Local BEAM 100K benchmark comparing Context Swarm Memory (CSM) to Hindsight. CSM scores 0.757573 AMB (342/400 correct) vs 0.733658 for Hindsight (326/400), using 38.2% fewer answer-visible context tokens. CSM slower: 29.23s vs 6.38s retrieval. Author seeks methodology feedback before official submission.

AI Agents RAG Evals

SIG

HYP

Reddit r/MachineLearning·May 27

Cross-Platform Fused MoE Dispatch in Triton: Portable Expert Routing Without CUDA [R]

TritonMoE: pure Triton MoE kernel for portable NVIDIA/AMD inference without vendor-specific code. Fused gate+up GEMM reduces memory traffic by 35%. Achieves 89-131% of Megablocks throughput (batch ≤512 tokens) on A100, same kernel runs on MI300X. Limitations: degrades at 2048+ tokens and with 64+ experts.

Benchmarks Open source

SIG

HYP

Hacker News (AI)·May 27

Getting Claude to extract data from a 1997 football manager game

A user successfully got Claude to extract data from a 1997 football manager game. The project demonstrates the model's vision and legacy content processing capabilities.

Claude Vision

SIG

HYP

Reddit r/MachineLearning·May 27

UK GDPR Small Business Q&A — 5,000 synthetic pairs with article-level citations [D]

Dataset of 5,000 synthetic QA pairs for fine-tuning UK GDPR compliance assistants. Each pair includes practical SME questions and answers with specific article references, ICO guidance, and actionable steps. Generated via Qwen 14B and DeepSeek API. MIT license, 1K sample on Hugging Face.

Fine-tuning RAG DeepSeek

SIG

HYP

Hacker News (AI)·May 27

Show HN: Open-Source AI Racing Harness

Open-source racing harness for testing and comparing AI models. Enables performance evaluation under real conditions with reproducible benchmarks.

Benchmarks Open source Evals

SIG

HYP

Reddit r/LocalLLaMA·May 27

I built a 103B-token Usenet corpus (1980–2013) — pre-web, human-only, zero AI contamination. Got strong traction on r/ML, thought this community would find it useful.

Complete Usenet corpus (1980–2013) released for local fine-tuning: 103.1B tokens, 408M posts, zero AI contamination. Pre-SEO, pre-algorithm internet writing across 33 years. Organized by domain hierarchies (comp.*, sci.*, rec.*). Free samples available, full corpus under license.

Fine-tuning Open source Benchmarks

SIG

HYP

Reddit r/MachineLearning·May 27

I used the N.E.A.T algorithm to teach AI how to control a worm in my game in making! It uses evolution to improve. [P]

Developer applies N.E.A.T algorithm (NeuroEvolution of Augmenting Topologies) to train AI controlling worms in an in-development game. Each worm has a unique neural network evolving through natural selection, generating distinct behaviors.

Reinforcement learning AI Agents Tools

SIG

HYP

Hacker News (AI)·May 27

YouTube to automatically label AI-generated videos

YouTube will automatically label AI-generated videos. The platform uses AI detection to identify synthetic content and display a visible label to viewers, enhancing transparency about machine-generated content.

Regulation AI safety

SIG

HYP

GitHub Trending·May 27

Chachamaru127 / claude-code-harness

Claude Code Harness: automation framework for Claude enabling autonomous Plan→Work→Review cycle. Implements iterative development loop with integrated code review.

Claude Claude Code Code generation

SIG

HYP

Hacker News (AI)·May 27

I used autoresearch to improve my AGENTS.md, measured against real tasks

Developer used autoresearch to improve AGENTS.md documentation and validated it against real-world tasks. Empirical approach to agent optimization.

AI Agents Prompt engineering

SIG

HYP

Hacker News (AI)·May 27

Rust (and Slint) on a Jailbroken Kindle

A developer successfully ran Rust and the Slint UI framework on a jailbroken Kindle device. The project demonstrates Rust's portability to unconventional embedded systems.

Open source Tools

SIG

HYP

Reddit r/LocalLLaMA·May 27

Inferencing at 10.33 t/s on Qwen 3.5 35B on a $300 laptop

CPU inference at 10.33 tokens/s on Qwen 3.5 35B quantized Q4_K_M on $300 Lenovo Ideapad Slim 3i (i3-1215U, 8GB RAM). Uses llama.cpp with BIOS optimizations, core pinning, MTP speculative decoding, and Q8_0 K/V cache quantization.

Qwen Code generation Open source

SIG

HYP

Reddit r/MachineLearning·May 27

"Unified Neural Scaling Laws" paper release [R]

Paper release on unified neural scaling laws in deep learning. Studies relationships between model size, training data, and performance. Reproducible results and benchmarks included.

Papers Benchmarks

SIG

HYP

Reddit r/LocalLLaMA·May 27

Qwen3.6 huge quality gain from Q4 to Q6 for coding agent

Qwen 3.6 shows significant quality improvement from Q4 to Q6 quantization for local coding agents. Using llama.cpp and MTP, user achieves 20-50 tokens/s on dual 3090, making local coding agents competitive with paid APIs.

Qwen Code generation AI Agents

SIG

HYP

The Decoder·May 27

Microsoft's MAI-Image-2.5 pulls even with Google's Nano Banana 2 on benchmarks

Microsoft MAI-Image-2.5 ranks third on Arena's text-to-image leaderboard, matching Google Nano Banana 2 but trailing OpenAI Image-2. The model shows clear improvements in rendering text within images and commercial visuals.

Image generation Benchmarks

SIG

HYP

The Decoder·May 27

AI coding agent Devin maker Cognition more than doubles its valuation to $26 billion in under nine months

Cognition, maker of AI coding agent Devin, raises over $1 billion at a valuation exceeding $26 billion. The funding round reflects massive investor interest in AI coding agents, despite ongoing debate about their real-world value.

Code generation AI Agents Funding

SIG

HYP

Latent Space·May 27

🔬ESMFold2: The Bitter Lesson is Coming for Proteins - Alex Rives, BioHub

ESMFold2 applies Sutton's bitter lesson to proteins: large-scale language models outperform inductive bias approaches. Alex Rives (BioHub) discusses massive datasets, world models, and programmable biology.

Benchmarks Papers Alignment

SIG

HYP

The Decoder·May 27

Robinhood lets AI agents trade shares and make credit card purchases for customers

Robinhood enables customers to connect AI agents like Anthropic's Claude to investment accounts via MCP for autonomous stock trading. US regulator FINRA flags this as a new risk area. Robinhood acknowledges the product isn't suitable for all users.

Claude AI Agents MCP

SIG

HYP

Reddit r/LocalLLaMA·May 27

260K-param LLM running on an emulated 90s CPU inside an 18-year-old RTOS

Developer ran a 260K-param LLM (llama2.c/stories260K) on a JavaScript emulator of a 1990s Motorola 68K CPU, itself running inside a 2008 RTOS. INT8 quantization + lookup tables for RoPE and inverse square root (Quake) to bypass missing FPU. Generation: 2-4 seconds/token.
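For readers unfamiliar with the "Quake" inverse square root mentioned above, here is a minimal Python sketch of the classic bit trick. It is not the developer's code: the post also mentions lookup tables, which are not shown, and the function name is illustrative. The Newton step below uses floating-point multiplies, which an FPU-less target would have to emulate or replace with fixed-point arithmetic.

```python
import struct


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive x: Quake III bit trick plus one Newton step."""
    # Reinterpret the fp32 bit pattern of x as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Magic-constant initial guess for the bit pattern of 1/sqrt(x).
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson refinement of the guess.
    return y * (1.5 - 0.5 * x * y * y)


if __name__ == "__main__":
    for value in (0.25, 1.0, 4.0, 100.0):
        print(value, fast_inv_sqrt(value), value ** -0.5)
```

In RMSNorm-style layers the inverse square root is applied once per normalization, so an approximation of this kind trades a small relative error for avoiding a division and a square root.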

Llama Code generation Fine-tuning

SIG

HYP

Hacker News (AI)·May 27

Multi-Agent LLM System for Automated Vulnerability Discovery and Reproduction

Multi-agent LLM system for automated discovery and reproduction of vulnerabilities. The approach combines specialized agents for security analysis.

Multi-agent AI Agents

SIG

HYP

Hugging Face Blog·May 27

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

ITBench-AA, a new benchmark from Artificial Analysis and IBM, evaluates frontier models on agentic enterprise IT tasks. Top models (Claude, GPT-4, Gemini) score below 50%, exposing significant gaps in automating complex IT workflows.

Benchmarks AI Agents Claude

SIG

HYP

Reddit r/LocalLLaMA·May 27

Qwen3.6 35B-A3B successfully completed the FoodTruck Bench!

Qwen 3.6 35B-A3B successfully completed the FoodTruck Bench benchmark. No additional details provided on results or performance metrics.

Qwen Benchmarks

SIG

HYP

Reddit r/MachineLearning·May 27

What 1000+ Harness Experiments Taught Me About Self-Improving Agents [R]

A researcher conducted 1000+ experiments on AI agent self-improvement through harness modification for terminal bench tasks. Agents can propose meaningful one-time changes, but continuous self-improvement faces system-level challenges: determining which improvements can safely compound. Parallels noted with coding-agent customization patterns.

AI Agents Reasoning Code generation

SIG

HYP

The Decoder·May 27

YouTube will try to automatically flag AI videos starting this month

YouTube is deploying an automatic detection system to flag AI-generated or heavily AI-altered content starting May 2026. Labels will display more prominently: below the player for long videos and as an overlay on Shorts. Recommendations and monetization are unaffected.

Regulation Video generation

SIG

HYP

Simon Willison·May 27

I think Anthropic and OpenAI have found product-market fit

Anthropic and OpenAI have achieved product-market fit. Anthropic nears its first profitable quarter. Companies face surging LLM bills from staff usage, particularly Claude Code. $100/month subscription plans become cost-effective for heavy coding agent users.

Anthropic Claude Code OpenAI

SIG

HYP

Reddit r/MachineLearning·May 27

AI-generated CUDA kernels silently break training and inference [R]

NVIDIA released SOL-ExecBench (235 production CUDA kernels). Top-ranked AI-generated kernels fail in real training: a fused embedding-gradient+RMSNorm backward kernel accumulates in bf16 instead of fp32, causing loss divergence masked by AdamW but visible with SGD.
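To illustrate why the accumulator precision matters, the following Python sketch simulates summing many small contributions with a bf16 accumulator versus an fp32 one. It is a toy under stated assumptions, not the kernel from the post: bf16 is emulated by rounding the fp32 bit pattern to its top 16 bits, and the helper names and input values are hypothetical.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round a finite float to bf16 (round-to-nearest-even), returned as a Python float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bf16 keeps the top 16 bits of the fp32 pattern; the bias makes truncation round to nearest even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def accumulate(values, round_fn):
    """Sum values, rounding the running total to the accumulator precision after each add."""
    total = 0.0
    for v in values:
        total = round_fn(total + v)
    return total


if __name__ == "__main__":
    contributions = [1.0] * 1000  # hypothetical per-token gradient contributions
    print("fp32 accumulator:", accumulate(contributions, to_fp32))
    print("bf16 accumulator:", accumulate(contributions, to_bf16))
```

With these inputs the bf16 accumulator stalls at 256, because adding 1.0 to 256 lands on a rounding tie that resolves back to 256, while the fp32 accumulator reaches 1000. Small addends silently disappear once the running total grows, which is the kind of error that does not raise an exception.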

Benchmarks Code generation AI safety

SIG

HYP

Hacker News (AI)·May 27

DuckDuckGo search saw 28% more visits after Google said people love AI mode

DuckDuckGo saw 28% more visits after Google promoted its AI mode. Google's statement about AI adoption apparently prompted users to explore alternative search engines.

DeepMind

SIG

HYP

Hacker News (AI)·May 27

PostHog will train AI models with your data (opted-in by default)

PostHog will train AI models on user data, with users opted in by default. The analytics platform collects product events and proposes using this data to improve its AI models, with an option to disable.

Business AI safety

SIG

HYP

Reddit r/LocalLLaMA·May 27

ReAligned-Qwen3.5 Release

Lazarus AI and Eric Hartford (Dolphin creator) release ReAligned-Qwen3.5, a series of Qwen models fine-tuned to reduce Chinese ideological bias and censorship. Apache 2.0 license, trained with SFT + GRPO pipeline using ReAligned classifier as reward signal. Available 0.8B–35B, BF16/FP8/GGUF formats on HuggingFace.

Qwen Fine-tuning Reinforcement learning

SIG

HYP

Reddit r/LocalLLaMA·May 27

KV cache quant benchmarks: q5 & q6 are underrated, q8/q4 is bad, TCQ has a niche

Comprehensive benchmark of 38 KV quantization pairs on Qwen 3.6 27B with 64k-128k context. Q5_0 and Q5_1 underrated, Q8_0/Q4_* overrated. Recommendation: Q8_0/Q6_0 or Q8_0/Q5_1 for high-end, Q6_0/Q5_0 for balance, Q5_0/Q5_0 for tight VRAM.

Qwen Benchmarks Fine-tuning

SIG

HYP

Hacker News (AI)·May 27

An Update on Composer and Packagist Supply Chain Security

Security update for Composer and Packagist addressing supply chain vulnerabilities in PHP ecosystem. Official announcement on dependency attack prevention measures.

Infrastructure AI safety

SIG

HYP

Le Big Data·May 27

AI videos: YouTube will finally stop hiding them, with clearly visible labels

YouTube enforces visible labels to identify AI-generated videos. This measure aims to improve transparency and help users distinguish authentic content from synthetic content.

Video generation Regulation AI safety

SIG

HYP

GitHub Trending·May 27

harry0703 / MoneyPrinterTurbo

MoneyPrinterTurbo: open-source tool generating high-definition short videos with one click using AI LLMs. Automates video content creation.

Video generation Open source Tools

SIG

HYP

Reddit r/LocalLLaMA·May 27

Hugging Face Dataset Lineage Explorer

A Hugging Face researcher used Claude Code to analyze dataset relationships on the platform. The study reveals Alpaca-style datasets have hundreds of derivatives, with proliferation of 'cleaned' variants and numerous translations. An interactive Space enables exploration of these lineages.

Claude Code Tools Open source

SIG

HYP

Reddit r/MachineLearning·May 27

Physics Informed Neural Networks for damped harmonic oscillator and Burger's Equation (with extrapolation analysis) [P]

PINN implementation in Python solving the damped harmonic oscillator (2nd-order ODE) and the 1D viscous Burgers' equation (nonlinear PDE). Covers forward and inverse problems, comparison with non-physics-informed baselines, extrapolation analysis, and statistical parameter estimation evaluation.

Papers Benchmarks Open source

SIG

HYP

Reddit r/LocalLLaMA·May 27

Q4_K_M is fine for chat and a trap for agents. Here is math mathing.

Q4_K_M quantization is suitable for chat but problematic for agentic loops. At a ~3% error rate per call, a 30-step loop achieves about 40% success (vs 91% at Q6). Silent failures (valid format, wrong content) propagate downstream without being detected inline.
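The 40% figure is consistent with compounding a per-call success rate over the loop, assuming errors are independent across steps (an assumption, and a simplification of real agent loops). A minimal Python sketch of that arithmetic follows; the function names are illustrative, and the per-call error rate implied by 91% is derived here rather than quoted from the post.

```python
def loop_success(per_call_error: float, steps: int) -> float:
    """Probability that every call in a loop succeeds, assuming independent errors."""
    return (1.0 - per_call_error) ** steps


def implied_error(loop_success_rate: float, steps: int) -> float:
    """Per-call error rate that would yield the given loop success rate."""
    return 1.0 - loop_success_rate ** (1.0 / steps)


if __name__ == "__main__":
    print(f"30 steps at 3% error per call: {loop_success(0.03, 30):.3f}")
    print(f"per-call error implied by 91% over 30 steps: {implied_error(0.91, 30):.4f}")
```

Under this model, 91% success over 30 steps corresponds to a per-call error rate of roughly 0.3%, about a tenth of the Q4_K_M figure.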

AI Agents Reasoning Evals

SIG

HYP

Reddit r/LocalLLaMA·May 27

I ran 8 open-weight models as agents in a persistent MMO for 10 days. Here's the 93k event dataset and some things that I learned

A studio launched Null Epoch, a persistent MMO where 25 LLM agents (8 open-weight models: Qwen3, Nemotron, Ministral, Gemma, GLM) played for 10 days. 93k event dataset published on HuggingFace. Tests long-horizon planning, resource contention, and adversarial pressure in dynamic simulation.

AI Agents Multi-agent Benchmarks

SIG

HYP

Vercel AI Blog·May 27

How Conductor moved parallel coding agents from the laptop to the cloud with Vercel Sandbox

Conductor, a platform for directing parallel coding agents, moves execution from laptop to cloud via Vercel Sandboxes. Engineering teams at Notion, Linear, Ramp, and Life360 use this model-agnostic tool (Claude Code, Codex, etc.) to spawn multiple agents simultaneously on isolated codebase branches.

AI Agents Multi-agent Code generation

SIG

HYP

Reddit r/LocalLLaMA·May 27

LMStudio with MTP support - which model?

LMStudio released Multi-Token-Prediction (MTP) support. User seeks MTP-compatible model recommendations, particularly Qwen 3.6 variants.

Tools Qwen

SIG

HYP

Le Big Data·May 27

Mistral joins Harvey for enterprise AI use cases

Harvey integrates Mistral AI models into its legal AI platform. This partnership targets European enterprises seeking AI solutions compliant with local regulations.

Mistral Business

SIG

HYP

Reddit r/LocalLLaMA·May 27

Llama.cpp Console released

Llama.cpp Console, a graphical interface for llama.cpp, is now released for Windows users. The tool provides an alternative to command-line interfaces for running LLM models locally.

Llama Open source Tools

SIG

HYP

Le Big Data·May 27

[VIDEO] Arena.ai: access free AI tools without spending a cent

Arena.ai offers free access to AI tools to reduce subscription costs. The platform aggregates multiple models at no charge.

Tools Open source

SIG

HYP

Le Big Data·May 27

Fujitsu integrates OpenAI into its AI strategy for Japanese enterprises

Fujitsu partners with OpenAI to accelerate its AI strategy for Japanese enterprises. The group integrates OpenAI technologies into its offering to transform business use cases.

OpenAI Business

SIG

HYP

The Decoder·May 27

The AI boom drove Nvidia's yearly Taiwan spending from $15 billion to $150 billion

Nvidia's annual spending with Taiwan suppliers, particularly TSMC, surged from $15 billion to $150 billion driven by the AI boom.

Infrastructure Business

SIG

HYP

Reddit r/MachineLearning·May 27

noisekit - CLI for generating realistic degraded speech datasets for ASR benchmarking [P]

noisekit is an open-source CLI to generate annotated degraded speech datasets for realistic STT benchmarking (telecom G.711, ambient noise, reverb). Solves the gap: public datasets (FLEURS, CommonVoice) are too clean to evaluate production performance. HuggingFace AudioFolder compatible, includes PESQ/SNR/NISQA metrics.

Voice Evals Benchmarks

SIG

HYP

The Decoder·May 27

China turns its aging camera network into an AI-powered mass surveillance apparatus

Chinese police upgrade millions of surveillance cameras with AI. Hikvision and Huawei embed computer vision and language models to detect crowds, suspicious behavior, and unauthorized access. Officers query via text instead of manual review. Human Rights Watch warns of unprecedented behavioral surveillance at scale.

Vision Regulation AI safety

SIG

HYP

Reddit r/LocalLLaMA·May 27

Fused MoE dispatch kernel in pure Triton: 89-131% of Megablocks, runs on AMD with zero code changes

Fused MoE dispatch kernel written in pure Triton (no CUDA) achieves 89-131% of Megablocks performance on A100. Fuses gate+up projections to cut 35% memory traffic. Runs on AMD MI300X with zero code changes. Limitations: degraded performance beyond 2048 tokens and with 64+ experts.

Open source Infrastructure Code generation

SIG

HYP

The Decoder·May 27

Sam Altman and Dario Amodei walk back their AI job apocalypse predictions

Sam Altman (OpenAI) and Dario Amodei (Anthropic) walk back previous predictions of massive job displacement from AI, shortly before their companies' billion-dollar IPOs.

OpenAI Anthropic Business

SIG

HYP

Reddit r/MachineLearning·May 27

EMA-Gated Temporal Sequence Compression in Vision Transformers [P]

NeuroFlow is a dynamic routing framework for Vision Transformer video inference. It exploits temporal redundancy via Exponential Moving Average (EMA) of patch-level embeddings to eliminate stationary tokens. Architecture B achieves 55.80× wall-clock speedup (678 ms → 11.9 ms on SigLIP 1792p) at 97.37% embedding fidelity. Code released.
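The gating idea described above can be sketched in a few lines of Python. This is an illustration under assumptions, not NeuroFlow's implementation: the post does not state the distance metric, the threshold, the smoothing factor, or whether the running average is updated for skipped patches, so `ema_gate`, `alpha`, and `threshold` are all hypothetical.

```python
import math


def ema_gate(ema, frame, alpha=0.9, threshold=0.1):
    """Return (indices of patches to recompute, updated EMA state).

    ema and frame are lists of per-patch embedding vectors (lists of floats).
    alpha and threshold are hypothetical values chosen for illustration.
    """
    active, new_ema = [], []
    for idx, (mean, emb) in enumerate(zip(ema, frame)):
        # A patch whose embedding stays close to its running average is treated as stationary.
        if math.dist(mean, emb) > threshold:
            active.append(idx)
        # Update the exponential moving average for every patch.
        new_ema.append([alpha * m + (1.0 - alpha) * e for m, e in zip(mean, emb)])
    return active, new_ema


if __name__ == "__main__":
    state = [[1.0, 0.0], [0.0, 1.0]]
    next_frame = [[1.0, 0.0], [1.0, 0.0]]  # patch 0 unchanged, patch 1 moved
    active, state = ema_gate(state, next_frame, alpha=0.5)
    print(active)
```

Only the patches returned in `active` would be passed to the expensive transformer layers; the rest would reuse cached outputs, which is where a speedup on mostly static video would come from.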

Vision Papers Open source

SIG

HYP

Reddit r/LocalLLaMA·May 27

Finally pioneering beyond the local 256k context window frontier!

An r/LocalLLaMA user reports exceeding the 256k token context limit, reaching 341.5k tokens with autocompact. The user is testing key-value cache eviction and plans to push the boundary further in increments.

Open source Infrastructure

SIG

HYP

Reddit r/MachineLearning·May 27

Cross-species RSA: same learning rules (BP, PC, STDP, FA) tested against both human fMRI and macaque electrophysiology [P]

Cross-species comparison of learning rules (BP, PC, STDP, FA) tested on human fMRI and macaque electrophysiology (V1/V2/V4/IT). STDP and PC dominate in V1/V2 (ρ ≈ 0.30/0.28), preserving the pattern seen in humans. In IT, alignment depends on model capacity (ResNet-50: ρ ≈ 0.25) rather than learning rule. Code and two papers (arXiv 2604.16875, 2605.22401) available.

Papers Benchmarks Reasoning

SIG

HYP

Hacker News (AI)·May 27

Ripgrep AI Policy

Ripgrep, the popular text search tool, adopts an explicit AI policy. The project clarifies its terms for model training and code generation use.

Open source Regulation

SIG

HYP

Reddit r/MachineLearning·May 27

Profiling PyTorch training without accidentally stalling the GPU [D]

PyTorch profiling technique using CUDA events to measure performance without GPU synchronization overhead. Lightweight alternative to torch.cuda.synchronize() and heavy tools (PyTorch Profiler, Nsight) for diagnosing training bottlenecks.

Tools Infrastructure

SIG

HYP

GitHub Trending·May 27

Yeachan-Heo / oh-my-claudecode

Oh-my-claudecode: teams-first multi-agent orchestration for Claude Code. Framework enabling coordination of Claude agents in collaborative workflows.

Claude Code AI Agents Multi-agent

SIG

HYP

GitHub Trending·May 27

cjpais / Handy

Handy is a free, open-source speech-to-text application that operates entirely offline without cloud dependencies.

Voice Open source Tools

SIG

HYP

GitHub Trending·May 27

rustfs / rustfs

RustFS is an open-source S3-compatible object storage system written in Rust. It demonstrates 2.3x faster performance than MinIO for 4KB object payloads and supports migration and coexistence with MinIO and Ceph.

Open source Infrastructure Benchmarks

SIG

HYP

GitHub Trending·May 27

meilisearch / meilisearch

Meilisearch is a lightning-fast search engine API providing AI-powered hybrid search for websites and applications.

Vector search Embeddings Tools

SIG

HYP

GitHub Trending·May 27

langfuse / langfuse

Langfuse is an open-source LLM engineering platform providing observability, metrics, evals, prompt management, and playground. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM. Y Combinator W23 graduate.

Open source Tools Evals

SIG

HYP

GitHub Trending·May 27

rowboatlabs / rowboat

Rowboat is an open-source AI coworker with memory capabilities. The GitHub project provides an implementation of an AI agent able to retain context across interactions.

AI Agents Open source

SIG

HYP

GitHub Trending·May 27

NVIDIA-NeMo / Megatron-Bridge

NVIDIA-NeMo/Megatron-Bridge is a training library for Megatron-based models with bidirectional Hugging Face conversion capability. Enables interoperability between Megatron and HF ecosystems.

Infrastructure Open source Fine-tuning

SIG

HYP

GitHub Trending·May 27

zai-org / GLM-OCR

GLM-OCR is an open-source OCR model based on GLM, designed for accurate and fast text recognition. Combines optical character recognition with natural language processing for comprehensive text extraction.

Open source Vision Tools

SIG

HYP

GitHub Trending·May 27

unclecode / crawl4ai

Crawl4AI is an open-source web crawler and scraper optimized for LLM integration. The project is trending on GitHub.

Open source Tools RAG

SIG

HYP

GitHub Trending·May 27

agentscope-ai / agentscope

AgentScope is an open-source framework for building and running AI agents that are visible, understandable, and trustworthy. Enables multi-agent creation with transparency and traceability.

AI Agents Multi-agent Open source

SIG

HYP

Reddit r/LocalLLaMA·May 27

Is Granite-4.1-30b Overshadowed by Qwen3.6 & Gemma4 models?

IBM's Granite-4.1-30b, a dense model without reasoning capabilities, supports code generation, RAG, and multilingual tasks. Users question its relevance against Qwen3.6 and Gemma4. IBM plans future versions with reasoning.

Code generation RAG Open source

SIG

HYP

ActuIA·May 27

Digital sovereignty: CIANum calls for moving beyond public-private silos to build strategic commons

The CIANum calls for breaking down public-private silos to build strategic commons against French and European dependence on foreign tech platforms. Goal: strengthen European digital sovereignty through structured collaboration.

Regulation Business Open source

SIG

HYP

Le Big Data·May 27

Micron surpasses $1 trillion on the back of AI demand

Micron surpasses $1 trillion market cap. Stock jumped 19% Tuesday after UBS tripled its price target, driven by surging demand for memory chips for AI applications.

Business

SIG

HYP

Reddit r/LocalLLaMA·May 27

Info: Nvidia Cuda 13.3 landed

Nvidia released CUDA 13.3. A user asks if anyone has tested llama.cpp with this version.

Open source Infrastructure

SIG

HYP

Reddit r/LocalLLaMA·May 27

Tiny model for PI agent + FREE DEMO, SOTA on terminal bench in 4b size (10%) + UNCESORED version for my dudes

Qwen 3.5 4B fine-tuned on Hermes and PI agent traces, 32k context window. Model served on HF Serverless with free demo capable of coding simple apps. Censored and uncensored versions available.

Qwen AI Agents Code generation

SIG

HYP

Reddit r/LocalLLaMA·May 27

Hyvemind OSS - Looking for some testers

Hyvemind is an open-source desktop app combining three AI-assisted development modes: Tasks (conversational planning), Hivemind (parallel multi-model review with orchestrator), and Swarms (autonomous multi-agent execution with specialized roles). Supports Anthropic, OpenAI, OpenRouter, Ollama, DeepSeek and others. In testing phase before official release.

Multi-agent AI Agents Open source

SIG

HYP

Le Big Data·May 27

OpenRouter passes a $1.3 billion valuation one year after its launch

OpenRouter reaches $1.3 billion valuation one year after launch. The AI model aggregation platform experiences rapid growth.

OpenAI Business Tools

SIG

HYP

Hacker News (AI)·May 27

Even (very) noisy LLM evaluators are useful for improving AI agents

Research demonstrates that noisy LLM evaluators remain useful for improving AI agents, even with high measurement noise. Results indicate signal persists despite evaluation imprecision.

AI Agents Evals Reinforcement learning

SIG

HYP

Reddit r/LocalLLaMA·May 27

I made a small tool to inspect retrieval results before feeding them into RAG

Local tool to inspect search results before feeding them into RAG pipelines. Analyzes source diversity, duplicates, freshness, SEO/GEO pollution risk, and provider differences (Brave, Serper, Tavily, Exa). Filters out low-quality evidence before it reaches the model's context window.

RAG Vector search Tools

SIG

HYP

OpenAI Blog·May 27

Building self-improving tax agents with Codex

OpenAI, Thrive, and Crete built a self-improving tax agent with Codex to automate filings, improve accuracy, and accelerate workflows.

AI Agents Code generation Business

SIG

HYP

Le Big Data·May 27

Music v2: the ElevenLabs AI that composes your songs (almost) on its own

ElevenLabs releases Music v2, an AI model that generates complete songs from text instructions. The tool promises automated music composition with limited creative control for users.

Tools

SIG

HYP

Reddit r/LocalLLaMA·May 27

Turning every "no thats not what i meant" in chat into actual LoRA training data

A developer built TideForge, a desktop app that converts chat corrections into LoRA training data. Each model reply has a "Teach" button; corrections accumulate as JSONL and trigger PEFT fine-tuning on your base model. Initial test: 110 hand-written corrections on Qwen 0.6B, loss dropped 4.25→0.73, adapter maintained identity across ~30 jailbreak prompts. Free, Windows, GGUF-compatible.

Fine-tuning Open source Tools

SIG

HYP

Reddit r/MachineLearning·May 27

A Tiny Open-Source Self-Driving AI That Runs on a Phone [P]

Open-source 7MB self-driving AI model trained on visual and sensor data. Real-time execution on phones and embedded devices without server infrastructure. Learns navigation, lane following, and drift recovery.

Code generation Robotics Open source

SIG

HYP

Reddit r/LocalLLaMA·May 27

Does Engram Do Memory Retrieval in Autoregressive Image Generation?

An Engram module (O(1) hash-keyed associative memory) injected into Transformers for autoregressive image generation on ImageNet 256×256 fails to improve quality (FID) despite FLOP gains. Gate-clamp, donor-probe, and frozen-table experiments show the module acts as a gated architectural side-pathway, not a content-addressed retrieval mechanism.

Papers Image generation Benchmarks

SIG

HYP

Reddit r/LocalLLaMA·May 27

Add MiniCPM5 tokenizer support by zhangtao2-1 · Pull Request #23384 · ggml-org/llama.cpp

Pull request to add MiniCPM5 tokenizer support in llama.cpp. MiniCPM5-1B is a compact model available in GGUF format on Hugging Face.

Open source Code generation Tools

SIG

HYP

Hacker News (AI)·May 27

Claude Code as a Daily Driver: Claude.md, Skills, Subagents, Plugins, and MCPs

Experience report using Claude Code as a daily development tool. Explores native capabilities (Claude.md, Skills, Subagents, Plugins) and MCP integration for extended functionality.

Claude Code MCP AI Agents

SIG

HYP

arXiv cs.CL·May 27

Slide Deck Q&A Quality Assurance App: A Multi-Stage Pipeline for Pedagogical Question Generation

slidesqaqa is a Flask system generating pedagogical questions from PDF presentations. A 4-stage LLM pipeline (window planning, deck synthesis, slide annotation, reconciliation) processes text and images to produce coherent, non-redundant questions with evaluation scores in structured JSON output.

Code generation RAG Vision

SIG

HYP

arXiv cs.CL·May 27

Why Prompt Optimization Works, and Why It Sometimes Doesn't: A Causal-Inspired Edit-Level Analysis

Causal analysis of prompt optimization methods (DSPy, TextGrad) explaining generalization failures. Complexity-increasing edits harm mathematical and multi-hop reasoning, while step-by-step edits improve logical reasoning. Failures stem from systematic interactions between edit families and task characteristics, not random artifacts.

Prompt engineering Reasoning Benchmarks

SIG

HYP

arXiv cs.CL·May 27

Model Unlearning Objectives Vary for Distinct Language Functions

arXiv paper on selective unlearning in LLMs. Authors propose two distinct methods: a cosine-based RMU variant for dangerous-knowledge unlearning, and a multi-layer objective for toxicity reduction. Tested on four open-source 7-8B models, the approaches show that unlearning requires function-specific objectives, analogous to LLM post-training.

AI safety Alignment Papers

SIG

HYP

arXiv cs.AI·May 27

FAST-GOAL: Fast and Efficient Global-local Object Alignment Learning

FAST-GOAL enhances CLIP to handle lengthy text descriptions through global-local semantic alignment. The method combines efficient local region extraction (FLISM) and token similarity-based learning (TSL). A new GLIT100k dataset with global image-caption pairs and derived local pairs validates the approach on DOCCI, DCI, MSCOCO, Flickr30k.

Vision RAG Embeddings

SIG

HYP

arXiv cs.CL·May 27

SPEAR: Code-Augmented Agentic Prompt Optimization

SPEAR is an agentic prompt optimizer integrating a Python sandbox for structural error analysis (confusion matrices, clustering). Evaluated on 13 industrial LLM-as-judge tasks and BBH-7, it outperforms GEPA and TextGrad (κ 0.857 vs 0.359 on tool-selection; F1-macro 0.815 vs 0.763). Python tool contributes +0.79κ on complex judge tasks.

Prompt engineering AI Agents Code generation

SIG

HYP

arXiv cs.CL·May 27

The Daily Dose: Workflow-Integrated Large Language Model Automation for Clinical Summarization and Trial Identification in Radiation Oncology

The Daily Dose (TDD) is an LLM-driven system integrated into routine radiation oncology practice for automated clinical summarization and trial identification. In an evaluation with 55 clinicians, 83.6% use TDD daily, mean satisfaction is 3.89/5, and 27% report saving ≥10 minutes per day.

Code generation RAG Business

SIG

HYP

arXiv cs.CL·May 27

Why LLMs Hallucinate on Structured Knowledge: A Mechanistic Analysis of Reasoning over Linearized Representations

Mechanistic analysis of LLM hallucinations on linearized structured knowledge (graphs, tables). Hallucinations stem from systematic internal dynamics: attention concentrates disproportionately on shortcut structural cues, feed-forward representations fail to ground the provided knowledge, and the model reverts to parametric memory. Patterns generalize to multi-hop graphs and tabular data.

Reasoning Papers AI safety

SIG

HYP