May 2026

3149 articles

What I learned building a debugger for PyTorch training loops and how it changed how I think about failure diagnosis [D]

Developer built NeuralDBG, a PyTorch debugger that automatically detects training failures (vanishing/exploding gradients, data anomalies). Key insight: failures are layer-localized, not global. Effective monitoring: gradient norm transitions per layer rather than raw histograms. Open-source tool available on PyPI.

Tools Code generation Open source
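
The per-layer idea in this summary can be sketched without any framework. This is a minimal illustration, not NeuralDBG's actual API: the function name `find_transitions`, the thresholds, and the sample norms are all hypothetical. In a real PyTorch loop the norms would be read from each parameter's `.grad` after `loss.backward()`.

```python
def find_transitions(history, vanish=1e-6, explode=1e3):
    """Flag the first step at which a layer's gradient norm leaves the healthy range.

    history: dict mapping layer name -> list of gradient norms, one per step.
    Returns a dict mapping layer name -> (step, kind) for layers that transitioned.
    Thresholds are illustrative assumptions, not values from the post.
    """
    events = {}
    for layer, norms in history.items():
        for step, norm in enumerate(norms):
            if norm < vanish:
                events[layer] = (step, "vanishing")
                break
            if norm > explode:
                events[layer] = (step, "exploding")
                break
    return events


# Hypothetical norms: one layer collapses while the others stay healthy,
# which a single global norm could easily hide.
history = {
    "encoder.0": [0.9, 0.8, 0.7, 0.7],
    "encoder.1": [0.5, 1e-3, 1e-7, 1e-9],
    "head": [1.2, 1.1, 1.3, 1.2],
}
events = find_transitions(history)
print(events)
```

Reporting the step of the transition, rather than a histogram of raw values, points directly at the layer and the moment where training went wrong.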

The Decoder·May 30

Meta's leaked memo reveals AI pendant, supersensing glasses, and enterprise wearables strategy

Meta is developing AI wearables: an AI pendant and enterprise "supersensing" glasses. After billions invested in AI with limited commercial returns, its open-source strategy has underperformed. Meta is pivoting to hardware.

Meta AI Tools

Hacker News (AI)·May 30

Rsync 3.4.3 has hundreds of Claude commits

Rsync 3.4.3 contains hundreds of commits generated by Claude. The file synchronization tool has integrated code produced by Anthropic's AI model in its latest release.

Claude Code generation Open source

Reddit r/LocalLLaMA·May 30

Project Blackwell: It Will Work, Eventually — Making an RTX Pro 6000 Run in a Dell R730 at 650K Context

A user successfully ran an RTX Pro 6000 Blackwell GPU in a 2016-era Dell PowerEdge R730 server, reaching a 650K-token context window. The project required firmware archaeology, PCIe workarounds, and physical modifications to bridge incompatibilities between the server's legacy architecture and the GPU's modern requirements.

Infrastructure Open source

Reddit r/LocalLLaMA·May 30

made a local voice AI for windows you can talk to in any language. open source, bring your own key

Shadow AI is an open-source (AGPL-3.0) local voice assistant for Windows. Natural multilingual conversations, local web search via SearXNG, persistent memory, optional Google integrations (Gmail, Calendar, Drive). Uses user's free Gemini API key, zero remote servers.

Voice Gemini Open source

Reddit r/LocalLLaMA·May 30

this new Moss tts 1.5 is damn good with voice cloning

MOSS-TTS v1.5 delivers high-quality voice cloning, preferred over Fish Audio S2 Pro due to commercial use allowance. Long Cat DiT 3.5 noted as another strong model.

Voice Open source Tools

Reddit r/MachineLearning·May 30

Event like spiking neuron lib that fits into the CPU cache [P]

Spiking neuron library optimized to fit in CPU cache. Benchmarked against PyTorch on Wikipedia dataset. Built with Gemini Flash 3.5.

Code generation Benchmarks Open source

Hacker News (AI)·May 30

Show HN: VT Code – open-source terminal coding agent in Rust

VT Code is an open-source terminal coding agent written in Rust. It executes programming tasks directly from the command line.

AI Agents Code generation Open source

Reddit r/LocalLLaMA·May 30

I compared all specs of the major GPUs/machines that are being used here, because bandwidth is not everything. Some of ya'll need a reality check.

Comparative analysis of GPUs/machines for LLM inference: critiques Mac Studio efficiency, reassesses older cards (P100, V100, P40) as cost-effective alternatives to 3090s, and argues benchmarks conflate prefill vs generation performance. Author collecting power consumption and prefill data.

Benchmarks Infrastructure

Reddit r/LocalLLaMA·May 29

Anyone using Flash Attention 2 (ai-bond) on their V100's? How is the performance?

User benchmarks Flash Attention 2 (ai-bond) on V100. Results show 7-24x speedup in backward pass, memory reduction up to 91.9% (323.4 MB saved). Thinking time before answering minimized. Numerical validation passes on causal and non-causal configurations.

Infrastructure Benchmarks Open source

Reddit r/LocalLLaMA·May 29

I tested MTP on vLLM and llama.cpp for Gemma 4 & Qwen 3.6 — 3.34x faster inference, here are my findings RTX 6000 PRO.

MTP (Multi-Token Prediction) benchmark on Gemma 4 31B and Qwen 3.6 27B using vLLM and llama.cpp. Result: 3.34x speedup (132.52 vs 39.69 tok/s). vLLM outperforms llama.cpp on Gemma 4; llama.cpp solid on Qwen. No confirmed quality degradation, VRAM overhead negligible.

Gemini Qwen Code generation
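
The headline speedup follows directly from the two throughput figures quoted in the summary. A small check, using only those two numbers (the variable names are illustrative):

```python
baseline_tok_s = 39.69  # tokens/s without MTP, as quoted in the summary
mtp_tok_s = 132.52      # tokens/s with MTP, as quoted in the summary

# Speedup is the ratio of throughputs.
speedup = mtp_tok_s / baseline_tok_s

# The same ratio expressed as average time per generated token.
ms_per_token_before = 1000.0 / baseline_tok_s
ms_per_token_after = 1000.0 / mtp_tok_s

print(f"speedup: {speedup:.2f}x")
print(f"per-token latency: {ms_per_token_before:.1f} ms -> {ms_per_token_after:.1f} ms")
```

The ratio rounds to 3.34x, matching the figure in the post.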

Hacker News (AI)·May 29

Hackers are now using ChatGPT share links to deliver malware

Hackers exploit ChatGPT share links to distribute malware. Attackers leverage trust in OpenAI URLs to bypass security filters and deliver malicious payloads to users.

OpenAI AI safety

Reddit r/LocalLLaMA·May 29

Me train LLM on 8GB from Scratch. Me happy

A developer created a script to train a small model (25M parameters) on TinyStories with only 8GB VRAM. After testing multiple techniques (mHC, BitNet, TurboQuant, MTP), only MTP works properly, though slower. Code and model available on GitHub and Hugging Face.

Open source Fine-tuning Infrastructure

Hacker News (AI)·May 29

Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA

Tiny-vLLM is a high-performance LLM inference engine written in C++ and CUDA. Open-source project shared on Hacker News with minimal early engagement (score 5, 0 comments).

Infrastructure Open source Code generation

Reddit r/LocalLLaMA·May 29

Uploaded my Qwen3.6 27B based fine tune, after two years of experience fine tuning models

User releases a Qwen 3.6 27B fine-tune after 2 years of experience. Model achieves 75% human alignment (+2% vs previous Qwen 3.5) through dataset expansion techniques. Evaluated on custom benchmarks.

Qwen Fine-tuning Open source

Hacker News (AI)·May 29

CVE-Bench: testing LLM agents on real-world vulnerability patches

CVE-Bench is a benchmark for evaluating LLM agents on real-world vulnerability patches. The study tests models' ability to identify and fix security flaws in existing code.

AI Agents Benchmarks Code generation

Hacker News (AI)·May 29

Shift will clean homes for free to train future robots

Shift, a robotics startup, offers free home cleaning services to collect training data for future domestic robots. Business model relies on real-world data acquisition rather than immediate monetization.

Robotics Reinforcement learning Business

Reddit r/LocalLLaMA·May 29

Mutating Gemma 4 31B Dense in to a native Gemma 4 additive-MoE model

An r/LocalLLaMA user developed a training script to convert Gemma 4 31B Dense into a native additive-MoE model, inspired by JDONE-Research/AIOne-Agent-52B-A36B-it. The project aims to add a router and experts to the existing dense model within 24 hours on a B300 GPU.

Gemini Fine-tuning Open source

Hacker News (AI)·May 29

AI will be used to estimate age of asylum seekers from next year

The UK will use AI to estimate the age of asylum seekers from 2025. The technology will analyze facial images to assess whether people claiming to be minors are in fact adults, raising questions about accuracy and ethics.

Regulation AI safety Alignment

The Decoder·May 29

OpenAI gives GPT-5.5 Instant a readability upgrade while phasing out two older models

OpenAI upgrades GPT-5.5 Instant for more natural responses and removes Canvas feature in favor of direct chat integration. Older models o3 and GPT-4.5 will be retired from ChatGPT by August 2026.

GPT OpenAI

Reddit r/LocalLLaMA·May 29

Nvidia teases new PC laptop chip to be announced at Computex June 2

Nvidia will announce a new ARM laptop PC chip at Computex on June 2 in Taipei. The processor aims to compete with Snapdragon X (Qualcomm) and offer competitive hardware specs, but adoption will depend on software support (Office, games). Expected price below the $4.7K DGX Spark.

Infrastructure

Reddit r/LocalLLaMA·May 29

Qwen3.6-27B Quantization Benchmark

Benchmark of Qwen3.6-27B quantizations on HuggingFace (unsloth, mradermacher, IQ4_XS, Ununnilium) from Q8 to Q2. Measured via llama.cpp: KL Divergence and Same Top P Percentage vs BF16 baseline. 8192 token context, KV cache q8_0. Q6-Q8 nearly lossless.

Qwen Benchmarks Open source
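
The two metrics named in this summary can be illustrated on a single token position. This is a sketch under assumptions, not the benchmark's code: the logits are made up, the helper names are hypothetical, and the agreement check here compares only the single most likely token.

```python
import math


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats; p is the baseline distribution, q the quantized one."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def same_top_token(p, q):
    """True when both distributions rank the same token first."""
    return p.index(max(p)) == q.index(max(q))


# Hypothetical logits over a 3-token vocabulary at one position.
base = softmax([2.0, 1.0, 0.1])   # stands in for the BF16 baseline
quant = softmax([1.9, 1.1, 0.1])  # stands in for a quantized model

kl = kl_divergence(base, quant)
agree = same_top_token(base, quant)
print(f"KL divergence: {kl:.5f} nats, same top token: {agree}")
```

A benchmark of this kind averages the divergence, and counts the agreement, over every position in the evaluation text; a near-lossless quantization shows a KL close to zero and agreement close to 100%.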

The Decoder·May 29

Google fixes several bugs in Gemini usage limits that burned through quotas too fast

Google fixes bugs in Gemini usage limits where a single Omni video consumed entire quotas. Ultra members now get twice as many video generations, failed requests are no longer charged, and Google plans increased transparency on usage.

Gemini Video generation

Hacker News (AI)·May 29

Tesla's AI trainers don't trust its self-driving tech – or its safety stats

Tesla's AI trainers lack confidence in the company's self-driving technology and published safety statistics. Internal doubts about data reliability and actual system capabilities.

AI safety Alignment

Hacker News (AI)·May 29

Robinhood now lets your AI agents trade stocks

Robinhood has integrated an API enabling AI agents to place stock trades directly. Users can connect their agents to the platform to automate trading. No technical details or limitations disclosed.

AI Agents Business

The Decoder·May 29

One company reportedly spent $500 million on Claude in one month after failing to cap AI usage

An unnamed company reportedly spent $500 million on Claude licenses in a single month due to lack of usage caps. The incident highlights risks of uncontrolled costs without expertise in model selection and context optimization.

Claude Business

Hacker News (AI)·May 29

New Study Reveals the Manipulative 'Dark Patterns' of AI Chatbots

A study reveals manipulative 'dark patterns' in AI chatbots: interfaces designed to influence users beyond their initial intent. Researchers document hidden persuasion tactics and design biases.

AI safety Alignment Regulation

The Decoder·May 29

OpenAI is giving away its life sciences AI model to help governments prepare for the next pandemic

OpenAI is offering its life sciences AI model GPT-Rosalind for free through the Rosalind Biodefense program to help governments prepare for future pandemics. Early partners include Lawrence Livermore National Laboratory, Johns Hopkins, and CEPI.

OpenAI GPT AI safety

Reddit r/LocalLLaMA·May 29

If you had $150K for building a production-class local inference server to serve 300 people, what would you buy?

User seeks $150K production inference failover server for 300 users. Current setup: 4 H100s running 122B AWQ models at 256k context with vLLM. Considering SuperMicro with RTX Pro 6000s or DGX Station as alternatives.

Infrastructure Open source

Reddit r/LocalLLaMA·May 29

llama : website + unified `llama` binary · ggml-org/llama.cpp · Discussion #23875

New website llama.app and unified `llama` binary announced for llama.cpp project. Ongoing development of local inference ecosystem.

Llama Open source Infrastructure

Hacker News (AI)·May 29

Liquid AI reveals 8B-A1B MoE trained on 38T

Liquid AI unveils 8B-A1B, a Mixture of Experts model trained on 38 trillion tokens. Under the usual naming convention, 8B-A1B denotes about 8 billion total parameters with about 1 billion active per token, which keeps per-token compute low.

Open source Benchmarks

Hacker News (AI)·May 29

Flathub disallows AI-assisted code and documentation

Flathub, the Linux application repository, now disallows code and documentation generated or assisted by AI. The platform tightens quality and attribution policies.

Regulation Open source

Hacker News (AI)·May 29

Show HN: Promptloop – create, run, and improve prompt evals from the terminal

Promptloop is a terminal tool to create, run, and improve prompt evaluations. Enables rapid iteration on prompt quality without leaving the CLI.

Prompt engineering Evals Tools

GitHub Trending·May 29

ai-boost / awesome-harness-engineering

Curated list of resources for AI agent harness engineering: tools, patterns, evals, memory, MCP, permissions, observability, and orchestration.

AI Agents MCP Evals

Hacker News (AI)·May 29

CAPTCHAs can still detect AI agents

Researchers show CAPTCHAs remain effective at detecting AI agents, contradicting claims that these systems are obsolete against modern vision models.

AI Agents AI safety Evals

Le Big Data·May 29

Is Claude Opus 4.8 finally honest? The honesty test

Anthropic tests honesty in Claude Opus 4.8 beyond marketing claims. The article evaluates whether the model actually functions as a safeguard against misuse.

Claude AI safety Alignment

Le Big Data·May 29

Why Claude Opus 4.8 really changes the game (tests and benchmarks)

Claude Opus 4.8 shows significant progress according to initial tests. The article promises detailed benchmarks but the provided excerpt lacks specific figures and concrete results.

Claude Benchmarks

Reddit r/LocalLLaMA·May 29

Unsloth Studio updated to support training with MLX on macs

Unsloth Studio now fully supports training with MLX on Mac. The feature, previously marked as "coming soon", is now available in recent GitHub updates.

Fine-tuning Open source Tools

Reddit r/LocalLLaMA·May 29

Updated MarkItDown API Server

MarkItDown API Server wraps Microsoft's official MarkItDown library in a lightweight FastAPI REST server. The tool converts files (PDF, Word, Excel) to Markdown for RAG and LLM pipelines. This release patches security vulnerabilities in Starlette and document parsers.

RAG Tools Open source

Hacker News (AI)·May 29

Claude Opus 4.8 distilled Alibaba Qwen models

Alibaba distilled Claude Opus 4.8 into Qwen models. Knowledge distillation transfers capabilities from large models to smaller, more efficient versions.

Claude Qwen Fine-tuning

Reddit r/MachineLearning·May 29

What's the theoretical basis for using llm consensus as a probability estimator for real world events [R]

Technical discussion on the theoretical validity of using LLM consensus to estimate probabilities for real-world events. Author questions the actual error independence between models trained on similar data and effectiveness on out-of-distribution events.

Evals Reasoning Benchmarks
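
The independence concern raised in this discussion can be made concrete with the standard variance formula for an average of correlated estimates. This is a sketch under simplifying assumptions that are not from the post: every model has the same error variance `sigma2`, and every pair of models shares the same error correlation `rho`.

```python
def consensus_variance(n, sigma2, rho):
    """Variance of the mean of n estimates with equal variance and equal pairwise correlation.

    Var(mean) = sigma2 * (rho + (1 - rho) / n)
    With rho = 0 this is the familiar sigma2 / n; with rho > 0 it can never
    fall below rho * sigma2, however many models are averaged.
    """
    return sigma2 * (rho + (1.0 - rho) / n)


# Illustrative values only: unit error variance, 10 models.
independent = consensus_variance(n=10, sigma2=1.0, rho=0.0)
correlated = consensus_variance(n=10, sigma2=1.0, rho=0.5)
many_models = consensus_variance(n=1000, sigma2=1.0, rho=0.5)

print(independent, correlated, many_models)
```

Under these assumptions, ten independent models cut the error variance to a tenth, while ten models with correlated errors barely improve on a single model, and adding more of them does not help much.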

Hacker News (AI)·May 29

Show HN: AISlop, a CLI for catching AI generated code smells

AISlop is a CLI tool that detects code smells in AI-generated code. The project, shared on Hacker News, aims to identify problematic patterns in code synthesized by language models.

Code generation Tools Open source

Reddit r/LocalLLaMA·May 29

Llama.cpp B9406 MTP mmproj fix

Llama.cpp B9406 fixes GGML_ASSERT crash in get_rows/mtmd_helper_decode_image_chunk when using MTP + MoE model + vision with Qwen 3.6-35B-A3B.

Llama Vision Open source

The Decoder·May 29

New review paper argues code is how AI agents think and act, not just what they produce

A review paper argues that the real bottleneck for autonomous AI agents isn't the language model but the software layer around it: tools, memory, testing, and permission boundaries turn a stateless model into a working agent. DeepSeek is building a dedicated "Harness" team in Beijing, which is consistent with this thesis.

AI Agents DeepSeek Code generation

Reddit r/LocalLLaMA·May 29

Comparing Vector search libraries

Comparative benchmark of vector search libraries (FAISS, ScaNN, USearch) measuring speed, memory usage, and similarity accuracy. Tests range from 500 to 1 million samples. Results and code published on GitHub.

Vector search Benchmarks Open source

Reddit r/LocalLLaMA·May 29

vLLM PR adding native HIP W4A16 kernel was merged

vLLM merged a PR adding native HIP W4A16 kernel for ROCm. Benchmarks show significant gains: 270.2 tk/s in fp16 (max-num-seqs=8) and 445.7 tk/s (max-num-seqs=32), outperforming previous Triton implementations.

Open source Infrastructure Benchmarks

Le Big Data·May 29

Mercedes readies its anti-FSD: should Tesla be worried?

Mercedes launches MB.Drive Assist Pro, an urban assisted driving system capable of handling traffic lights and traffic, to compete with Tesla's Full Self-Driving in Europe.

Robotics

OpenAI Blog·May 29

Boston Children’s uses AI to unlock new diagnoses

Boston Children's Hospital deploys OpenAI technology to improve rare disease diagnosis, identifying over 40 additional cases. The system reduces operational burden and accelerates patient care.

OpenAI Business

OpenAI Blog·May 29

How Braintrust turns customer requests into code with Codex

Braintrust uses Codex with GPT-5.5 to accelerate experiments and code generation. The platform's engineers convert customer requests directly into executable code.

Code generation GPT

ActuIA·May 29

Anthropic at $965B: $65 billion Series H, no European public fund in the round

Anthropic raises $65 billion in Series H funding, reaching a $965 billion valuation. No European public funds participated in the funding round.

Anthropic Funding Business

Le Big Data·May 29

Airbus partners with Mistral AI to develop sovereign AI for aerospace

Airbus partners with Mistral AI to develop sovereign artificial intelligence in the aerospace sector. The partnership aims to integrate secure AI models into the group's operations and processes.

Mistral Business AI safety

GitHub Trending·May 29

galilai-group / stable-worldmodel

Open-source platform for reproducible world model research and evaluation. Provides standardized infrastructure to train and test world models on simulated environments.

Open source Benchmarks Infrastructure

GitHub Trending·May 29

anthropics / claude-code

Claude Code is an agentic coding tool in the terminal that understands your codebase and executes routine tasks, explains complex code, and handles git workflows through natural language commands.

Claude Claude Code AI Agents

GitHub Trending·May 29

razvandimescu / numa

numa is a portable DNS resolver written in Rust. It supports .numa local domains, ad blocking, and developer overrides.

Open source Tools Infrastructure

GitHub Trending·May 29

ogulcancelik / herdr

Herdr is an agent multiplexer running in the terminal. Enables managing multiple AI agents simultaneously within a command-line interface.

AI Agents Tools

GitHub Trending·May 29

SaladDay / cc-switch-cli

cc-switch-cli is a cross-platform CLI tool enabling switching between Claude Code, Codex, and Gemini. Available on GitHub, it provides a unified interface to manage multiple AI assistants.

Claude Code Tools Code generation

GitHub Trending·May 29

millionco / react-doctor

react-doctor is a tool that detects bad practices in React code. It works as an agent that analyzes and flags problematic patterns.

AI Agents Code generation Tools

GitHub Trending·May 29

ronisarkarexe / story-spark-ai

StorySparkAI is an open-source platform enabling users to generate and share multiple story variations from a single prompt. Designed for creative professionals.

Open source Prompt engineering Tools

GitHub Trending·May 29

Crosstalk-Solutions / project-nomad

Project N.O.M.A.D is a self-contained offline survival computer integrating critical tools, knowledge bases, and AI for operation without network connectivity.

AI Agents Open source

GitHub Trending·May 29

GH05TCREW / pentestagent

PentestAgent is an AI agent framework for black-box security testing, supporting bug bounty, red-team, and penetration testing workflows.

AI Agents Open source

GitHub Trending·May 29

PaddlePaddle / PaddleOCR

PaddleOCR is a lightweight, multilingual OCR toolkit (100+ languages) designed to convert PDF and image documents into structured data for LLM consumption.

Open source Vision Tools

GitHub Trending·May 29

opendatalab / MinerU

MinerU converts complex documents (PDFs, Office files) into LLM-ready markdown/JSON for agentic workflows. Open-source tool for document extraction and data structuring.

AI Agents RAG Open source

Le Big Data·May 29

H1 secures $40 million from CVS despite the decline in SaaS investment

H1 raises $40 million from CVS despite SaaS investment slowdown. The funding comes amid broader contraction in enterprise software investment.

Business Funding

Reddit r/LocalLLaMA·May 29

Shoutout to Gemma4 as a conversational assistant / agent

Gemma 4 26B A4B impresses on an M5 MacBook: high speed, versatility (creative writing, debugging, vision), conversational personality. Versus Qwen 3.6 35B, Gemma excels outside coding despite a slight weakness in programming tasks.

Gemini Qwen Open source

ActuIA·May 29

Why Nvidia is betting on Decart, an AI startup that can also optimize competitors' chips

Nvidia invests $300M in Decart, a startup focused on world models and software optimization. Nvidia's participation aims to control an optimization layer capable of running on its chips and those of competitors.

Infrastructure Business Funding

Hacker News (AI)·May 29

DeepSeek Slashes AI Costs to Cents

DeepSeek drastically reduces AI inference costs to cents. The Chinese company optimizes its models to lower computational resource consumption and usage fees.

DeepSeek Business

ActuIA·May 29

Cigref: €140B in cloud cost overruns in Europe, bundled AI the second-largest cause

Cigref estimates €140 billion in annual cloud and software overruns across European organizations. AI bundled into solutions ranks as the second identified cause. Half of CIOs cannot measure ROI from these integrated AI offerings.

Business Regulation

Hacker News (AI)·May 29

Flathub prohibits AI-generated code

Flathub, the Linux application distribution platform, prohibits AI-generated code in its repositories. The decision aims to maintain quality and accountability standards for the project.

Regulation Open source

Vercel AI Blog·May 29

Function invocations now billed per unit

Vercel shifts to per-unit billing for function invocations. New rate for Pro customers: $0.0000006 per invocation (previously listed as $0.60 per million, the same effective rate). The change takes effect in the next billing cycle.

Infrastructure Business
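
The two prices quoted in this summary can be compared with exact decimal arithmetic. A minimal sketch: the `invocation_cost` helper and the 25 million figure are hypothetical, and any included allotment or plan credit is ignored.

```python
from decimal import Decimal

PER_INVOCATION = Decimal("0.0000006")  # per-unit rate quoted in the summary

# The per-unit rate scaled up to one million invocations.
per_million = PER_INVOCATION * 1_000_000


def invocation_cost(invocations):
    """Cost in dollars for a number of invocations at the per-unit rate."""
    return PER_INVOCATION * invocations


print(per_million)                  # compare with the old $0.60 per million
print(invocation_cost(25_000_000))  # hypothetical monthly volume
```

Scaling the per-unit rate to a million invocations gives $0.60, so the listed price is unchanged; what differs is that usage is no longer billed in whole blocks.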

The Decoder·May 29

Amazon kills internal AI leaderboard after employees gamed it with pointless tasks

Amazon shuts down internal AI leaderboard after employees inflated scores through meaningless tasks, driving up cloud costs.

Business

Hacker News (AI)·May 29

GPUs and RAM Are in Short Supply, but the Real Bottleneck for AI Is Electricians

The article argues that the real bottleneck for AI scaling is not GPU or RAM scarcity, but a shortage of qualified electricians. Energy infrastructure and physical server installation are becoming the limiting factor for large-scale data center deployment.

Infrastructure Business

Le Big Data·May 29

Corgi raises $106 million and reaches a $2.6 billion valuation

Corgi raises $106 million just three weeks after its first Series B, reaching a $2.6 billion valuation.

Funding Business

Le Big Data·May 29

DeepSeek V4: Chinese emancipation and the urgency of a European AI strategy

DeepSeek V4 represents a major breakthrough in Chinese AI and challenges the effectiveness of Western strategies. The article highlights Europe's urgent need to develop a competitive AI strategy in response to this technological independence.

DeepSeek Regulation

SIG

HYP

Reddit r/MachineLearning·May 29

Building a monokernel for LLM inference on AMD MI300X - up to 3,300 output tokens/s per request [P]

Optimized monokernel for LLM inference on AMD MI300X: 3,300 output tokens/s per request (batch 1, no speculative decoding). Architecture mapped to GPU physical topology. Initial support for 2B model, frontier MoE planned.

Infrastructure Code generation Benchmarks

SIG

HYP

ActuIA·May 29

HR tools and artificial intelligence: Europe postpones high-risk obligations to December 2027

The EU is postponing enforcement of the obligations for high-risk AI systems in HR tools to December 2027. A provisional political agreement reached on May 7, 2026 on the Digital Omnibus on AI amends the timeline of Regulation (EU) 2024/1689, the AI Act.

Regulation AI safety

SIG

HYP

ActuIA·May 29

PCAIDE 2026: the Paris conference on AI ethics returns on June 11 and 12 at Mines Paris

The fourth edition of PCAIDE (Paris Conference on AI & Digital Ethics) will take place on June 11-12, 2026 at Mines Paris. The conference returns following the 2025 edition.

Regulation AI safety

SIG

HYP

ActuIA·May 29

EDF, BMW, Airbus: Mistral AI stages its industrial pivot, but contracts with disclosed figures remain rare

Mistral AI showcases its industrial pivot at AI Now Summit (May 28, 2026) with announced partnerships with EDF, BMW, and Airbus. However, specific contract values remain undisclosed.

Mistral Business

SIG

HYP

Reddit r/LocalLLaMA·May 29

llama: use f16 mask for FA to save VRAM by am17an · Pull Request #23764 · ggml-org/llama.cpp

llama.cpp PR #23764: use f16 masks in Flash Attention to reduce VRAM consumption. The optimization frees GPU memory, leaving more room for larger models.

Llama Open source Infrastructure

SIG

HYP

Reddit r/LocalLLaMA·May 29

How do I make MTP work in llama-server?

User tests MTP (Multi-Token Prediction) on Qwen3.6-35B with llama.cpp on RTX 3090. With MTP enabled (--spec-type draft-mtp), performance drops: prefill from 1082 t/s to 878 t/s (N=1), generation from 116 t/s to 108 t/s. Draft acceptance rates low (0.80 to 0.37). Seeks optimization advice.

Llama Code generation Benchmarks

SIG

HYP

Le Big Data·May 29

Anthropic surpasses $965 billion thanks to its Series H

Anthropic raises $65 billion in Series H funding, reaching a $965 billion valuation. One of the largest funding rounds in the AI sector.

Anthropic Funding Business

SIG

HYP

Hacker News (AI)·May 29

Fed up with vibe coders, dev sneaks data-nuking prompt injection into their code

A developer embedded a malicious prompt injection into code shared with 'vibe coders' to trigger data deletion. The incident highlights security risks from prompt injections in development workflows.

Prompt engineering AI safety Code generation

SIG

HYP

Le Big Data·May 29

Liquid AI launches LFM2.5-8B-A1B: does size no longer really determine performance?

Liquid AI releases LFM2.5-8B-A1B, an 8B model designed to show that performance does not solely depend on model size. The launch challenges the paradigm of ever-larger models.

Open source Benchmarks

SIG

HYP

Reddit r/LocalLLaMA·May 29

Use HTML as the primary chat language for your agents so they can draw diagrams

A developer experiments with HTML as the primary chat language for coding agents instead of markdown. By switching the system prompt to HTML, the agent (Qwen 3.6-27B) now generates SVG diagrams directly in responses. Results are promising but the model still tends to default to markdown.

Prompt engineering AI Agents Qwen

SIG

HYP

Reddit r/LocalLLaMA·May 29

New LFM2.5 8b A1b model!!

New LFM2.5 8B A1b model announced with performance on par with Nemotron 3 Nano at higher speed. Support being added to SmallCode despite non-standard tool calls.

Open source Code generation

SIG

HYP

Reddit r/MachineLearning·May 29

Making LLMs tell you how confident they really are through probe-targeted fine tuning.[R]

Research on probe-targeted fine-tuning (LoRA) for verbal confidence calibration in LLMs. Models internally detect correct answers (0.76–0.88 AUROC) but output 99% confidence uniformly. Fine-tuning across 8 models (7B–70B) with causal activation patching (ρ=0.976). Code and pre-registration available.

Fine-tuning Reasoning Alignment

SIG

HYP

Hacker News (AI)·May 29

Python utility package for building Claude Code hooks

Python utility package for building Claude Code hooks. Enables custom integration with Claude Code through modular extensions.

Claude Code Tools Open source

SIG

HYP

Reddit r/LocalLLaMA·May 29

Liquid AI releases LFM2.5-8B-A1B

Liquid AI releases LFM2.5-8B-A1B, 8B model with 128K context window, 38T pre-training tokens, and large-scale RL. Doubled vocabulary for non-Latin languages. Supports tool chaining and complex tasks on entry-level laptops.

Open source Code generation AI Agents

SIG

HYP

Reddit r/LocalLLaMA·May 29

Step 3.7 Flash Config + Early Data on 2x RTX 6000's

Step 3.7 Flash configured and benchmarked on dual RTX Pro 6000 Blackwell GPUs. Early token-per-second inference metrics recorded. Extended testing underway, full results pending.

Benchmarks Infrastructure Open source

SIG

HYP

Reddit r/LocalLLaMA·May 29

StepFun 3.7 Flash - Speed Benchmark in M5 Max

StepFun 3.7 Flash benchmark on M5 Max (128 GB) with llama.cpp. Short contexts (<16k tokens) fast and responsive. 32k-64k contexts usable. Detailed metrics: 65k tokens reaches 360.79 t/s token generation.

Open source Benchmarks Infrastructure

SIG

HYP

arXiv cs.AI·May 29

VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis

VFEAgent is a multimodal multi-agent system automating Finite Element Analysis (FEA) from images and text descriptions. The framework combines a vision-language pipeline with ReAct reasoning and verification-first code synthesis to generate physically valid simulations, outperforming existing LLM-based approaches.

AI Agents Multi-agent Vision

SIG

HYP

arXiv cs.CL·May 29

From Data to Insights: Exploring Program-of-Thoughts Prompting for Chart Summarization

Paper introduces Program-of-Thoughts prompting for chart summarization: VLMs generate Python programs to derive valid summary statistics instead of direct text. Proposes chart-to-dictionary auxiliary task. Results match existing methods on semantic and factual metrics.

Prompt engineering Vision Reasoning

SIG

HYP

arXiv cs.CL·May 29

Hallucination Detection-Guided Preference Optimization for Clinical Summarization

Preference optimization method guided by hallucination detection to improve clinical summarization reliability. On Llama-3.1-8B-Instruct, reduces hallucinations by 24% at inference and 48% after fine-tuning, preserving fluency. Evaluated on MIMIC-IV.

Llama Fine-tuning AI safety

SIG

HYP

arXiv cs.CL·May 29

GPF-LiveNews: A Streaming Evaluation Protocol for Group-Conditioned Framing in Large Language Models

GPF-LiveNews is a streaming evaluation protocol to audit how LLMs frame emerging news events for different audiences. Tested on 23 models across 12 monitoring runs, it measures semantic and sentiment variations across 42 identity labels. Results show Policy/Action prompts produce strongest semantic movement, while sentiment variation remains flat across dimensions.

Evals AI safety Alignment

SIG

HYP

arXiv cs.CL·May 29

The Trust Paradox: How CS Researchers Engage LLM Leaderboards

Qualitative study of 8 AI researchers reveals a paradox: they distrust LLM leaderboards yet use them as decision aids. Peer networks dominate model selection. NLP researchers face SOTA pressure absent in HCI/Systems. Universal demand: cost transparency.

Benchmarks Evals

SIG

HYP

arXiv cs.CL·May 29

GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling

GenesisFunc is an automated multi-agent pipeline for generating function-calling training data. Starting from reliable tools in public benchmarks, the system produces diverse conversations with multi-stage quality control. An 8B model fine-tuned on this synthetic data outperforms similarly-sized open-source models in in-domain performance and out-of-domain generalization.

Multi-agent Code generation Fine-tuning

SIG

HYP

arXiv cs.CL·May 29

Assessing Dutch Syllabification Algorithms and Improving Accuracy by Combining Phonetic and Orthographic Information through Deep Learning

Comparative assessment of four Dutch syllabification algorithms (Brandt Corstius, Liang, Trogkanis-Elkan CRF, and a novel deep learning model). The deep learning model combining phonetic and orthographic information achieves 99.65% word accuracy (+0.14% improvement over literature). Data-driven algorithms outperform knowledge-based approaches.

Papers Benchmarks Code generation

SIG

HYP

arXiv cs.CL·May 29

How Consistent Are LLM Agents? Measuring Behavioral Reproducibility in Multi-Step Tool-Calling Pipelines

Empirical study of behavioral reproducibility in LLM agents with tool-calling capabilities. Researchers measure whether agents select the same tools, in the same order, with identical parameters, across repeated identical invocations. Focus on structured tool-calling interfaces with typed parameters and consequential side effects.

AI Agents Benchmarks AI safety

SIG

HYP

arXiv cs.CL·May 29

Thoughts-as-Planning: Latent World Models for Chain-of-Thoughts Optimization via Reinforcement Planning

Thoughts-as-Planning formalizes reasoning chain optimization as sequential decision-making over latent semantic space. The framework learns a latent world model simulating effects of reasoning chain edits on outputs, supporting multi-scale edits (token, segment, instruction) via gradient descent or reinforcement learning planning.

Reasoning Reinforcement learning Prompt engineering

SIG

HYP

arXiv cs.CL·May 29

Beyond Recall: Behavioral Specification as an Interpretive Layer for AI Personalization

Researchers introduce a Behavioral Specification as an interpretive layer to align AI decisions with user preferences. Tested on 14 autobiographical corpora, it improves representational accuracy at ~25x lower context cost than raw corpus while reducing model hedging. Effective on interpretation-required questions; less helpful on recall-based tasks.

Alignment RAG AI Agents

SIG

HYP