May 2026

3149 articles

AI agents: why is Singapore attracting OpenAI and Google?

Singapore invests over 300 million Singapore dollars in AI agents, attracting OpenAI and Google. The city-state strengthens its position in this strategic sector.

AI Agents OpenAI DeepMind

Reddit r/LocalLLaMA·May 20

Do you think there is room for optimization? llama.cpp/qwen3.6 27b on two 6000 Blackwell

User runs Qwen3.6-27B via llama.cpp on two Blackwell 6000 MaxQ GPUs with AMD Epyc, achieving 100-110 t/s. Seeks optimizations: cards at 250/300W, ~20GB VRAM free. Setup includes flash-attention, speculative decoding (draft-MTP), batch 6144, 1M context.

Llama Open source Code generation

Reddit r/LocalLLaMA·May 20

Anyone else fighting Blackwell GSP timeout in production passthrough? How are you handling recovery without a host reboot?

User reports GSP (Graphics System Processor) timeouts on RTX Pro 5000 Blackwell in VFIO passthrough on Linux KVM/QEMU. GPU enters unrecoverable bad state after initialization timeout. Secondary Bus Reset, D3cold, and driver reload fail; only full host reboot works. Seeking recovery solutions without reboot.

Infrastructure Open source

Hacker News (AI)·May 20

Railway GCP Account Suspension Incident Report

Railway reports unexpected GCP account suspension. Incident impacts customer deployments. Investigation underway on root causes and mitigation steps.

Infrastructure

Reddit r/LocalLLaMA·May 20

The MTP function in LMStudio causes a decrease in output quality.

An LMStudio user reports degraded output quality when the MTP function is enabled: it produces garbage results compared with tests run without MTP. The issue does not occur with a locally compiled llama-server.exe.

Tools Open source

Le Big Data·May 20

Nectar Social raises $30 million to automate marketing with AI

Nectar Social, founded by two former Meta executives, raises $30 million to automate marketing with AI. The startup develops AI-powered marketing automation tools.

Business Funding

Le Big Data·May 20

Google I/O 2026: these XR glasses create music with a wave of the hand

At Google I/O 2026, Google and Xreal unveil Project Aura, XR glasses that generate music through hand-gesture recognition. The project combines extended reality and audio generation.

DeepMind Vision

Reddit r/LocalLLaMA·May 20

Qwen3.7 Max scored by Artificial Analysis, 27B/35B waiting room

Qwen 3.7 Max ranked 5th by Artificial Analysis, on par with GPT 5.4 (xhigh) and ahead of Gemini 3.5 Flash. Qwen 3.6 27B scores 6 points below its Max variant. Qwen 3.7 27B/35B versions awaited.

Qwen Benchmarks

Le Big Data·May 20

Deals, real estate, weekends: Google's AI agents will monitor the web for you

Google is developing autonomous AI agents capable of monitoring the web to find deals on real estate and weekend trips on behalf of users. These agents perform automated searches without human intervention.

AI Agents DeepMind

Vercel AI Blog·May 20

Grok Build 0.1 now available on Vercel AI Gateway

Grok Build 0.1, a beta coding model trained for agentic coding, is now available on Vercel AI Gateway. The model runs with non-configurable reasoning effort and no non-reasoning mode. Vercel AI Gateway provides a unified API for model calls, usage and cost tracking, with intelligent provider routing and automatic retries.

Code generation AI Agents Reasoning

Reddit r/LocalLLaMA·May 20

Guardrails take an 8B model from 53% to 99% on agentic tasks [ACM CAIS '26 preprint]

Guardrails improve an 8B model from 53% to 99% on agentic tasks, according to an ACM CAIS '26 preprint. The technique enhances control and reliability of AI agents.

AI Agents AI safety Benchmarks

Le Big Data·May 20

Google I/O 2026: The rumors were true, Gemini 3.5 is arriving and will sweep everything away

Google unveils Gemini 3.5 series at Google I/O 2026, confirming prior rumors. The complete lineup is announced.

Gemini

Reddit r/LocalLLaMA·May 20

Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

Forge, a guardrails framework, improves an 8B model from 53% to 99% on agentic tasks. The tool adds control mechanisms to make AI agent execution more reliable.

AI Agents Tools Open source

arXiv cs.AI·May 20

Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance

Position paper advocating for 'data probes'—synthetic sequences from random processes—to systematically understand how data characteristics affect LLM performance across training, tuning, alignment, and in-context learning. Uses theoretical concepts like typical sets to move beyond compute-intensive empirical heuristics.

Papers Evals Fine-tuning

arXiv cs.AI·May 20

Operationalizing Document AI: A Microservice Architecture for OCR and LLM Pipelines in Production

Microservice architecture for Document AI pipelines in production: classification, OCR, and structured field extraction via LLM. Processes thousands of multi-page documents per hour. Key findings: OCR dominates end-to-end latency (not LLM parsing), system saturation determined by shared GPU capacity. Concrete architectural patterns for production deployment.

Infrastructure Code generation RAG

arXiv cs.AI·May 20

Evaluating the Utility of Personal Health Records in Personalized Health AI

Study evaluating Gemini 3.0 Flash on 2,257 patient queries with Personal Health Records (PHR) context. Significant improvement in answer helpfulness with PHR data (p<0.001). Identified gaps: temporal disorientation, rare confabulations. Evaluation framework developed to monitor LLM answer quality based on PHR context.

Gemini RAG Evals

arXiv cs.AI·May 20

Swimming with Whales: Analysis of Power Imbalances in Stake-Weighted Governance

Analysis of power imbalances in the stake-weighted governance of Proof-of-Stake blockchains using the Penrose-Banzhaf power index. Shows how a few large-stake users can control decision-making without owning a majority of the stake. Provides theoretical findings and empirical results on Project Catalyst data.

Benchmarks Papers
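
For readers unfamiliar with the Penrose-Banzhaf power index named in the summary above, here is a minimal sketch of the normalized index for a weighted voting game. The stake values and the quota are hypothetical illustrations, not figures from the paper or from Project Catalyst, and the brute-force enumeration is only practical for a handful of voters.

```python
from itertools import combinations

def banzhaf_index(weights, quota):
    """Normalized Penrose-Banzhaf index for a weighted voting game.

    A voter is "critical" for a coalition if the coalition loses
    without them and wins once they join.
    """
    swings = [0] * len(weights)
    for i, own in enumerate(weights):
        others = [w for j, w in enumerate(weights) if j != i]
        # Enumerate every coalition of the other voters.
        for size in range(len(others) + 1):
            for coalition in combinations(others, size):
                total = sum(coalition)
                if total < quota <= total + own:
                    swings[i] += 1
    total_swings = sum(swings)
    return [s / total_swings for s in swings]

# Hypothetical stakes (in percent) and a simple-majority quota.
stakes = [50, 49, 1]
quota = 51
index = banzhaf_index(stakes, quota)
print(index)
```

In this toy setup the index comes out as 0.6, 0.2, 0.2: the 50% holder has more voting power than its stake suggests, while the 49% holder has no more power than the 1% holder.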

arXiv cs.AI·May 20

KAN-MLP-Mixer: A comprehensive investigation of the usage of Kolmogorov-Arnold Networks (KANs) for improving IMU-based Human Activity Recognition

Comparative study of Kolmogorov-Arnold Networks (KANs) vs MLPs for IMU-based Human Activity Recognition (HAR). KANs excel on clean data but fail on noisy real-world datasets. Proposed hybrid KAN-MLP architecture achieves +5.33% macro F1-score improvement across 8 public datasets, outperforming pure baselines.

Benchmarks Papers

arXiv cs.AI·May 20

Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On

Vision paper on autonomous agent networks (A2A). Authors argue that trustworthiness in multi-agent systems cannot be retrofitted but must be architected from the start. They identify systemic vulnerabilities (adversarial composition, semantic misalignment, cascading failures) and propose a conceptual framework based on four design pillars.

AI Agents Multi-agent Alignment

arXiv cs.AI·May 20

Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts

ReElicit is a Bayesian optimization framework for tuning system prompts using only aggregate feedback. An LLM dynamically elicits a compact, interpretable feature space, then a Gaussian process selects optimized target vectors refined into deployable prompts. Across 10 tasks with 30-evaluation budget, ReElicit outperforms aggregate-only prompt optimization baselines.

Prompt engineering Reasoning

arXiv cs.AI·May 20

DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows

DecisionBench is a benchmark for evaluating emergent delegation in long-horizon multi-agent workflows. The substrate includes 11 models (7 vendor families), GAIA/tau-bench/BFCL tasks, and multi-axis metrics (quality, cost, latency, routing fidelity). Results show quality alone masks orchestration signals, and delivery channel dominates description content.

AI Agents Multi-agent Benchmarks

arXiv cs.AI·May 20

MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization

MOCHA is a multi-objective optimization algorithm for refining LLM agent skills. It uses Chebyshev scalarization and exponential annealing to explore the complete Pareto front, including non-convex regions. On 6 tasks, MOCHA improves performance by 7.5% on average (up to 14.9% on FEVER) while discovering twice as many Pareto-optimal skill variants as baselines.

AI Agents Prompt engineering Reinforcement learning
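
The summary above mentions Chebyshev scalarization reaching non-convex regions of the Pareto front. The sketch below illustrates only that general property, on three hypothetical candidates scored on two objectives to maximize. It does not reproduce MOCHA itself, and the annealing schedule is not modeled; all names and numbers are assumptions for illustration.

```python
def weighted_sum(point, weights):
    """Linear scalarization (higher is better)."""
    return sum(w * f for w, f in zip(weights, point))

def chebyshev(point, weights, ideal):
    """Weighted Chebyshev distance to the ideal point (lower is better)."""
    return max(w * abs(z - f) for w, f, z in zip(weights, point, ideal))

# Hypothetical candidates: C is Pareto-optimal but lies in a
# non-convex region, below the line joining A and B.
candidates = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.45, 0.45)}
ideal = (1.0, 1.0)

# Sweep a grid of weight vectors and record which candidate wins.
weight_grid = [(i / 10, 1 - i / 10) for i in range(11)]
sum_winners = {
    max(candidates, key=lambda k: weighted_sum(candidates[k], w))
    for w in weight_grid
}
cheb_winners = {
    min(candidates, key=lambda k: chebyshev(candidates[k], w, ideal))
    for w in weight_grid
}
print(sorted(sum_winners), sorted(cheb_winners))
```

On this grid the weighted sum only ever selects A or B, while the Chebyshev form also selects C for balanced weights.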

arXiv cs.LG·May 20

HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models

HELLoRA attaches LoRA modules only to the most frequently activated experts per layer in Mixture-of-Experts models, reducing trainable parameters by 84% on OlMoE and improving accuracy by 9.2%. Tested on OlMoE-1B-7B, Mixtral-8x7B, and DeepSeekMoE across mathematical reasoning, code generation, and safety alignment.

Fine-tuning Benchmarks
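
As a rough illustration of "most frequently activated experts per layer", the sketch below counts router selections and keeps the top-k experts in each layer. The routing log, the value of k, and the function name are hypothetical; the paper's actual selection rule and the step that attaches LoRA modules are not shown here.

```python
from collections import Counter

def hot_experts(routing_log, k):
    """Return the k most frequently selected expert ids for each layer.

    routing_log maps a layer index to the list of expert ids the
    router chose, one entry per routed token (hypothetical format).
    """
    return {
        layer: sorted(expert for expert, _ in Counter(ids).most_common(k))
        for layer, ids in routing_log.items()
    }

# Hypothetical routing decisions for two layers.
routing_log = {
    0: [3, 1, 3, 3, 2, 1, 3, 0],
    1: [5, 5, 4, 7, 5, 4, 4, 4],
}
selected = hot_experts(routing_log, k=2)
print(selected)
```

Under these assumptions, adapters would be attached only to the experts listed in `selected`, leaving the rarely used experts frozen.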

LLM-Based Financial Sentiment Analysis in Arabic: Evidence from Saudi Markets

Simply Stabilizing the Loop via Fully Looped Transformer

Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints

How Far Are We From True Auto-Research?

SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents

Can Large Language Models Revolutionize Survey Research? Experiments with Disaster Preparedness Responses

AQuaUI: Visual Token Reduction for GUI Agents with Adaptive Quadtrees

BLINKG: A Benchmark for LLM-Integrated Knowledge Graph Generation

Investigating Cross-Modal Skill Injection: Scenarios, Methods, and Hyperparameters

Generative Recursive Reasoning

PRISM: A Benchmark for Programmatic Spatial-Temporal Reasoning

Conflict-Resilient Multi-Agent Reasoning via Signed Graph Modeling

Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning

IMLJD: A Computational Dataset for Indian Matrimonial Litigation Analysis

Generative Auto-Bidding with Unified Modeling and Exploration

Attention-Guided Reward for Reinforcement Learning-based Jailbreak against Large Reasoning Models

HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models

Generative-Evaluative Agreement: A Necessary Validity Criterion for LLM-Enabled Adaptive Assessment

SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects

Towards Multi-Model LLM Schedulers: Empirical Insights into Offloading and Preemption

How Do Document Parsers Break? Auditing Structural Vulnerability in Document Intelligence

EMO-BOOST: Emotion-Augmented Audio-Visual Features for Improved Generalization in Deepfake Detection

Language models struggle with compartmentalization

OpenCompass: A Universal Evaluation Platform for Large Language Models

UCCI: Calibrated Uncertainty for Cost-Optimal LLM Cascade Routing

DECOR: Auditing LLM Deception via Information Manipulation Theory

AI Technologies in Language Access: Attitudes Towards AI and the Human Value of Language Access Managers

Fine-tuning language encoding models on slow fMRI improves prediction for fast ECoG

Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?

MMoA: An AI-Agent framework with recurrence for Memoried Mixure-of-Agent

ReacTOD: Bounded Neuro-Symbolic Agentic NLU for Zero-Shot Dialogue State Tracking

Theory-optimal Quantization Based on Flatness

Accurate Evaluation of Quickest Changepoint Detectors via Non-parametric Survival Analysis

ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning

Metric-Gradient Projection for Stable Multi-Agent Policy Learning

Can Large Language Models Reliably Correct Errors in Low-Resource ASR? A Contamination-Aware Case Study on West Frisian

PROWL: Prioritized Regret-Driven Optimization for World Model Learning

Adaptive Multi-Scale Goodness Aggregation for Forward-Forward Learning

Block-Based Double Decoders

m3BERT: A Modern, Multi-lingual, Matryoshka Bidirectional Encoder

Composition of Memory Experts for Diffusion World Models

Base Models Look Human To AI Detectors

PASC: Pipeline-Aware Conformal Prediction with Joint Coverage Guarantees for Multi-Stage NLP and LLM Pipelines

How Faithful Is Trajectory-Based Data Attribution? Error Sources, Remedies, and Practical Guidelines

Drifting Objectives for Refining Discrete Diffusion Language Models

LambdaPO: A Lambda Style Policy Optimization for Reasoning Language Models

Multi-Token Residual Prediction

Efficient Conditioning Why Pseudo Observation Batch Bayesian Optimization Works When It Does not

Hybrid-LoRA: Bridging Full Fine-Tuning and Low-Rank Adaptation for Post-Training

Fine-Grained Benchmark Generation for Comprehensive Evaluation of Foundation Models

Precision Tracked Transformer via Kalman Filtering, Kriging and Process Noise

Automated Big Data Quality Assessment using Knowledge Graph Embeddings

Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

A Data-Driven Approach to Idiomaticity Based on Experts' Criteria in Theoretical Linguistics

GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening