June 2026

Oops… Amazon unveiled Google's Pixel Drop ahead of time

Amazon accidentally revealed Google's Pixel Drop ahead of its official announcement. Three new AI features for Pixel smartphones were exposed prematurely.

Gemini

Vercel AI Blog·Jun 15

Vercel Functions can now run up to 30 minutes

Vercel Functions now support execution durations of up to 30 minutes (up from 800 seconds) for Node.js and Python on Pro and Enterprise plans. Fluid Compute bills only for active CPU time, which suits LLM calls, database queries, and document processing.

Infrastructure AI Agents Reasoning

Reddit r/LocalLLaMA·Jun 15

archex: local-first, deterministic code-context for AI agents — no API key, no telemetry (Apache 2.0)

archex turns a repository into ranked, token-budgeted context for AI agents (symbols, imports, dependency graph). Its local-first pipeline (BM25F + embeddings + RRF + reranker) needs no API key and sends no telemetry. Reported benchmarks: recall 0.95 vs 0.32 for cocoindex-code, cold start 0 ms vs 4,721 ms, and 71% fewer tokens.

Code generation RAG AI Agents
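
The summary names reciprocal rank fusion (RRF) as the step that merges the BM25F and embedding rankings. The Python below is a minimal sketch of standard RRF. It assumes each retriever returns an ordered list of IDs. It is not archex's code. The file names are illustrative assumptions, and so is the constant `k=60`, which is the value used in the original RRF paper. Ties are broken by ID so the output stays deterministic.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of IDs: each ID scores sum(1 / (k + rank)), rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical lexical and dense rankings for the same query.
bm25 = ["parser.py", "lexer.py", "ast.py"]
dense = ["ast.py", "parser.py", "visitor.py"]
fused = reciprocal_rank_fusion([bm25, dense])
print([doc for doc, _ in fused])  # ['parser.py', 'ast.py', 'lexer.py', 'visitor.py']
```

RRF uses only ranks, not raw scores. That makes it a convenient way to combine retrievers whose scores are on incomparable scales.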

Le Big Data·Jun 15

Using Claude? Anthropic may soon ask you for proof of identity

Anthropic may soon require identity verification to access certain Claude features. The measure likely aims to strengthen security or comply with regulations.

Claude Anthropic AI safety

Hacker News (AI)·Jun 15

India, UAE partner on AI sovereignty to bypass Google, Microsoft

India and UAE partner to develop sovereign AI infrastructure, reducing reliance on Google and Microsoft. The partnership aims to build local AI and data capabilities.

Regulation Business

Hacker News (AI)·Jun 15

Show HN: Can Europe train a frontier AI model on the compute it owns?

A project investigates whether Europe can train a frontier AI model using only compute it owns. It raises the open question of European technological autonomy relative to the US AI giants.

Open source Infrastructure Regulation

The Decoder·Jun 15

Pokémon Go data helped train AI now linked to military drones

AR scan data from Pokémon Go players trained Niantic's spatial AI models. This technology is now combined with a US defense contractor's software for GPS-free navigation.

Vision AI Agents Infrastructure

Reddit r/MachineLearning·Jun 15

I implemented 10 core ML algorithms from scratch with NumPy. Here's what no tutorial taught me [P]

Implementations of 10 classical ML algorithms (including regression, KNN, decision trees, XGBoost, and neural networks) in pure NumPy, validated against scikit-learn and PyTorch. The open-source repo includes Jupyter notebooks that run locally or on Colab. The author stresses the importance of a modular structure and of understanding gradient descent.

Open source Tools Fine-tuning
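
The author singles out gradient descent, which fits in a few lines of NumPy. Below is a minimal example for linear regression on mean squared error. It is not code from the repo. The learning rate, epoch count and synthetic data are assumptions. The result is cross-checked against NumPy's least-squares solver, in the same spirit as validating against a reference library.

```python
import numpy as np


def fit_linear_regression(X: np.ndarray, y: np.ndarray, lr: float = 0.1, epochs: int = 2000):
    """Fit y ≈ X @ w + b by batch gradient descent on mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        residual = X @ w + b - y                        # prediction error, shape (n_samples,)
        grad_w = (2.0 / n_samples) * (X.T @ residual)   # dMSE/dw
        grad_b = (2.0 / n_samples) * residual.sum()     # dMSE/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 0.5 + rng.normal(scale=0.01, size=200)
w, b = fit_linear_regression(X, y)

# Reference: closed-form least squares with an explicit bias column.
A = np.column_stack([X, np.ones(len(X))])
w_ref = np.linalg.lstsq(A, y, rcond=None)[0]
print(w, b, w_ref)
```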

Le Big Data·Jun 15

DXC and Anthropic bring AI to critical enterprise systems

DXC and Anthropic announce a global partnership to integrate generative AI into critical systems of large enterprises.

Anthropic Business

Reddit r/LocalLLaMA·Jun 15

React Native ExecuTorch now runs Gemma 4 (Vulkan and MLX accelerated)

ExecuTorch integrates Gemma 4 into React Native with GPU acceleration: Vulkan on Android, MLX on Apple Silicon. Fully offline execution.

Gemini Code generation Tools

Le Big Data·Jun 15

Pemba, the first humanoid robot that aims to climb Mount Everest

Pemba, a humanoid robot, trains to climb Mount Everest after successfully ascending Chimborazo in snowy conditions. The project tests autonomous locomotion and navigation capabilities in extreme environments.

Robotics

Le Big Data·Jun 15

OpenAI acquires Ona to strengthen Codex's AI agents

OpenAI acquires Ona, a specialist in secure cloud environments, to strengthen its AI agents and Codex platform. The acquisition is part of OpenAI's strategy to develop autonomous agent capabilities.

OpenAI AI Agents Code generation

Reddit r/LocalLLaMA·Jun 15

I got tired of juggling OpenRouter + Artificial Analysis + Design Arena tabs to pick a model, so I put them in one filterable table

modelgrep.com aggregates ~300 models from OpenRouter with unified filters: Artificial Analysis benchmarks, Design Arena Elo, live throughput, price, context length, vision/tools/reasoning support. Free API, no signup required. Open-source repo available.

Tools Benchmarks Open source

Reddit r/MachineLearning·Jun 15

PrintGuard 2.0 — ShuffleNetV2 + few-shot prototypical network, TFLite via LiteRT, ≈5 MB, runs unmodified in the browser (Pyodide) and on CPython [P]

PrintGuard 2.0 is an FDM 3D-printing failure detector built on ShuffleNetV2 with a few-shot prototypical network. The ~5 MB TFLite model runs via LiteRT, unmodified, both on CPython and in the browser (Pyodide). The architecture is unified, with a single Platform implementation per runtime.

Open source
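
A prototypical network builds one prototype per class: the mean embedding of that class's few labeled support examples. It then classifies a query by comparing the query's embedding with each prototype. The sketch below assumes the embeddings already exist; in PrintGuard they would come from the ShuffleNetV2 backbone. It uses squared Euclidean distance, as in the original prototypical-network formulation. Whether PrintGuard uses the same distance is an assumption. The toy 2-D vectors and the `ok`/`failure` labels are hypothetical.

```python
import numpy as np


def class_prototypes(support_emb: np.ndarray, support_labels: np.ndarray):
    """Average each class's support embeddings into one prototype vector."""
    classes = np.unique(support_labels)
    protos = np.stack([support_emb[support_labels == c].mean(axis=0) for c in classes])
    return classes, protos


def classify(query_emb: np.ndarray, classes: np.ndarray, protos: np.ndarray):
    """Label each query with its nearest prototype; probabilities are a softmax over -distance^2."""
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)  # (n_query, n_class)
    logits = -d2 - (-d2).max(axis=1, keepdims=True)  # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return classes[probs.argmax(axis=1)], probs


# Hypothetical 2-D embeddings: two labeled examples per class.
support = np.array([[0.0, 0.1], [0.2, -0.1], [5.0, 5.1], [4.9, 5.2]])
labels = np.array(["ok", "ok", "failure", "failure"])
classes, protos = class_prototypes(support, labels)
pred, probs = classify(np.array([[0.1, 0.0], [5.0, 4.8]]), classes, protos)
print(pred)  # ['ok' 'failure']
```

Adding a new class only requires a few labeled examples and one more prototype, with no retraining. This is what makes the few-shot setup attractive for a small on-device model.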

GitHub Trending·Jun 15

trycua/cua

Open-source infrastructure for computer-use agents. Provides sandboxes, SDKs, and benchmarks to train and evaluate AI agents capable of controlling full desktops (macOS, Linux, Windows).

AI Agents Open source Benchmarks

GitHub Trending·Jun 15

mikeroyal/Self-Hosting-Guide

Comprehensive self-hosting guide covering on-premises software deployment, private cloud, LLMs, WireGuard, automation, Home Assistant, and networking infrastructure.

Open source Infrastructure Tools

GitHub Trending·Jun 15

amruthpillai/reactive-resume

Reactive Resume is an open-source, free resume builder prioritizing privacy and security. The tool offers customization, portability, and data ownership for users.

Open source Tools

GitHub Trending·Jun 15

TencentCloud/TencentDB-Agent-Memory

TencentDB Agent Memory delivers fully local long-term memory for AI Agents via a 4-tier progressive pipeline, with zero external API dependencies.

AI Agents Infrastructure

GitHub Trending·Jun 15

smol-ai/GodMode

GodMode is an AI chat browser that gives fast, unified web access to ChatGPT, Claude, Bard, Bing, and Llama2. Its author describes it as a productivity tool they use multiple times a day.

Claude GPT Tools

Le Big Data·Jun 15

This madman is trying to recreate GTA 6 from scratch… using only AI

A developer attempts to recreate GTA 6 entirely using AI, in parallel with the official release scheduled for November. The project leverages AI models to generate code, graphics assets, and game design.

Code generation Image generation Tools

The Decoder·Jun 15

Anthropic shutdown sparks sovereignty debate across Europe

The European Commission assesses implications of a US order forcing Anthropic to shut down Fable 5 and Mythos 5 globally. European researchers debate building homegrown foundation models versus securing contractual access. Building local infrastructure requires computing capacity, energy, and competitive providers Europe currently lacks.

Anthropic Regulation Business

Reddit r/LocalLLaMA·Jun 15

I'm still surprised on how good the kv quantization has become

An r/LocalLLaMA user reports that KV-cache (key-value cache) quantization has become impressively good. Even with the KV cache at q4_0 (including the draft model's), the model accurately retrieves information from a 100k-token context.

Open source Infrastructure
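
q4_0 is one of ggml's block quantization formats. Values are grouped into blocks of 32, and each block is stored as 4-bit codes plus one fp16 scale. That comes to 18 bytes per 32 values, or 4.5 bits per value. The NumPy sketch below loosely follows that scheme to show where the error comes from. It is an illustration, not ggml's implementation, and for readability it keeps one code per byte instead of packing two.

```python
import numpy as np

BLOCK = 32  # q4_0 groups values into blocks of 32


def quantize_q4_0_like(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Quantize a 1-D float array (length a multiple of 32) to 4-bit codes plus one fp16 scale per block."""
    blocks = x.reshape(-1, BLOCK).astype(np.float32)
    # Signed value with the largest magnitude in each block; it maps to code 0.
    idx = np.abs(blocks).argmax(axis=1)
    max_signed = blocks[np.arange(len(blocks)), idx]
    scale = max_signed / -8.0
    inv = np.divide(1.0, scale, out=np.zeros_like(scale), where=scale != 0)
    # Shift into [0, 16], round down, and clamp to the 16 available codes.
    codes = np.clip(np.floor(blocks * inv[:, None] + 8.5), 0, 15).astype(np.uint8)
    return codes, scale.astype(np.float16)


def dequantize_q4_0_like(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Map codes back to floats: (code - 8) * scale."""
    return ((codes.astype(np.float32) - 8.0) * scale.astype(np.float32)[:, None]).ravel()


rng = np.random.default_rng(1)
x = rng.normal(size=4 * BLOCK).astype(np.float32)
codes, scale = quantize_q4_0_like(x)
x_hat = dequantize_q4_0_like(codes, scale)
print("max abs error:", float(np.abs(x - x_hat).max()))
```

Each reconstructed value lands within about one scale step of the original. The scale is set by the block's largest value, so a single outlier coarsens the whole block.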

Reddit r/LocalLLaMA·Jun 15

Lower generation speed with H100 and H200 than with RTX 5090?

A user reports slower generation on an H100 (42 tok/s) than on an RTX 5090 (57 tok/s) when running a 31B model at Q6 in llama.cpp. The H100 offers a larger context (128k vs 26k tokens) and higher memory bandwidth, yet generates more slowly.

Infrastructure Benchmarks
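
Single-stream decoding is usually memory-bandwidth-bound. A rough ceiling on speed is therefore the bandwidth divided by the bytes of weights read per token. The arithmetic below is a back-of-envelope sketch, not an explanation of the post's result. It assumes "Q6" means llama.cpp's Q6_K (about 6.56 bits per weight). It also uses approximate published memory bandwidths:

- RTX 5090: about 1,792 GB/s
- H100 SXM: about 3,350 GB/s
- H100 PCIe: about 2,000 GB/s

```python
def decode_ceiling_tok_s(params_billion: float, bits_per_weight: float, bandwidth_gb_s: float) -> float:
    """Tokens/s if each generated token must stream every weight from memory once.

    Ignores KV-cache reads, kernel efficiency, and compute, so real speeds land below this.
    """
    weights_gb = params_billion * bits_per_weight / 8  # billions of params x bytes each = GB
    return bandwidth_gb_s / weights_gb


Q6_K_BITS = 6.5625  # assumption: "Q6" in the post means llama.cpp's Q6_K
for gpu, bw in [("RTX 5090", 1792), ("H100 SXM", 3350), ("H100 PCIe", 2000)]:
    print(f"{gpu}: ~{decode_ceiling_tok_s(31, Q6_K_BITS, bw):.0f} tok/s ceiling")
```

The resulting ceilings are roughly 70, 132 and 79 tok/s. The reported 57 tok/s is therefore about 80% of the RTX 5090's ceiling. The 42 tok/s is about a third of the H100 SXM's ceiling, or a bit over half of the PCIe card's. This suggests the H100 run is held back by something other than raw memory bandwidth.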

The Decoder·Jun 15

Microsoft CEO Satya Nadella warns of "a small number of AI systems capturing all the economic returns"

Satya Nadella (Microsoft) warns that a small number of AI systems could capture all the economic returns. To avoid this concentration, he urges companies to build "token capital": their own AI capabilities, grounded in internal data and proprietary learning loops.

Business Alignment

Le Big Data·Jun 15

The FBI built its own little town… just to get hacked

The FBI built Kinetic Cyber Range, a training facility designed as a simulated city for cyberattack drills and agent preparation against cyber threats.

AI safety

Le Big Data·Jun 15

Mistral reportedly valued at 20 billion euros after a 3 billion euro raise

Mistral is reportedly in talks to raise 3 billion euros at a target valuation of 20 billion euros.

Mistral Funding Business

Reddit r/LocalLLaMA·Jun 15

This is amazing. Token speed doubled + kv cache now need low vram - qwen 27b

A user reports that Qwen 27B generates twice as fast and needs less VRAM (21 GB → 17.5 GB) on identical hardware, while keeping full context accuracy.

Qwen Open source Infrastructure
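
The KV cache grows linearly with context length, so its element type matters for VRAM. The sketch below computes cache size as 2 (keys and values) × layers × KV heads × head dimension × tokens × bits per element. The layer and head counts are hypothetical, not Qwen 27B's real configuration. The post also does not say which change produced its saving. The bits-per-element figures are those of ggml's f16, q8_0 and q4_0 formats.

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int, n_tokens: int, bits_per_elem: float) -> float:
    """Size of the K and V caches in GiB for one sequence."""
    n_elems = 2 * n_layers * n_kv_heads * head_dim * n_tokens  # 2 = keys + values
    return n_elems * bits_per_elem / 8 / 2**30


# ggml bits per element: f16 = 16; q8_0 = 8.5 and q4_0 = 4.5 (per-block fp16 scales included).
# Hypothetical model shape: 64 layers, 8 KV heads of dimension 128, 32k-token context.
for name, bits in [("f16", 16), ("q8_0", 8.5), ("q4_0", 4.5)]:
    print(name, kv_cache_gib(64, 8, 128, 32768, bits), "GiB")  # 8.0, 4.25, 2.25
```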

Le Big Data·Jun 15

OpenAI Partner Network: a network to industrialize AI

OpenAI launches the OpenAI Partner Network, a network designed to accelerate AI deployment in enterprises, with a $150 million investment.

OpenAI Business

Reddit r/LocalLLaMA·Jun 15

An agent that plans with a frontier model but runs most of tokens locally (built it for my own dual-3090 rig)

A personal hybrid agent tool: a frontier model (Codex) plans, and Qwen 3.6 27B executes locally on dual RTX 3090s. A three-tier architecture (Planner, Local, optional Senior) minimizes frontier-model costs while retaining reasoning capabilities. Task validation is deterministic.

AI Agents Qwen Code generation
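
The planner/local/senior split can be sketched as an escalation loop. In the Python below, `plan`, `local`, `senior` and `validate` are hypothetical callables standing in for four components:

- `plan`: the frontier planner
- `local`: the local Qwen model
- `senior`: the optional senior model
- `validate`: a deterministic check, such as running tests

The retry count is also an assumption, not a detail from the post.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    subtask: str
    output: Optional[str]
    tier: str  # "local", "senior", or "failed"


def run_task(
    task: str,
    plan: Callable[[str], list[str]],
    local: Callable[[str], str],
    validate: Callable[[str, str], bool],
    senior: Optional[Callable[[str], str]] = None,
    local_attempts: int = 2,
) -> list[StepResult]:
    results = []
    for subtask in plan(task):  # one call to the expensive planner
        result = StepResult(subtask, None, "failed")
        for _ in range(local_attempts):  # cheap local attempts first
            out = local(subtask)
            if validate(subtask, out):
                result = StepResult(subtask, out, "local")
                break
        else:  # every local attempt failed validation: escalate if a senior tier exists
            if senior is not None:
                out = senior(subtask)
                if validate(subtask, out):
                    result = StepResult(subtask, out, "senior")
        results.append(result)
    return results


answers = {"2+2": "4", "3*3": "9"}
steps = run_task(
    "arithmetic",
    plan=lambda task: list(answers),
    local=lambda s: answers[s] if "+" in s else "?",  # toy local model: only handles addition
    validate=lambda s, out: out == answers[s],
    senior=lambda s: answers[s],
)
print([step.tier for step in steps])  # ['local', 'senior']
```

Because validation is deterministic, escalation happens only on a verifiable failure. That keeps frontier-model usage proportional to what the local model actually cannot do.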

Reddit r/LocalLLaMA·Jun 15

moar QAT stuff and hairy ticks

Release of quantized Gemma-4 models (12B and 31B) produced with an improved QAT method based on Q4_0. The author developed an iterative max-error search in F16 that outperforms imatrix and reaches a KLD comparable to unsloth's. The PyTorch code is available without restrictions.

Open source Benchmarks Code generation
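
"KLD" here presumably means the KL divergence of the quantized model's next-token distribution from the full-precision model's, averaged over token positions. This is how quant quality is commonly compared, though the post's exact evaluation setup is not given. Below is a minimal NumPy sketch. It assumes both models' logits for the same text are available as arrays of shape `(n_tokens, vocab)`.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoid overflow in exp
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def mean_kld(ref_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    """Mean over positions of KL(P_ref || P_quant); both inputs are (n_tokens, vocab) logits."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    per_token = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(per_token.mean())


# Toy check with a 2-token vocabulary: P = (0.5, 0.5), Q = (0.75, 0.25).
ref = np.array([[0.0, 0.0]])
quant = np.array([[np.log(3.0), 0.0]])
print(mean_kld(ref, quant))  # 0.5 * ln(4/3) ≈ 0.1438
```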

ActuIA·Jun 15

The US cuts off access to Anthropic's Fable 5 and Mythos 5 models: a precedent for AI sovereignty

The US has required Anthropic to restrict access to its most advanced models Fable 5 and Mythos 5 to foreign nationals. Anthropic disabled these models for all non-US users, setting a precedent for sovereign control of advanced AI systems.

Anthropic Regulation Business

Reddit r/LocalLLaMA·Jun 15

UI/svg block rendering by ServeurpersoCom · Pull Request #24080 · ggml-org/llama.cpp

Pull request #24080 on llama.cpp adds UI/SVG block rendering. Video demonstration shows SVG rendering capabilities integrated into the project.

Llama Open source Tools

Hacker News (AI)·Jun 15

Show HN: AwsmAudio – a WebAudio editor with native MCP

AwsmAudio is a WebAudio editor with native MCP (Model Context Protocol) integration. The project drew minimal engagement on Hacker News (3 points, 0 comments).

MCP Tools Open source

Reddit r/LocalLLaMA·Jun 15

I made a private on-device LLM app for Android (notes + recall, nothing leaves the phone)

A developer has released an Android app that runs an LLM fully on-device for note-taking and AI-powered recall. All data stays on the phone, with nothing sent to the cloud. The app is free and in Google Play closed testing, and the developer is seeking beta testers (8 GB+ RAM recommended).

Open source Tools RAG

Reddit r/LocalLLaMA·Jun 15

I ported EXL3 to run well on Apple Silicon - PonyExl3

The EXL3 codec has been ported to Apple Silicon with a Metal backend. On an M5 Max it reaches ~600 tok/s prefill and ~38 tok/s generation with Qwen 27B. It reportedly outperforms an RTX 4090 on some benchmarks (68.5-80 tok/s decode). The GitHub repo includes reproducible results.

Open source Code generation Infrastructure

arXiv cs.AI·Jun 15

Hyperdimensional computing for structured querying on tabular data embeddings

The paper applies Hyperdimensional Computing (HDC) and Holographic Reduced Representations (HRR) to tabular row embeddings. It derives interpretable similarity thresholds for structured queries (equality and inequality predicates). Evaluated on two real-world datasets against an EmbDI baseline, HDC reliably identifies predicates with zero matches.

Embeddings Vector search Papers
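
Holographic Reduced Representations encode a record by binding each attribute (role) vector to its value (filler) vector with circular convolution, then summing the pairs. An equality predicate such as color = red can then be tested by unbinding the role and measuring similarity to the filler. The NumPy sketch below shows that mechanism only. The attribute names, dimension and 0.3 threshold are hypothetical, and the paper's embeddings and derived thresholds are not reproduced.

```python
import numpy as np


def random_hv(rng: np.random.Generator, dim: int) -> np.ndarray:
    """Random HRR vector with i.i.d. N(0, 1/dim) entries, so its norm is close to 1."""
    return rng.normal(0.0, 1.0 / np.sqrt(dim), dim)


def bind(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Circular convolution of a and b, computed in the Fourier domain."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))


def unbind(c: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Circular correlation with a: the approximate inverse of bind(a, .)."""
    return np.fft.irfft(np.fft.rfft(c) * np.conj(np.fft.rfft(a)), n=len(a))


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


rng = np.random.default_rng(0)
DIM = 4096
THRESHOLD = 0.3  # hypothetical; the paper derives its own thresholds

roles = {name: random_hv(rng, DIM) for name in ("city", "color", "size")}
fillers = {name: random_hv(rng, DIM) for name in ("paris", "red", "blue", "large")}

# Encode one row {city: paris, color: red, size: large} as a bundle of bound pairs.
row = (bind(roles["city"], fillers["paris"])
       + bind(roles["color"], fillers["red"])
       + bind(roles["size"], fillers["large"]))

# Equality predicate: unbind the "color" role and compare with candidate values.
probe = unbind(row, roles["color"])
print(cosine(probe, fillers["red"]) > THRESHOLD)   # color = red
print(cosine(probe, fillers["blue"]) > THRESHOLD)  # color = blue
```

Unbinding only recovers a noisy copy of the filler, and the noise grows with the number of bundled pairs. This is why thresholds have to be chosen carefully rather than testing for exact equality.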

arXiv cs.AI·Jun 15

Sorries Are Not the Hard Part: An Expert-Review Case Study of a Semi-Autonomous Formalization

A case study of the semi-autonomous formalization of Grothendieck's vanishing theorem shows that LLMs can close proof gaps but produce formalizations that are not reusable. Under expert review, the agents adapt well to local feedback but fail at designing sound definitions and APIs.

Reasoning Code generation Evals
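
In Lean, `sorry` is a placeholder that lets a proof compile, with a warning, while leaving a gap; the paper's title refers to such gaps. Below is a toy Lean 4 example, unrelated to Grothendieck's theorem, of a gap and of closing it with a core library lemma.

```lean
-- Compiles, but Lean warns "declaration uses 'sorry'": the proof has a gap.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  sorry

-- The same statement with the gap closed by the core lemma Nat.add_comm.
theorem add_comm_example' (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```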

arXiv cs.AI·Jun 15

A Multi-Agent AI System for Automated High School Transcript Processing: Collaborative Document Analysis at Scale

A multi-agent AI system for automated high school transcript processing. Its four-agent architecture (pattern recognition, semantic analysis, vision intelligence, orchestration) achieves 96.7% accuracy on 40 real transcripts from 13 U.S. states, taking 45 seconds per document.

Multi-agent AI Agents Vision

arXiv cs.AI·Jun 15

MA-ProofBench: A Two-Tiered Evaluation of LLMs for Theorem Proving in Mathematical Analysis

MA-ProofBench is the first formal theorem-proving benchmark dedicated to Mathematical Analysis with 200 formalized theorems across two difficulty levels (undergraduate and Ph.D.). GPT-5.5 achieves only 16% Pass@8 on Level I and 5% on Level II, exposing major gaps in LLMs' advanced formal reasoning capabilities.

Benchmarks Reasoning GPT
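
Pass@k is commonly computed with the unbiased estimator of Chen et al. (2021). With n samples per problem, c of them correct, pass@k = 1 - C(n-c, k)/C(n, k), averaged over problems. This summary does not say whether MA-ProofBench draws exactly 8 samples per theorem. When n = k = 8, the estimator reduces to "at least one of the 8 attempts succeeds". A short Python sketch:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average the per-problem estimates over the whole benchmark."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical results: 8 attempts on each of 4 theorems.
print(benchmark_pass_at_k([0, 0, 1, 3], n=8, k=8))  # 0.5
```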

arXiv cs.AI·Jun 15

VeriGeo: Controllable Geometry Question Generation with Numerical and Analytical Verification

VeriGeo generates controllable geometry problems via executable reasoning traces. An Author agent creates the problem and diagram according to user constraints, and a Solver agent produces the proof. A three-stage pipeline verifies numerical, analytical, and global consistency. Fine-tuning on 8.7k examples achieves the best reported GeoQA performance and strong results on PGPS9K and MathVista-GPS.

Reasoning Vision Benchmarks

arXiv cs.AI·Jun 15

TwinBI: An Agentic Digital Twin for Efficient Augmented Interactions with Business Intelligence Dashboards

TwinBI is an agentic digital-twin framework that couples an LLM-based agent system with executable BI dashboard state. It unifies conversational interaction, dashboard manipulation, and provenance tracking through a shared interaction log. On the benchmark, exact-match accuracy rises from 43.3% to 63.3% and the timeout rate falls from 40% to 10%.

AI Agents RAG Benchmarks

arXiv cs.LG·Jun 15

Graph-based Target Back-Propagation for Context Adaptation in Multi-LLM Agentic Systems

GTBP (Graph-based Target Back-Propagation) is a context adaptation framework for multi-LLM agentic systems. It back-propagates local targets through a workflow structured as a directed acyclic graph and updates prompts stage by stage. It comes with a theoretical convergence guarantee and outperforms baselines across three benchmarks.

AI Agents Multi-agent Prompt engineering
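
This summary does not describe GTBP's actual update rule. The sketch below only illustrates the traversal implied by back-propagating targets through a DAG. Stages are visited in reverse topological order, so each stage has collected all targets from its downstream consumers before passing derived targets upstream. `derive` is a hypothetical hook; in a real system it might be an LLM call that turns a downstream target into an upstream one. The stage names are illustrative.

```python
from graphlib import TopologicalSorter
from typing import Callable


def propagate_targets(
    preds: dict[str, set[str]],
    final_targets: dict[str, str],
    derive: Callable[[str, str, str], str],
) -> dict[str, list[str]]:
    """Collect local targets for every stage of a DAG workflow, from sinks back to sources.

    preds[stage] is the set of stages whose output `stage` consumes.
    derive(upstream, downstream, target) returns the target that `downstream`'s
    target implies for `upstream` (a hypothetical hook).
    """
    stages = set(preds) | {p for deps in preds.values() for p in deps}
    targets: dict[str, list[str]] = {s: [] for s in stages}
    for stage, target in final_targets.items():
        targets[stage].append(target)
    # static_order() yields every stage after its predecessors; reversing it
    # guarantees all downstream targets are collected before a stage is visited.
    for stage in reversed(list(TopologicalSorter(preds).static_order())):
        for target in targets[stage]:
            for upstream in preds.get(stage, ()):
                targets[upstream].append(derive(upstream, stage, target))
    return targets


# Hypothetical three-stage pipeline: research -> outline -> draft.
workflow = {"draft": {"outline"}, "outline": {"research"}, "research": set()}
result = propagate_targets(workflow, {"draft": "cite two sources"},
                           lambda up, down, t: f"{up} supports {down}: {t}")
print(result["research"])  # ['research supports outline: outline supports draft: cite two sources']
```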

June 2026

The Coin Flip Judge? Reliability and Bias in LLM-as-a-Judge Evaluation

Refusal Beyond a Single Direction: A Preliminary Comparison of Diff-in-Means and INLP

Curvature-Guided Geometric Representation for Protein-Ligand Binding Affinity Prediction

LoSoNA: A Benchmark for Local Social Norm Adaptation in Group Conversations

AgentSpec: Understanding Embodied Agent Scaffolds Through Controlled Composition

Persuasion Index: A Theory-Guided Framework for Persuasion Analysis

UP-NRPA: User Portrait based Nested Rollout Policy Adaptation for Planning with Large Language Models in Goal-oriented Dialogue Systems

Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher

Applicability Condition Extraction for Therapeutic Drug-Disease Relations

Which Models Perform Better in Inheritance Reasoning?

Achieving Precise Text-To-Cypher Via Grounded Knowledge Graph Data Generation

MoDiCoL: A Modular Diagnostic Continual Learning Dataset for Robust Speech Recognition

Learning High Coverage Discriminative Parsimonious Rulesets

Efficient On-Device Diffusion LLM Inference with Mobile NPU

A Computational Audit of Demographic Association Encoding in ClinicalBERT Language Predictions

The Linguistics Olympiads: Towards a New Corpus for Linguistics Research?

Does the Judge Prefer English? Evaluating Language-Switching Invariance in LLM-as-a-Judge

OdysSim: Building Foundation Models for Human Behavior Simulation

Retrospective Progress-Aware Self-Refinement for LLM Agent Training

Trust but Verify: Mitigating Medical Hallucinations via Post-Hoc Adversarial Auditing and Multi-Agent Feedback Loops

Fodor and Pylyshyn's Systematicity Challenge Still Stands

Benchmarking Web Agent Safety under E-commerce Deceptive Interfaces

A fully GPU-based workflow for building physics emulators of hypersonic flows

QIAS 2026: Overview of the Shared Task on Islamic Inheritance Reasoning

The Weight Norm Sets the Grokking Timescale: A Causal Delay Law

Dialogue SWE-Bench: A Benchmark for Dialogue-Driven Coding Agents

Harsher on Male? Evaluating LLMs on Gender-Asymmetric Moral Framing Across Diverse Conflict Scenarios

Right or Wrong, Models Comply: Directional Blindness in LLM Moral Judgment

Implicit Reasoning for Large Language Model-based Generative Recommendation

The Holistic Storage of Verb+Up Phrases in Text-based and Audio-based Language Models

Non-Parametric Machine Text Detection via Multi-View Gaussian Processes

Beyond Perplexity: UTF-8 Validity in Byte-aware Language Models

Fusing Stylometric and Embedding Systems to Estimate Authorship Likelihood Ratios in Japanese

Deep Spectral Learning of Embedded Latent Transfer Operators for Stochastic Dynamical Systems

Hybrid Classical-Quantum Variational Autoencoder for Neural Topic Modeling

When the Tool Decides: LLM Agents Defer Blindly to Graph Neural Network Tools, and Stronger Backbones Defer More

Contract-Based Compositional Shielding for Safe Multi-Agent Reinforcement Learning

Can Post-Training Turn LLMs into Good Medical Coders? An Empirical Study of Generative ICD Coding

The Culture Funnel: You Can't Align What isn't in the Data

MedLatentDx: Latent Multi-Agent Communication for Cross-Hospital Rare-Disease Diagnosis

Learning Urban Access Costs from Origin-Destination Flows via Inverse Optimal Transport

CacheRL: Multi-Turn Tool-Calling Agents via Cached Rollouts and Hybrid Reward
