June 2026

2,731 articles

Oops… Amazon revealed Google's Pixel Drop ahead of schedule

Amazon accidentally revealed Google's Pixel Drop before its official announcement, prematurely exposing three new AI features for Pixel smartphones.

Gemini

Vercel AI Blog · June 15

Vercel Functions can now run up to 30 minutes

Vercel Functions now supports executions of up to 30 minutes (up from 800 seconds) for Node.js and Python on Pro and Enterprise plans. Fluid Compute bills only for active CPU time, which suits LLM calls, database queries, and document processing.

Infrastructure · AI agents · Reasoning

Reddit r/LocalLLaMA · June 15

archex: local-first, deterministic code-context for AI agents — no API key, no telemetry (Apache 2.0)

archex turns a repository into ordered, budgeted context for AI agents: symbols, imports, and a dependency graph. The pipeline runs locally (BM25F + embeddings + RRF + reranker), with no API and no telemetry. Reported benchmarks: recall 0.95 vs 0.32 (cocoindex-code), cold start 0 ms vs 4,721 ms, and 71% fewer tokens.

Code generation · RAG · AI agents
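The archex item above names RRF (reciprocal rank fusion) as the step that merges its BM25F and embedding results. As a rough illustration of that general technique only, and not of archex's actual code, here is a minimal sketch. The function name, the file names, and the constant `k=60` (a commonly used default) are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into a single ranking.

    Each id receives 1 / (k + rank) from every list it appears in;
    ids are then sorted by their summed score, highest first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Tie-break on the id itself so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a lexical and an embedding retriever.
bm25_hits = ["parser.py", "lexer.py", "cli.py"]
embedding_hits = ["lexer.py", "utils.py", "parser.py"]

fused = reciprocal_rank_fusion([bm25_hits, embedding_hits])
print(fused)
```

Items ranked well by both retrievers rise to the top, which is why rank fusion is often placed before a more expensive reranker.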

Le Big Data · June 15

Using Claude? Anthropic may soon ask you for proof of identity

Anthropic is considering requiring identity verification to access certain Claude features. The measure would probably aim to strengthen security or comply with regulations.

Claude · Anthropic · AI security

Hacker News (AI) · June 15

India, UAE partner on AI sovereignty to bypass Google, Microsoft

India and the United Arab Emirates are partnering to build sovereign AI infrastructure and reduce their dependence on Google and Microsoft. The partnership aims to create local AI and data capabilities.

Regulation · Business

Hacker News (AI) · June 15

Show HN: Can Europe train a frontier AI model on the compute it owns?

A project explores whether Europe can train a frontier AI model on its own compute resources. It raises an open question about European technological autonomy relative to the American giants.

Open source · Infrastructure · Regulation

The Decoder · June 15

Pokémon Go data helped train AI now linked to military drones

AR scan data collected by Pokémon Go players trained Niantic's spatial AI models. That technology is now integrated into a US defense contractor's software for GPS-free navigation.

Vision · AI agents · Infrastructure

Reddit r/MachineLearning · June 15

I implemented 10 core ML algorithms from scratch with NumPy. Here's what no tutorial taught me [P]

Implementation of 10 classic ML algorithms (including regression, KNN, decision trees, XGBoost, and neural networks) in pure NumPy, validated against Scikit-learn and PyTorch. The open-source repo includes Jupyter notebooks that run locally or on Colab. The author stresses the importance of modular structure and of understanding gradient descent.

Open source · Tools · Fine-tuning

Le Big Data · June 15

DXC and Anthropic bring AI to critical enterprise systems

DXC and Anthropic announce a global partnership to integrate generative AI into the critical systems of large enterprises.

Anthropic · Business

Reddit r/LocalLLaMA · June 15

React Native ExecuTorch now runs Gemma 4 (Vulkan and MLX accelerated)

ExecuTorch brings Gemma 4 to React Native with GPU acceleration: Vulkan on Android, MLX on Apple Silicon. Execution is fully offline.

Gemini · Code generation · Tools

Le Big Data · June 15

Pemba, the first humanoid robot that wants to climb Mount Everest

Pemba, a humanoid robot, is training to climb Mount Everest after scaling Chimborazo in snowy conditions. The project tests locomotion and autonomous navigation capabilities in extreme environments.

Robotics

Le Big Data · June 15

OpenAI acquires Ona to strengthen Codex's AI agents

OpenAI is acquiring Ona, a specialist in secure cloud environments, to strengthen its AI agents and its Codex platform. The acquisition is part of OpenAI's strategy to develop autonomous agent capabilities.

OpenAI · AI agents · Code generation

Reddit r/LocalLLaMA · June 15

I got tired of juggling OpenRouter + Artificial Analysis + Design Arena tabs to pick a model, so I put them in one filterable table

modelgrep.com aggregates ~300 OpenRouter models behind unified filters: Artificial Analysis benchmarks, Design Arena Elo, real-time throughput, price, context length, and vision/tools/reasoning support. The API is free with no signup, and an open-source repo is available.

Tools · Benchmarks · Open source

Reddit r/MachineLearning · June 15

PrintGuard 2.0 — ShuffleNetV2 + few-shot prototypical network, TFLite via LiteRT, ≈5 MB, runs unmodified in the browser (Pyodide) and on CPython [P]

PrintGuard 2.0 is a defect detector for FDM 3D printing based on ShuffleNetV2 plus a few-shot prototypical network. The TFLite model (~5 MB) runs via LiteRT, unmodified, on CPython and in the browser (Pyodide). The architecture is unified, with a single Platform implementation per runtime.

Open source
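The PrintGuard item above relies on a few-shot prototypical network. As a sketch of the general idea only, and not of PrintGuard's code: each class is represented by the mean of a handful of support embeddings, and a query is assigned to the nearest prototype. The toy 2-D vectors, the labels, and the function names below are assumptions; in a real system the embeddings would come from a backbone such as ShuffleNetV2.

```python
from math import dist  # Euclidean distance, Python 3.8+


def class_prototypes(support):
    """Map each label to the element-wise mean of its support embeddings."""
    prototypes = {}
    for label, vectors in support.items():
        n = len(vectors)
        prototypes[label] = [sum(column) / n for column in zip(*vectors)]
    return prototypes


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: dist(query, prototypes[label]))


# Hypothetical support set: two example embeddings per class.
support = {
    "ok": [[0.9, 0.1], [1.1, -0.1]],
    "defect": [[-1.0, 0.8], [-0.8, 1.2]],
}

prototypes = class_prototypes(support)
label = classify([0.7, 0.2], prototypes)
print(label)
```

Adding a new class only requires a few labelled embeddings and a new prototype, with no retraining of the backbone, which is the usual appeal of this setup for few-shot problems.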

GitHub Trending · June 15

trycua / cua

Open-source infrastructure for computer-use agents. It provides sandboxes, SDKs, and benchmarks for training and evaluating AI agents that can control full desktops (macOS, Linux, Windows).

AI agents · Open source · Benchmarks

GitHub Trending · June 15

mikeroyal / Self-Hosting-Guide

A comprehensive self-hosting guide covering local software installation, private cloud, LLMs, WireGuard, automation, Home Assistant, and network infrastructure.

Open source · Infrastructure · Tools

GitHub Trending · June 15

amruthpillai / reactive-resume

Reactive Resume is a free, secure, open-source resume builder. The tool prioritizes privacy, customization, and portability of user data.

Open source · Tools

GitHub Trending · June 15

TencentCloud / TencentDB-Agent-Memory

TencentDB Agent Memory provides fully local long-term memory for AI agents through a progressive 4-tier pipeline, with no external API dependencies.

AI agents · Infrastructure

GitHub Trending · June 15

smol-ai / GodMode

GodMode is an AI chat browser offering quick access to ChatGPT, Claude, Bard, Bing, and Llama2 in a single web interface. It is presented as a productivity tool for daily use.

Claude · GPT · Tools

Le Big Data · June 15

This madman is trying to recreate GTA 6 from A to Z… using only AI

A developer is attempting to recreate GTA 6 entirely with AI, in parallel with the official release planned for November. The project uses AI models to generate the code, graphical assets, and game design.

Code generation · Image generation · Tools

The Decoder · June 15

Anthropic shutdown sparks sovereignty debate across Europe

The European Commission is assessing the implications of a US order forcing Anthropic to shut down Fable 5 and Mythos 5 worldwide. European researchers are debating whether to build their own foundation models or to secure access through contracts. Building local infrastructure would require compute capacity, energy, and competitive suppliers that Europe does not have.

Anthropic · Regulation · Business

Reddit r/LocalLLaMA · June 15

I'm still surprised on how good the kv quantization has become

A r/LocalLLaMA user reports that KV (key-value) cache quantization now achieves remarkable quality: even with the KV cache in q4_0 (including for the drafter), the model accurately retrieves information from a 100k-token context.

Open source · Infrastructure

Reddit r/LocalLLaMA · June 15

Lower generation speed with H100 and H200 than with RTX 5090?

A user reports slower generation on an H100 (42 tok/s) than on an RTX 5090 (57 tok/s) with llama.cpp and a 31B Q6 model. The H100 offers more context (128k vs 26k) and more bandwidth, yet generates more slowly.

Infrastructure · Benchmarks

The Decoder · June 15

Microsoft CEO Satya Nadella warns of "a small number of AI systems capturing all the economic returns"

Satya Nadella (Microsoft) warns that a few AI systems could capture all the economic value. He recommends that companies build "token capital" (their own AI capabilities, built on internal data and proprietary learning loops) to avoid this concentration.

Business · Alignment

Le Big Data · June 15

The FBI built its own small town… just to get hacked

The FBI has created Kinetic Cyber Range, a training town dedicated to cyberattack simulations and to preparing agents for cyber threats.

AI security

Le Big Data · June 15

Mistral reportedly valued at 20 billion euros after a 3 billion raise

Mistral is in talks to raise 3 billion euros, targeting a valuation of 20 billion euros.

Mistral · Funding · Business

Reddit r/LocalLLaMA · June 15

This is amazing. Token speed doubled + kv cache now need low vram - qwen 27b

Qwen 27B shows doubled generation speed and reduced VRAM usage (21 GB → 17.5 GB) on the same hardware, with no loss of contextual accuracy.

Qwen · Open source · Infrastructure

Le Big Data · June 15

OpenAI Partner Network: a network to industrialize AI

OpenAI is launching the OpenAI Partner Network, intended to accelerate enterprise AI deployment, with an investment of 150 million dollars.

OpenAI · Business

Reddit r/LocalLLaMA · June 15

An agent that plans with a frontier model but runs most of tokens locally (built it for my own dual-3090 rig)

A personal hybrid-agent tool: planning with a frontier model (Codex), local execution with Qwen 3.6 27B on dual RTX 3090s. A three-tier architecture (Planner, Local, optional Senior) minimizes frontier-model costs while retaining reasoning capabilities. Task validation is deterministic.

AI agents · Qwen · Code generation

Reddit r/LocalLLaMA · June 15

moar QAT stuff and hairy ticks

Release of quantized Gemma-4 models (12B and 31B) using an improved QAT method based on Q4_0. The author developed an iterative maximum-error search process in F16 that outperforms imatrix and reaches a KLD similar to unsloth's. The PyTorch code is available without restrictions.

Open source · Benchmarks · Code generation

ActuIA · June 15

The United States cuts off access to Anthropic's Fable 5 and Mythos 5 models: a precedent for AI sovereignty

The United States has required Anthropic to restrict access to the Fable 5 and Mythos 5 models for foreign nationals. Anthropic has disabled these models for all non-US users, setting a precedent for sovereign control over advanced AI.

Anthropic · Regulation · Business

Reddit r/LocalLLaMA · June 15

UI/svg block rendering by ServeurpersoCom · Pull Request #24080 · ggml-org/llama.cpp

Pull request #24080 on llama.cpp adds UI/SVG block rendering. The video demo shows SVG rendering capabilities integrated into the project.

Llama · Open source · Tools

Hacker News (AI) · June 15

Show HN: AwsmAudio – a WebAudio editor with native MCP

AwsmAudio is a WebAudio editor with native MCP support. The project was presented on Hacker News with limited engagement (3 points, 0 comments).

MCP · Tools · Open source

Reddit r/LocalLLaMA · June 15

I made a private on-device LLM app for Android (notes + recall, nothing leaves the phone)

A developer offers an Android app that runs an LLM entirely on-device for taking notes and querying them with AI. No data leaves the phone. Beta testers are wanted (8 GB+ RAM recommended); the app is free and in closed testing on Google Play.

Open source · Tools · RAG

Reddit r/LocalLLaMA · June 15

I ported EXL3 to run well on Apple Silicon - PonyExl3

A port of EXL3 (a high-quality, low-RAM codec) to Apple Silicon via Metal. The M5 Max reaches ~600 tok/s prefill and ~38 tok/s generation (Qwen 27B), surpassing the RTX 4090 on some benchmarks (68.5-80 tok/s decode). The GitHub repo includes reproducible results.

Open source · Code generation · Infrastructure

arXiv cs.AI · June 15

Hyperdimensional computing for structured querying on tabular data embeddings

An approach using Hyperdimensional Computing (HDC) and Holographic Reduced Representations for tabular data embeddings. It derives interpretable similarity thresholds for structured queries (equality/inequality) and is evaluated on two real-world datasets against the EmbDI baseline. HDC reliably detects queries with no results.

Embeddings · Vector search · Papers

arXiv cs.AI · June 15

Sorries Are Not the Hard Part: An Expert-Review Case Study of a Semi-Autonomous Formalization

A case study of the semi-autonomous formalization of Grothendieck's vanishing theorem shows that LLMs close proof holes but produce formalizations that are not reusable. After expert review, the agents adapt well to local feedback but fail to design robust definitions and APIs.

Reasoning · Code generation · Evaluations

arXiv cs.AI · June 15

A Multi-Agent AI System for Automated High School Transcript Processing: Collaborative Document Analysis at Scale

A multi-agent system for automatically processing high school transcripts. The architecture, with 4 specialized agents (pattern recognition, semantic analysis, vision, orchestration), reaches 96.7% accuracy on 40 real transcripts from 13 US states, at 45 seconds per document.

Multi-agent · AI agents · Vision

arXiv cs.AI · June 15

MA-ProofBench: A Two-Tiered Evaluation of LLMs for Theorem Proving in Mathematical Analysis

MA-ProofBench is the first formal benchmark dedicated to theorems in Mathematical Analysis, with 200 formalized problems at two difficulty levels (undergraduate and Ph.D.). GPT-5.5 reaches only 16% Pass@8 at Level I and 5% at Level II, revealing major gaps in the advanced formal reasoning of LLMs.

Benchmarks · Reasoning · GPT
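The MA-ProofBench item above reports Pass@8 scores. For readers unfamiliar with the metric, here is a minimal sketch of one widely used way to estimate pass@k from n sampled attempts of which c are correct. Whether the benchmark computes its figures with this exact estimator is an assumption; the function name and the sample numbers are illustrative only.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate the probability that at least one of k samples is correct.

    n: total attempts sampled for a problem
    c: number of those attempts that were correct
    k: sample budget being scored (k <= n)
    """
    if n - c < k:
        # Fewer than k incorrect attempts exist, so any k-subset contains a hit.
        return 1.0
    # 1 minus the chance that a random k-subset contains only failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical problem: 16 attempts sampled, 2 of them verified correct.
score = pass_at_k(16, 2, 8)
print(round(score, 4))
```

A benchmark-level score is then the mean of this per-problem value over all problems.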

arXiv cs.AI · June 15

VeriGeo: Controllable Geometry Question Generation with Numerical and Analytical Verification

VeriGeo generates controllable geometry problems via executable reasoning traces. An Author agent creates the problem and diagram according to user constraints, and a Solver agent produces the proof. A three-stage pipeline checks numerical, analytical, and global consistency. Fine-tuning on 8.7k examples achieves the best performance on GeoQA and strong results on PGPS9K and MathVista-GPS.

Reasoning · Vision · Benchmarks

arXiv cs.AI · June 15

TwinBI: An Agentic Digital Twin for Efficient Augmented Interactions with Business Intelligence Dashboards

TwinBI is a digital-twin agent framework that couples an LLM system with the executable state of a BI dashboard. It unifies conversational interaction, dashboard manipulation, and provenance tracking via a shared interaction log. Benchmark: exact accuracy 43.3% → 63.3%, timeout rate 40% → 10%.

AI agents · RAG · Benchmarks

arXiv cs.LG · June 15

Graph-based Target Back-Propagation for Context Adaptation in Multi-LLM Agentic Systems

GTBP (Graph-based Target Back-Propagation) is a context-adaptation framework for agentic multi-LLM systems. It propagates local targets backward through a directed acyclic graph and updates prompts in stages. Convergence is theoretically guaranteed, and it outperforms baselines on 3 benchmarks.

AI agents · Multi-agent · Prompt engineering

The Coin Flip Judge? Reliability and Bias in LLM-as-a-Judge Evaluation

Refusal Beyond a Single Direction: A Preliminary Comparison of Diff-in-Means and INLP

Curvature-Guided Geometric Representation for Protein-Ligand Binding Affinity Prediction

LoSoNA: A Benchmark for Local Social Norm Adaptation in Group Conversations

AgentSpec: Understanding Embodied Agent Scaffolds Through Controlled Composition

Persuasion Index: A Theory-Guided Framework for Persuasion Analysis

UP-NRPA: User Portrait based Nested Rollout Policy Adaptation for Planning with Large Language Models in Goal-oriented Dialogue Systems

Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher

Applicability Condition Extraction for Therapeutic Drug-Disease Relations

Which Models Perform Better in Inheritance Reasoning?

Achieving Precise Text-To-Cypher Via Grounded Knowledge Graph Data Generation

MoDiCoL: A Modular Diagnostic Continual Learning Dataset for Robust Speech Recognition

Learning High Coverage Discriminative Parsimonious Rulesets

Efficient On-Device Diffusion LLM Inference with Mobile NPU

A Computational Audit of Demographic Association Encoding in ClinicalBERT Language Predictions

The Linguistics Olympiads: Towards a New Corpus for Linguistics Research?

Does the Judge Prefer English? Evaluating Language-Switching Invariance in LLM-as-a-Judge

OdysSim: Building Foundation Models for Human Behavior Simulation

Retrospective Progress-Aware Self-Refinement for LLM Agent Training

Trust but Verify: Mitigating Medical Hallucinations via Post-Hoc Adversarial Auditing and Multi-Agent Feedback Loops

Fodor and Pylyshyn's Systematicity Challenge Still Stands

Benchmarking Web Agent Safety under E-commerce Deceptive Interfaces

A fully GPU-based workflow for building physics emulators of hypersonic flows

QIAS 2026: Overview of the Shared Task on Islamic Inheritance Reasoning

The Weight Norm Sets the Grokking Timescale: A Causal Delay Law

Dialogue SWE-Bench: A Benchmark for Dialogue-Driven Coding Agents

Harsher on Male? Evaluating LLMs on Gender-Asymmetric Moral Framing Across Diverse Conflict Scenarios

Right or Wrong, Models Comply: Directional Blindness in LLM Moral Judgment

Implicit Reasoning for Large Language Model-based Generative Recommendation

The Holistic Storage of Verb+Up Phrases in Text-based and Audio-based Language Models

Non-Parametric Machine Text Detection via Multi-View Gaussian Processes

Beyond Perplexity: UTF-8 Validity in Byte-aware Language Models

Fusing Stylometric and Embedding Systems to Estimate Authorship Likelihood Ratios in Japanese

Deep Spectral Learning of Embedded Latent Transfer Operators for Stochastic Dynamical Systems

Hybrid Classical-Quantum Variational Autoencoder for Neural Topic Modeling

When the Tool Decides: LLM Agents Defer Blindly to Graph Neural Network Tools, and Stronger Backbones Defer More

Contract-Based Compositional Shielding for Safe Multi-Agent Reinforcement Learning

Can Post-Training Turn LLMs into Good Medical Coders? An Empirical Study of Generative ICD Coding

The Culture Funnel: You Can't Align What isn't in the Data

MedLatentDx: Latent Multi-Agent Communication for Cross-Hospital Rare-Disease Diagnosis

Learning Urban Access Costs from Origin-Destination Flows via Inverse Optimal Transport

CacheRL: Multi-Turn Tool-Calling Agents via Cached Rollouts and Hybrid Reward