June 2026

2731 articles

User as Engram: Internalizing Per-User Memory as Local Parametric Edits

Novel LLM personalization: store user facts as surgical edits in a hash-keyed memory table (Engram) instead of global LoRA. Reduces memory footprint by 33,000x, improves indirect-reasoning accuracy by 5.6x on average, and enables stacking multiple users without cross-contamination.

Fine-tuning Reasoning Papers

SIG

HYP

Reddit r/LocalLLaMA·Jun 18

CEOs of Anthropic and Google DeepMind call for U.S.-led AI coalition in meeting at G7

Dario Amodei (Anthropic) and Demis Hassabis (Google DeepMind) called for a U.S.-led AI coalition at a G7 meeting. Both executives advocated for international coordination amid geopolitical AI challenges.

Anthropic DeepMind Regulation

SIG

HYP

Hacker News (AI)·Jun 18

[x86] AI Compute Extensions (ACE) Specification

Intel releases x86 AI Compute Extensions (ACE) specification, an instruction set extension to accelerate AI workloads on x86 processors. Technical details and implementation guidance available in official documentation.

Infrastructure Benchmarks

SIG

HYP

Reddit r/MachineLearning·Jun 18

Open-Source Hong Kong Horse Racing ML Pipeline — Feedback Welcome [P]

Open-source ML pipeline for Hong Kong horse racing prediction (HKJC). Uses LightGBM/XGBoost with out-of-sample validation, betting simulations (Quinella, Tierce, Quartet), and Kelly Criterion. Key finding: no-odds model outperforms with-odds model on Quinella ROI, suggesting mispricing in certain combinations.

Open source Benchmarks Tools

SIG

HYP
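
For readers unfamiliar with the Kelly Criterion mentioned in the horse racing item above, here is a minimal sketch of the textbook single-bet formula, f* = (b·p - q) / b, where p is the win probability, q = 1 - p, and b is the net odds. The function name and parameters are illustrative assumptions. The pipeline's own staking code is not shown in the source and may differ (for example, fractional Kelly or multi-outcome bets).

```python
def kelly_fraction(win_prob: float, net_odds: float) -> float:
    """Textbook Kelly stake as a fraction of bankroll for one binary bet.

    win_prob: estimated probability of winning (0..1).
    net_odds: profit per unit staked on a win (decimal odds minus 1).
    Returns 0.0 when the bet has no positive edge.
    """
    if not 0.0 <= win_prob <= 1.0:
        raise ValueError("win_prob must be between 0 and 1")
    if net_odds <= 0.0:
        raise ValueError("net_odds must be positive")
    lose_prob = 1.0 - win_prob
    edge = net_odds * win_prob - lose_prob  # expected profit per unit staked
    return max(0.0, edge / net_odds)


if __name__ == "__main__":
    # Hypothetical numbers: 60% win estimate at even money.
    print(kelly_fraction(0.6, 1.0))
```

A negative edge yields a stake of zero, which is why a model's probability estimates matter more than the formula itself.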

Hacker News (AI)·Jun 18

Noam Shazeer is joining OpenAI

Noam Shazeer, Character.AI co-founder and former Google researcher, joins OpenAI. No details on role or responsibilities disclosed.

OpenAI

SIG

HYP

Simon Willison·Jun 17

GLM-5.2 is probably the most powerful text-only open weights LLM

Z.ai released GLM-5.2 (753B parameters, 40B active via MoE) under MIT license on June 16th. Text-only model with 1M token context window. Ranks 1st on Artificial Analysis Intelligence Index v4.1 (score 51) ahead of DeepSeek V4 Pro and Kimi K2.6. 2nd on Code Arena WebDev behind Claude Fable 5.

Open source Benchmarks Code generation

SIG

HYP

Hacker News (AI)·Jun 17

License Plate Cameras Will Soon Track Phones, Wearables, Infotainment and Pets

License plate cameras will soon track phones, wearables, infotainment systems and pets via Bluetooth and WiFi. Mass surveillance technology in development.

AI safety Regulation

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

llama.cpp now supports model management (downloading etc) via API

llama.cpp merges PR #23976 adding model management via API. On-demand downloading, loading, and unloading from directory. UI coming soon. Full lifecycle deployment and management through API alone.

Llama Open source Infrastructure

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

I released Inflect-Nano, an ultra-extreme tiny 4.63m parameter TTS model.

Inflect-Nano-v1, a 4.63M parameter TTS model, is the 2nd smallest publicly released speech synthesis model. Comprises acoustic model (3.46M) and vocoder (1.17M), generates 24 kHz English audio. ~17x smaller than Kokoro, ~108x smaller than Chatterbox. Runs locally via PyTorch, suited for embedded devices and offline voice assistants.

Voice Open source Tools

SIG

HYP

Hacker News (AI)·Jun 17

Leaked financial docs show OpenAI is losing billions of dollars a year

Leaked financial documents show OpenAI losing billions annually. Infrastructure and R&D costs exceed current revenue, raising questions about the business model's viability.

OpenAI Business

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

Lin Junyang AI Lab Closes Round at $2B Valuation

Lin Junyang's AI lab closes funding round at $2B valuation. Lin Junyang, lead behind the Qwen line, launches new venture. Open source community expects significant contributions.

Qwen Open source Funding

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

GLM 5.2 Release Video [Made with GLM 5.2]

GLM 5.2 generates videos via Remotion, comparable to Fable but below Gemini 3.1 Pro. Server overload observed on OpenRouter with timeouts on long outputs.

Video generation Gemini Qwen

SIG

HYP

Hacker News (AI)·Jun 17

I scored 200 blockchain NPM packages for deprecation and hijack risk

Security audit of 200 blockchain-related NPM packages: assessment of deprecation and hijack risks. Scoring methodology applied to critical dependency ecosystem.

AI safety Open source

SIG

HYP

Hacker News (AI)·Jun 17

The hacker sent by Anthropic to calm the government's nerves about AI safety

Anthropic deploys a security expert to government officials to address AI safety concerns. The move aims to establish direct dialogue between the company and regulators on safety and alignment issues.

Anthropic AI safety Regulation

SIG

HYP

The Decoder·Jun 17

Microsoft researcher builds a working neural network out of goats in Age of Empires II to critique AI science

A Microsoft researcher built a working neural network using goats in Age of Empires II's map editor to critique AI research methods. His analysis of 315 papers found over 50% presuppose language models have human-like traits before the experiment begins.

Papers Alignment Evals

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

US holds off blacklisting China's DeepSeek, more than 100 firms deemed security risks, sources say

US refrains from blacklisting DeepSeek but designates over 100 Chinese firms as security risks. Policy decision amid US-China tech and trade tensions.

DeepSeek Regulation Business

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

PSA: unsloth/GLM-5.2-GGUF is uploading

Unsloth created a HuggingFace repository for GLM-5.2 GGUF 30 minutes ago. Only the README is currently available; GGUF files are suspected to be uploading.

Open source Tools

SIG

HYP

Reddit r/MachineLearning·Jun 17

Contrastive targeted SFT as a mechinterp method - has anyone mapped causal dependency interactions this way? [D]

Researcher experiments with iterative targeted SFT combined with mechanistic interpretability on a 31B model. Strategy: contrastive training on specific capability dimensions, then circuit ablation to map causal dependencies between dimensions and optimize future training order.

Fine-tuning Reasoning Evals

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

i post-trained a model to reliably roll a die

A user post-trained a model to reliably simulate a die roll (each face ~1/6), exposing that frontier LLMs (Claude, GPT, Kimi) consistently answer '4'. Uses this toy problem to explore exploration vs. exploitation in RL and model behavior.

Reinforcement learning Claude GPT

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

llama.cpp - how to free up even more space on your GPU

llama.cpp optimizes GPU memory management. Key parameters: --no-mmproj-offload frees 1GB for vision models, --cache-type-k/v reduces KV cache by 50-75%, --spec-draft-n-max=2 optimizes speculative decoding. Flash attention enabled by default. Tested on Qwen 3.6-27B with 150k context on RTX 3090.

Llama Open source Infrastructure

SIG

HYP

The Decoder·Jun 17

Amazon, Nvidia, and AMD bet $310 million on AI startup building 3D world models

Amazon, Nvidia, and AMD invest $310 million in Odyssey ML, a 3D world model startup valued at $1.45 billion. IQT fund and Google's Jeff Dean join the round. World models are emerging as the next major AI bet after language models.

Funding Reasoning Vision

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

We built an open source UI kit for document RAG/agents

Extend releases an open source UI kit (MIT) for document RAG and agents: 15 components for PDF, DOCX, XLSX viewers with bounding box citations, file upload, e-signature. Built internally, tested on millions of pages/day, actively maintained.

RAG AI Agents Open source

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

My GLM-5.2-FP8 HGX-H200 SGLang docker deploy config

Docker deployment config for GLM-5.2-FP8 on HGX-H200 using SGLang. Achieves 70 tokens/s and 262k context by disabling DP and moe-a2a-backend deepep, with mem-fraction-static set to 0.83. Official vLLM recipes incompatible with H200.

Qwen Code generation Infrastructure

SIG

HYP

Latent Space·Jun 17

🔬 The Self-Driving Lab — Joseph Krause, Radical AI

Joseph Krause (Radical AI) argues that competitive advantage in materials science lies in the automated lab, not the AI model. Experimental capabilities and physical infrastructure form the true moat.

AI Agents Robotics

SIG

HYP

Hacker News (AI)·Jun 17

AI chemist improves a challenging reaction in medicinal chemistry

An AI chemist system optimizes a challenging reaction in medicinal chemistry. The approach combines predictive modeling and automated experimentation to improve synthesis yields.

Benchmarks Tools

SIG

HYP

The Decoder·Jun 17

Zhipu AI's GLM-5.2 closes in on closed-source leaders in coding marathons

Zhipu AI releases GLM-5.2 under MIT license with stable 1-million-token context. On FrontierSWE benchmark for long-duration coding tasks, the open-source model trails Anthropic's Claude Opus 4.8 by just one percentage point. Significant gap remains on reasoning versus closed-source rivals.

Open source Code generation Benchmarks

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

Multilingual-Multimodal-NLP/LoopCoder-V2 · Hugging Face

LoopCoder-V2 is a 7B code model based on Parallel Loop Transformer (PLT) that improves test-time performance through two passes of shared Transformer blocks. Trained on 18T tokens of mixed text/code data, it reaches 64.4 on SWE-bench Verified (vs 43.0 baseline), with two loops as the optimal gain-cost setting.

Code generation Reasoning Benchmarks

SIG

HYP

Simon Willison·Jun 17

Quoting Charity Majors

Charity Majors observes that in 2025, the economics of code production flipped: generating code became nearly free and instant instead of expensive and time-consuming. Lines of code shifted from being treasured and carefully curated to disposable and regenerable overnight.

Code generation Prompt engineering

SIG

HYP

Hacker News (AI)·Jun 17

Only 16 Percent of Americans Think AI Will Have a Positive Impact on Society

Poll: Only 16% of Americans believe AI will have a positive societal impact. Majority expresses concerns about economic and social effects, while experts remain more optimistic.

AI safety Regulation

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5

Gemma 4 E2B runs in-browser at 255 tokens/sec using WebGPU kernels optimized by Fable 5. Demo and kernels released on Hugging Face.

Gemini Code generation Open source

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

GameCraft-Bench evaluates whether AI agents can build playable games end-to-end in a real game engine. Benchmark tests Opus-4.7, GPT-5.5, Kimi-K2.6, DeepSeek-V4-Pro and others. No results reported for medium-sized models (27B-31B).

AI Agents Benchmarks Code generation

SIG

HYP

Le Big Data·Jun 17

After a five-year wait, Google makes its new smart speaker official

Google officially announces a new smart speaker after a five-year absence from the market. The product will be available within days.

DeepMind

SIG

HYP

Hacker News (AI)·Jun 17

Launch HN: Adam (YC W25) – Open-Source AI CAD

Adam is an open-source AI-powered CAD software launched by a YC W25 startup. The project aims to automate computer-aided design through AI models.

Open source Tools Code generation

SIG

HYP

Vercel AI Blog·Jun 17

Vercel Ship 2026 recap

Vercel unveils agent-first infrastructure at Ship 2026 in London. Three core components: Agent Stack (building blocks for agents), Vercel Connect (secure external tool access without persistent tokens), and eve (open-source framework for production agents with durable execution, sandboxed compute, approvals, and evals).

AI Agents Infrastructure Tools

SIG

HYP

Hugging Face Blog·Jun 17

MolmoMotion: Language-guided 3D motion forecasting

Hugging Face introduces MolmoMotion, a language-guided 3D motion forecasting model. The system combines vision and language to predict future trajectories from videos, enabling applications in robotics and animation.

Vision Robotics

SIG

HYP

Hacker News (AI)·Jun 17

Agentic coding deserves more than a chat box bolted onto VS Code

Critical take on agentic coding integration in VS Code as a simple chat interface. Author argues current tools lack depth to leverage agentic systems' potential and require architectural redesign of editors.

AI Agents Code generation Tools

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

TRELLIS.2 now runs natively on MLX (Image to 3d object model)

Native MLX port of Microsoft's TRELLIS.2 for Apple Silicon. Image-to-3D object generation at 512×512 (~70s) and 1024×1024 (~300-700s) on M4 Max. GitHub repo released.

Open source Tools Infrastructure

SIG

HYP

Reddit r/MachineLearning·Jun 17

I deployed a GAN on a Raspberry Pi 4 and built a physical NFT minting device [P]

DCGAN 128×128 deployed on Raspberry Pi 4 with ESP32 display. Model trained 800 epochs on M3 (4h), 2480 images, exported to ONNX (53MB). Inference 3s per face. Generates hybrid faces with randomized titles. Presented as street art installation in NYC.

Image generation Open source Tools

SIG

HYP

The Decoder·Jun 17

Nvidia research shows robots that train themselves through AI coding agents

Researchers from Nvidia, Carnegie Mellon University, and UC Berkeley use AI coding agents to teach robots dexterous grasping in real-world conditions. A fleet of eight robots achieves 99% success rate on complex tasks.

AI Agents Code generation Robotics

SIG

HYP

The Decoder·Jun 17

OpenAI researchers want to predict how often AI models will fail before launch

OpenAI researchers propose a method to predict how often a new AI model will make mistakes after release. This approach could fill gaps left by standard safety testing.

OpenAI Evals AI safety

SIG

HYP

Hacker News (AI)·Jun 17

AI demands more engineering discipline. Not less

Article arguing for increased engineering discipline in AI development, against trends minimizing technical standards. Criticizes 'move fast and break things' approach applied to critical systems.

AI safety Alignment

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

Making budget models punch above their weight with a smart Rust harness

A Rust developer optimizes small language models through efficient system architecture. A Rust harness improves inference performance without modifying model weights, enabling budget models to compete with larger versions.

Open source Infrastructure Tools

SIG

HYP

Le Big Data·Jun 17

DeepSeek closes a giant funding round of more than $7 billion

DeepSeek closes a funding round exceeding $7 billion, among the largest in the AI sector. Record amount for the Chinese startup specializing in language models.

DeepSeek Funding Business

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

GLM-5.2 is a win for local AI

GLM-5.2 (744B) under MIT license marks progress for local AI despite its massive footprint. The community can distill its reasoning capabilities into 8B/70B models, significantly improving local setups.

Open source Fine-tuning Reasoning

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

Headless screenshot loops let a local 30B agent finish a raytraced FPS demo in pure C

A local Qwen 27B agent completed a raytraced FPS demo in pure C using headless screenshot loops for visual debugging. Adding headless mode with keyboard/mouse injection and frame capture transformed the approach: the model learned to automate recursive visual debugging loops independently.

Qwen AI Agents Code generation

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

I released a local LLM-powered RPG where generated NPCs, locations, items, and quests persist as in-game objects

Developer releases local LLM-powered RPG where generated NPCs, locations, items, and quests persist as in-game objects. LLM handles dialogue, narration, and quest progression; game system manages inventory, combat, and saves. Generated elements are stored and reusable.

Open source Tools AI Agents

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

SIQ-1 Qwen3.6 for autoresearch and autonomous agency

SIQ-1 Qwen3.6: PPO fine-tuning of Qwen-35B-A3 outperforming GLM-5.2 and Qwen-350B on autoresearch (karpathy benchmark) and bullshit-bench. Model + GGUF available on HuggingFace with demo agent.

Qwen Reinforcement learning AI Agents

SIG

HYP

Hacker News (AI)·Jun 17

Sixty percent of US consumers say 'AI' in brand messaging is a turnoff

60% of US consumers find the term 'AI' in brand messaging off-putting. The study reveals fatigue with keyword oversaturation lacking concrete added value.

Business

SIG

HYP

Le Big Data·Jun 17

Streaming: Fox acquires Roku for $22 billion

Fox acquires Roku for $22 billion, strengthening its position in video streaming. The purchase provides access to a major content distribution platform.

Business

SIG

HYP

GitHub Trending·Jun 17

google-research / timesfm

TimesFM is a pretrained foundation model developed by Google Research for time-series forecasting. The GitHub repository provides an open-source implementation of this specialized model.

DeepMind Open source Benchmarks

SIG

HYP

GitHub Trending·Jun 17

yairm210 / Unciv

Unciv is an open-source Android/Desktop remake of Civilization V. Community-driven project with no official affiliation to Firaxis Games.

Open source

SIG

HYP

GitHub Trending·Jun 17

DeusData / codebase-memory-mcp

High-performance code intelligence MCP server. Indexes codebases into persistent knowledge graph in milliseconds. Supports 158 languages, sub-ms queries, 99% fewer tokens. Single static binary, zero dependencies.

MCP Code generation RAG

SIG

HYP

GitHub Trending·Jun 17

bytedance / UI-TARS-desktop

ByteDance releases UI-TARS-desktop, an open-source multimodal AI agent stack. The project connects cutting-edge AI models and agent infrastructure to automate UI-based tasks.

AI Agents Multi-agent Open source

SIG

HYP

GitHub Trending·Jun 17

calesthio / OpenMontage

OpenMontage is an open-source, agentic video production system with 12 pipelines, 52 tools, and 500+ agent skills. Converts an AI coding assistant into a full video production studio.

AI Agents Multi-agent Video generation

SIG

HYP

GitHub Trending·Jun 17

continuedev / continue

Continue is an open-source coding agent featured on GitHub Trending. The project provides a software development assistance solution.

AI Agents Code generation Open source

SIG

HYP

GitHub Trending·Jun 17

Lampese / codex-switcher

Lampese/codex-switcher is a desktop application for managing multiple OpenAI Codex CLI accounts. Open-source tool enabling account switching.

OpenAI Code generation Tools

SIG

HYP

GitHub Trending·Jun 17

bytedance / UI-TARS-desktop

ByteDance releases UI-TARS-desktop, an open-source multimodal AI agent stack connecting cutting-edge AI models and agent infrastructure. Platform for building agents capable of interacting with user interfaces.

AI Agents Multi-agent Open source

SIG

HYP

GitHub Trending·Jun 17

continuedev / continue

Continue is an open-source coding agent featured on GitHub Trending. The project provides an automated development assistance solution.

AI Agents Code generation Open source

SIG

HYP

GitHub Trending·Jun 17

openobserve / openobserve

OpenObserve is an open-source observability platform for logs, metrics, traces, frontend monitoring, pipelines and LLM observability. Alternative to Datadog/Splunk/Elasticsearch with 140x lower storage costs and single binary deployment.

Open source Infrastructure Tools

SIG

HYP

GitHub Trending·Jun 17

infiniflow / ragflow

RAGFlow is an open-source RAG engine combining retrieval-augmented generation with agent capabilities to create a superior context layer for LLMs.

RAG AI Agents Open source

SIG

HYP

GitHub Trending·Jun 17

microsoft / RD-Agent

Microsoft releases RD-Agent, an autonomous AI system to automate R&D processes in data science and ML. The agent drives experiments, data analysis, and model iterations without human intervention.

AI Agents Multi-agent Open source

SIG

HYP

The Decoder·Jun 17

Hyperscalers may soon be unable to fund their AI buildout from cash flow alone

Per Epoch AI analysis, Microsoft, Amazon, Alphabet, Meta, and Oracle are growing AI infrastructure spending at ~70% annually while operating cash flow rises only 23%. Spending could exceed cash flow by Q3 2026. Several hyperscalers are already pursuing outside funding.

Business Infrastructure

SIG

HYP

Hugging Face Blog·Jun 17

From the Hugging Face Hub to robot hardware with Strands Agents and LeRobot

Hugging Face and Strands integrate Hub models with LeRobot to deploy AI agents on robot hardware. The platform enables developers to use pre-trained models to control physical robots directly.

AI Agents Robotics Open source

SIG

HYP

OpenAI Blog·Jun 17

A near-autonomous AI chemist improves a challenging reaction in medicinal chemistry

OpenAI and Molecule.one demonstrate that a near-autonomous AI chemist using GPT-5.4 improved a key reaction in medicinal chemistry, optimizing a pharmaceutical synthesis process.

GPT OpenAI AI Agents

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

Local models went from mostly useless to actually useful really fast. What changed?

Local models shifted from marginal tools to viable solutions in one year. Gemma, Qwen, GLM, Kimi now replace some API calls for coding, private documents, and local workflows, though gaps remain on complex tasks requiring planning and error correction.

Llama Open source Qwen

SIG

HYP

Hacker News (AI)·Jun 17

Show HN: I built 184 free browser tools – PDF, image, dev, AI tasks, no upload

Developer built 184 free browser-based tools covering PDF, image, dev, and AI tasks with no server file uploads.

Tools Open source

SIG

HYP

Le Big Data·Jun 17

HSBC and Google Cloud seal a partnership for banking AI

HSBC and Google Cloud announce a multi-year partnership to deploy AI in production across banking operations. The deal covers industrializing AI solutions on Google's cloud infrastructure.

Business

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

A Year Building a Fully Local Home Voice Assistant · Fulloch

Developer shares 12-month journey building a fully local home voice assistant using open-source models as Alexa alternative. Documents what worked and what didn't throughout the project.

Open source Voice AI Agents

SIG

HYP

Hugging Face Blog·Jun 17

GLM-5.2: Built for Long-Horizon Tasks

GLM-5.2, presented on the Hugging Face Blog, is a model designed for long-horizon tasks. It improves capacity to handle extended contexts and complex multi-step workflows.

DeepMind Reasoning Benchmarks

SIG

HYP

Reddit r/MachineLearning·Jun 17

Next-Latent Prediction Transformers [R]

Microsoft Research presents Next-Latent Prediction (NextLat), a self-supervised learning method where transformers predict their own next latent state. This improves history compression into compact belief states, data efficiency, and accelerates inference up to 3.3x via recursive speculative decoding.

Reasoning Reinforcement learning Papers

Le Big Data·Jun 17

Grok Imagine Video 1.5: this AI now generates videos with sound

xAI makes Grok Imagine Video 1.5 accessible, its video generation model now capable of producing videos with synchronized audio.

Video generation

Reddit r/LocalLLaMA·Jun 17

It looks like Rio 3.5 397B could've simply been a semi-failed embezzling of funding

Rio 3.5 397B, funded at ~$100K USD, turns out to be a simple merge of models (Nex N2 Pro) without additional training, contrary to initial claims of Qwen 3.5 397B improvements. After discovery, the team changed documentation and claims the trained model was lost, raising embezzlement suspicions.

Open source Qwen

Reddit r/MachineLearning·Jun 17

What is Speculative Decoding? (trending on paperswithco.de) [R]

Speculative Decoding is an inference optimization technique using a fast, small draft model to propose multiple future tokens, verified in parallel by a larger target model. SGLang published a blog detailing state-of-the-art latencies for LLM inference serving with Modal and Z.ai's DFlash speculative decoding models.

Benchmarks Infrastructure

Le Big Data·Jun 17

New Shure kits to modernize Zoom Spaces

Shure and Zoom release new kits for Zoom Spaces, integrating AI into meetings. The kits aim to modernize the collaborative experience in meeting rooms.

Tools

Latent Space·Jun 17

[AINews] GLM-5.2: the top Frontend Coding model in the world, IndexShare for Speculative Decoding

GLM-5.2 becomes the top open-source model for frontend coding. Zhipu AI also announces IndexShare, a speculative decoding technique to accelerate inference.

Code generation Benchmarks Open source

Reddit r/MachineLearning·Jun 17

Mel AI just shared a demo of video-native AI characters that can talk, react, and respond to camera context in real time [N]

Mel AI demonstrates video-native AI characters that talk, lip-sync, show facial reactions, and respond in real time to camera context. The system detects user environment and adapts responses accordingly. This approach moves beyond text-based Character AI (founded by former Google/LaMDA developers).

AI Agents Vision Voice

Vercel AI Blog·Jun 17

Introducing Vercel Connect

Vercel Connect, now in Public Beta, replaces long-lived stored tokens with runtime credential exchange. Agents receive short-lived, task-scoped credentials through reusable connectors (Slack, GitHub, etc.), eliminating risks from permanent token leaks.

AI Agents Tools Infrastructure

arXiv cs.AI·Jun 17

DiagFlowBench: Evaluating How Language Models Handle Off-Procedure Inputs in Grounded Diagnostic Dialogue

DiagFlowBench evaluates how language models handle off-procedure inputs in industrial diagnostic dialogue. A dataset of 1,676 multi-turn conversations derived from 50 diagnostic flowcharts reveals models often select a real but contextually inadequate step rather than hallucinate, exposing a vulnerability: plausible but wrong advice grounded in documentation.

Benchmarks Evals Reasoning

arXiv cs.AI·Jun 17

MathVis-Fine: Aligning Visual Supervision with Necessity via Progressive Dependency-Guided Training for Multimodal Mathematical Reasoning

MathVis-Fine introduces a framework for fine-grained visual dependency modeling in mathematical reasoning. A new dataset augments visual annotations with visual dependency ratings. Two-stage progressive training balances answer correctness and visual grounding rewards according to each sample's intrinsic visual necessity, reducing reward bias.

Reasoning Vision Benchmarks

arXiv cs.AI·Jun 17

How Inference Compute Shapes Frontier LLM Evaluation

Study evaluating 12 frontier models on inference compute impact across seven benchmarks. Three interventions tested: larger token budgets, context compaction, repeated submission attempts. Results: increased budgets substantially improve performance on FrontierMath, Humanity's Last Exam, TerminalBench. Fixed-budget evaluations increasingly understate newer model capabilities.

Benchmarks Evals Reasoning

arXiv cs.CL·Jun 17

Speaking in Self-Assessing Tongues: On the Verbalized Confidence of LLMs in Machine Translation

Study of LLM verbalized confidence reliability in machine translation. Five methods for extracting per-token confidence without internal signal access are compared against predicted probabilities. Results: similar performance for error detection and calibration, but little correlation between internal and verbalized methods.

Evals Reasoning

arXiv cs.CL·Jun 17

From Parasocial Scripts to Dyadic Persistence in Autonomous AI-Agent Communities

Analysis of 4,434 posts and 50,338 comments on Moltbook showing parasocial interaction cues (intimacy language, reciprocity bids, self-identification) persist in autonomous AI-agent communities. Results validated through keyword matching and LLM annotation reveal strong association between these signals and original poster re-engagement and sustained dyadic patterns.

AI Agents Multi-agent Papers

arXiv cs.CL·Jun 17

AIPatient Arena: EHR-grounded evaluation of large language models in end-to-end clinical consultation workflows

AIPatient Arena evaluates LLMs in multi-turn clinical consultation across 8 competence dimensions using EHR-grounded knowledge graphs. On 437 patients, models excel in questioning (4.43-4.99/5) and ethical conduct (4.38-4.93/5), but fail in diagnostic accuracy (2.63-3.55/5) and information coverage (2.08-3.02/5). Weaknesses include repetitive questioning, omitted medical history, inadequate uncertainty handling.

Evals Reasoning AI safety

arXiv cs.AI·Jun 17

EComAgentBench: Benchmarking Shopping Agents on Long-Horizon Tasks with Distributed Hidden Intent

EComAgentBench is a benchmark of 662 e-commerce tasks evaluating LLM-based shopping agents on hidden intents distributed across query, user profile, and clarifications. Requirements are scattered and agents must uncover them within 100 tool calls. The strongest model achieves only 57.1% accuracy.

AI Agents Benchmarks Evals

arXiv cs.AI·Jun 17

Using Cognitive Models to Improve Language Model Simulation of Human Persuasion Games

Researchers propose Equation-to-Behavior Prompting to guide LLMs to simulate diverse cognitive models (Bayesian, motivated reasoning, Grether's α-β model). Large models approximate these specifications via prompting, but small models fail. RL training reduces belief error by 26.5% and improves performance by 2.5–12% on legal persuasion games.

Reasoning Reinforcement learning Evals

arXiv cs.AI·Jun 17

DecoSearch: Complexity-Aware Routing and Plan-Level Repair for Text-to-SQL

DecoSearch is a training-free framework for text-to-SQL translation that routes queries by complexity. A schema selector prunes the database, an LLM judger decides if decomposition is needed, and a DAG solves atomic sub-questions. Achieves 70.53% on BIRD and 88.31% on Spider with DeepSeek, outperforming training-free baselines.

Code generation Reasoning RAG

arXiv cs.AI·Jun 17

Surrogate Assisted Pedestrian Protection Design via a Foundation Model Orchestrated Workflow

Foundation model-orchestrated workflow for pedestrian protection design. Integrates ML surrogate (R²=0.87), multi-objective evolutionary search, geometry generator, and LLM interface. Reduces evaluation time from hours to seconds; generates 35 safety-compliant alternatives in automotive bumper case study.

AI Agents Vision Reasoning

arXiv cs.AI·Jun 17

SEAGym: An Evaluation Environment for Self-Evolving LLM Agents

SEAGym is an evaluation environment for measuring self-evolving LLM agent harness updates (prompts, memory, tools, interaction loop). The study compares ACE, TF-GRPO, and AHE on Terminal-Bench 2.0 and HLE, showing frequent updates don't guarantee held-out performance gains and source diversity affects harness reliability.

AI Agents Reinforcement learning Evals

arXiv cs.AI·Jun 17

Brick-DICL: Dynamic In-Context Learning for Automated Brick Schema Classification

Brick-DICL introduces a two-stage dynamic in-context learning framework for automated Brick schema classification of BMS points (936 classes). Combines metadata-RAG and class-RAG to enhance LLM domain knowledge, with multi-LLM filtering to reduce manual verification effort.

RAG Prompt engineering Reasoning

arXiv cs.AI·Jun 17

MapSatisfyBench: Benchmarking Satisfaction-Aware Map Agents through Behavior-Grounded Implicit Decision Factors

MapSatisfyBench is a benchmark for evaluating LLM agents integrated into map services. It measures their ability to identify and satisfy implicit user needs (unspoken decision factors) from real-world behavioral data. Experiments show current agents perform well on explicit task completion but struggle to proactively address implicit factors.

AI Agents Benchmarks Evals

arXiv cs.AI·Jun 17

Distributed General-Purpose Agent Networks: Architecture, Key Mechanisms, and Prototypes

arXiv paper proposing architecture for distributed peer-to-peer autonomous agent networks. Authors identify three core mechanisms: semantic announcement propagation for collaborator discovery, verifiable identity and multi-topic reputation (MG-EigenTrust), and mechanism design for open task execution. Prototypes and simulations presented.

AI Agents Multi-agent Papers

arXiv cs.AI·Jun 17

Can LLMs Be CEOs? Benchmarking Strategic Resource Reallocation with Multi-Role Agent Simulation

CEO-Bench, a multi-agent benchmark, evaluates LLMs' ability to make strategic resource reallocation decisions. Five frontier models tested on 13 scenarios show high structural validity but diverge on strategic calibration. Failure modes include single-advisor capture and historical amnesia.

AI Agents Multi-agent Reasoning

arXiv cs.AI·Jun 17

A homotopy-type-theoretic generalization of neurosymbolic inference

Theoretical paper generalizing neurosymbolic systems using homotopy type theory. The framework preserves symmetry and multiple-proof information, converting classical functionals into belief-weighted homotopy cardinalities. Validated on MNIST reasoning-shortcut benchmarks with better calibration than diversity-trained ensembles.

Reasoning Papers

arXiv cs.AI·Jun 17

SpeechDx: A Multi-Task Benchmark for Clinical Speech AI

SpeechDx is a multi-task benchmark for clinical speech AI covering 12 datasets and 27 tasks across diverse health conditions. Tasks are structured by speech production stages (conceptualization, formulation, articulation). Evaluation of 12 audio encoders shows large-scale speech models outperform domain-specific ones, but none generalize reliably across clinical speech.

Benchmarks Voice Evals

Vercel AI Blog·Jun 17

Introducing eve

Vercel introduces eve, an open-source agent framework for building and deploying agents in production. eve provides built-in infrastructure (model management, fallbacks, logging); developers define only behavior through files (agent.ts, instructions.md, tools). Vercel positions eve as standardizing agent building the way Next.js did for web applications.

AI Agents Open source Tools

arXiv cs.LG·Jun 17

Sum-of-Squares Degree Barriers for the Reweighted-Hinge Method in Robust Halfspace Learning: A Christoffel-Function Characterization

Theoretical paper on Sum-of-Squares degree barriers for robust halfspace learning under malicious noise. The Christoffel function exactly characterizes corruption hidden from bounded-degree certificates. Proves a margin-degree tradeoff and a degree-2t algorithm achieving the frontier η^(1-1/2t).

Papers Reasoning AI safety

arXiv cs.AI·Jun 17

Beyond Parallel Sampling: Diverse Query Initialization for Agentic Search

DivInit improves test-time scaling for agentic search by diversifying initial queries. Instead of sampling k independent queries in parallel, the method generates n candidates then selects k diverse seeds. Gains of 5-7 points on multi-hop QA at matched compute, validated across 5 open-weight models and 8 benchmarks.

AI Agents Reasoning Benchmarks

arXiv cs.AI·Jun 17

Nothing from Something: Can a Language Model Discover 0?

Study on language models' ability to discover the mathematical concept of zero. GPT-2-sized models fail without additional training, but improve substantially after exposure to tens or hundreds of examples. Language pretraining reduces required examples by ~50%.

Reasoning Papers Benchmarks

arXiv cs.AI·Jun 17

SkillChain-Gym: A Benchmark for Reskilling-Aware Production-Inventory Control under Disruptions

SkillChain-Gym is a benchmark for reskilling-aware production-inventory control. The environment models skill decay, certification lapses, training actions, and capacity constraints. The paper evaluates production-only, reactive adaptive, and static-insurance policies over 60-shift horizons using operational and resilience metrics.

Benchmarks Reinforcement learning AI Agents
