May 2026

3149 articles

Anthropic's run-rate revenue hits $47 billion

Anthropic reports annualized run-rate revenue of $47 billion as of May 2026, up from $9 billion end-2025. Growth accelerates: $14 billion in February, $30 billion in April. Metric disclosed during $65 billion Series H funding round.

Anthropic Business Funding

Hacker News (AI)·May 29

Microsoft data suggests using AI is more expensive than hiring people

Microsoft internal data suggests AI usage for certain tasks costs more than hiring human workers. The article raises questions about the actual ROI of enterprise AI deployments.

Business

Reddit r/LocalLLaMA·May 29

StepFun 3.7 Flash

StepFun releases Step 3.7 Flash, a 196B/11B active MoE multimodal model with built-in 1.8B ViT. SWE-Bench Pro: 56.26% (beats DeepSeek V4 Flash 55.6%), DeepSearchQA F1: 92.82%. Runs locally on 128GB RAM.

Open source Code generation AI Agents

Hacker News (AI)·May 29

The mysterious Hy3 LLM is topping OpenRouter Model Rankings by a large margin

An anonymous LLM called Hy3 is topping OpenRouter's rankings by a large margin. Its identity and technical details remain unknown, raising questions about its origin and actual capabilities.

Benchmarks Llama

Reddit r/LocalLLaMA·May 29

Optimizing and accelerating the Lance model for RTX 2080 Ti 22GB (Tested on Single & Dual-GPU)

Lance model optimization for RTX 2080 Ti 22GB on single and dual-GPU setups. Custom operator configurations for Turing architecture, pipeline/tensor parallelism across 44GB combined VRAM, reproducible open-source scripts.

Open source Infrastructure Code generation

Vercel AI Blog·May 29

Run Docker containers inside Vercel Sandbox

Vercel Sandbox now supports installing and running Docker inside sandboxes without touching the host system. Enables testing containerized services like Redis/Postgres, validating container images before deployment, and previewing containerized applications. Also adds FUSE filesystem drivers and VPN client support.

Infrastructure Tools Code generation

Vercel AI Blog·May 29

Port 8080 is now available in Vercel Sandboxes

Vercel Sandboxes now allow binding port 8080 to an ingress domain. The controller port has been moved to port 23456.

Infrastructure Tools

OpenAI Blog·May 29

A shared playbook for trustworthy third party evaluations

OpenAI publishes guidance for third-party AI evaluations, covering assessment of model capabilities, safeguards, and validity for frontier systems.

OpenAI Evals AI safety

Hugging Face Blog·May 29

Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler

Beginner's guide to PyTorch profiling using torch.profiler. Covers how to measure performance and identify bottlenecks in AI models, with practical examples for newcomers.

Tools Infrastructure

Simon Willison·May 28

Claude Opus 4.8: "a modest but tangible improvement"

Anthropic releases Claude Opus 4.8, described as a "modest but tangible improvement" over 4.7. The model excels in honesty: 4x less likely to let code flaws pass unremarked, and abstains more on uncertain questions. Pricing unchanged: $5/M input tokens, $25/M output.

Claude Anthropic Evals

Simon Willison·May 28

llm-anthropic 0.25.1

Release of llm-anthropic 0.25.1: adds the Claude Opus 4.8 model, a -o fast 1 option for fast mode (for organizations that have it enabled), and a default max_tokens that now matches each model's maximum output instead of 8192.

Claude Anthropic Tools

Reddit r/LocalLLaMA·May 28

Got searxng working on windows without docker/wsl

Reddit user shares a method to run SearXNG (a self-hosted metasearch engine) on Windows without Docker or WSL. Practical approach for local deployment.

Open source Tools Infrastructure

ActuIA·May 28

ByteDance is preparing its own Arm and RISC-V CPUs to regain control of cost per token

ByteDance is developing custom Arm and RISC-V processors to reduce inference costs for its models. The group processes 120 trillion tokens/day with Doubao and aims to reduce Nvidia GPU dependency by optimizing server infrastructure.

Infrastructure Business

ActuIA·May 28

Claude Mythos: the EU excluded from the briefing given to the Fed and the Bank of England

Anthropic granted operational access to Claude via Project Glasswing to the US Federal Reserve and Bank of England, but no EU institution has such access based on available information.

Claude Anthropic Regulation

Reddit r/LocalLLaMA·May 28

here it is: Benchmark-Yourself app - compete against open source LLMs and get your score - 5 benchmarks available - Add your results to your CV or linkedIn (if you dare)... or just paste them below for community shaming.

Streamlit app to benchmark yourself against open-source LLMs across 5 benchmarks. Share results on CV/LinkedIn. BBQ benchmark featured.

Open source Benchmarks Tools

Hacker News (AI)·May 28

Starbucks to Take AI Usage into Account in Tech Workers' Bonuses

Starbucks incorporates AI usage into bonus evaluations for tech workers. The coffee chain adjusts compensation policy to reward AI tool adoption among IT staff.

Business

Reddit r/LocalLLaMA·May 28

Claude cli >= 2.1.154 breaks local use with vLLM by introducing "ctx", "msg" and "system" roles for API messages. This 1-line patch to vLLM fixes it.

Claude CLI >= 2.1.154 introduces "ctx", "msg", and "system" roles for API messages, breaking vLLM compatibility. A one-line patch in vLLM restores compatibility and enables Claude workflows with local models like MiniMax-M2.7.

Claude Open source Tools

Reddit r/MachineLearning·May 28

Social Simulation with LLMs - Fidelity in Applications (CFP @ COLM'26) [R]

Call for papers for 2nd Workshop on Social Simulation with LLMs (Social Sim'26) @ COLM 2026. Theme: "Fidelity in Applications". Deadline June 23, 2026. Focus on evaluation, robustness, interpretability, and empirical validation of LLM-based simulated societies.

AI Agents Multi-agent Evals

The Decoder·May 28

Claude company Anthropic nears a trillion-dollar valuation after raising $65 billion in Series H

Anthropic raises $65 billion in Series H at a $965 billion valuation. Annualized revenue reaches $47 billion according to CFO Krishna Rao. The company will invest in safety research, computing capacity, and expanding its Claude product lineup.

Claude Anthropic Funding

The Decoder·May 28

Anthropic ships Claude Opus 4.8 as a "modest but tangible improvement" that tops GPT-5.5 in most benchmarks

Anthropic releases Claude Opus 4.8, outperforming GPT-5.5 and Gemini 3.1 Pro on most benchmarks. The model catches its own coding errors 4× better than its predecessor. Anthropic also rolls out dynamic workflows enabling hundreds of parallel sub-agents for codebase-wide migrations.

Claude Benchmarks Code generation

Hacker News (AI)·May 28

Amazon scraps AI leaderboard to stop workers chasing usage scores

Amazon discontinues its internal AI leaderboard to stop employees from optimizing for usage metrics instead of quality. The leaderboard was driving metrics-chasing behavior that diverted teams from actual business goals.

Business AI safety

ActuIA·May 28

Claude Opus 4.8: Anthropic emphasizes a model that is more honest about its own mistakes

Anthropic releases Claude Opus 4.8 on May 28, 2026. The model is reportedly four times less likely to let its own errors pass unflagged, with an emphasis on honesty about its own failures.

Claude Anthropic Reasoning

Hacker News (AI)·May 28

Show HN: Open Envelope – an open schema for defining AI agent teams

Open Envelope is an open schema for defining AI agent teams. The project proposes a standardized specification to compose and orchestrate multiple agents in collaborative workflows.

AI Agents Multi-agent Open source

Reddit r/LocalLLaMA·May 28

Mimo 2.5 Pro - 40t/s on 8x Nvidia Spark/GB10 cluster

Mimo 2.5 Pro achieves 40 t/s on 8x Nvidia GB10 cluster with 1k context, degrading to 17 t/s at 250k context. Parallelization: 60 t/s (2 requests), 83 t/s (4 requests). 1T model optimized via mtp-2.

Open source Infrastructure Benchmarks

GitHub Trending·May 28

EveryInc / compound-engineering-plugin

Official Compound Engineering plugin for Claude Code, Codex, Cursor and other editors. Native integration to enhance development workflows.

Claude Code Code generation Tools

Simon Willison·May 28

markdown-svg-renderer

Markdown rendering tool with specialized support for fenced code SVG blocks. Renders the image and provides a tab to switch to code view. Accepts raw Markdown, CORS-enabled URLs, or Gists.

Tools

Hacker News (AI)·May 28

Sam Altman and Dario Amodei are both walking back AI jobs apocalypse predictions

Sam Altman (OpenAI) and Dario Amodei (Anthropic) are walking back previous predictions of an AI-driven job apocalypse. Both executives are moderating their rhetoric on AI's impact on employment.

OpenAI Anthropic Business

Le Big Data·May 28

RAG (Retrieval-Augmented Generation): an approach to optimizing the use of AI

RAG (Retrieval-Augmented Generation) enhances language models by providing access to external data, reducing hallucinations and errors. This approach combines document retrieval and generation to optimize answer relevance.

RAG Llama

Reddit r/MachineLearning·May 28

I built a knowledge graph + policy engine for AI agents, explainable reasoning [D]

VeritasReason is an open-source Python framework that adds a structured reasoning and provenance layer to AI agents. It provides queryable context graphs, a forward-chaining rule engine (YAML), W3C PROV-O provenance, and policy compliance checking. Works with OpenAI, Anthropic, Groq, Ollama.

AI Agents Reasoning Open source
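
Forward chaining, the rule-engine style mentioned in this item, repeatedly applies rules whose premises are already known until no new facts appear. A minimal sketch of the general technique follows; the fact names, the rules, and the tuple format are hypothetical and are not VeritasReason's YAML schema or API.

```python
def forward_chain(facts, rules):
    """Derive every fact reachable from `facts` using (premises, conclusion) rules."""
    known = set(facts)
    changed = True
    while changed:  # keep sweeping until a full pass adds nothing new
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical policy rules; the dependent rule is listed first on purpose,
# so a second pass is needed before it can fire.
rules = [
    (("can_write", "request_is_delete"), "needs_audit_log"),
    (("user_is_admin",), "can_write"),
]

derived = forward_chain({"user_is_admin", "request_is_delete"}, rules)
print(sorted(derived))
```

Recording which rule produced each derived fact would be the natural next step toward the provenance trail the item describes.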

Le Big Data·May 28

Work and code in a single AI? Meet Vibe, Mistral's new ambition

Mistral launches Vibe, a unified AI capable of handling meetings, documents, and code in a single interface. The product aims to eliminate the need to switch between multiple specialized tools.

Mistral AI Agents Code generation

Latent Space·May 28

The Age of Async Agents — Cognition's Walden Yan & OpenInspect's Cole Murray

Cognition and OpenInspect showcase async agents as an emerging paradigm: 80% of Devin commits from agents, automated spec-to-PR workflows, full VMs, persistent agent memory, and PMs shipping code directly.

AI Agents Code generation Tools

Hacker News (AI)·May 28

Show HN: Bootstrap a team of coding agents from a template, OSS

Open-source tool to bootstrap a team of coding agents from a template. Project shared on Hacker News with limited engagement (3 points, 0 comments).

AI Agents Multi-agent Code generation

Le Big Data·May 28

From Google Remy to Gemini Spark: the advent of the autonomous AI agent

Google is developing autonomous AI agents with Remy and Gemini Spark, replacing passive chatbots with tools capable of independent actions and increased productivity.

Gemini AI Agents DeepMind

Hacker News (AI)·May 28

Anthropic raises $65B in Series H funding at $965B post-money valuation

Anthropic raises $65 billion in Series H funding at $965 billion post-money valuation. This major funding round reflects continued investor confidence in the company's AI development trajectory.

Anthropic Funding Business

Reddit r/LocalLLaMA·May 28

Granite 4.1 Architecture Changes?

An r/LocalLLaMA user questions IBM's decision to return to a pure transformer architecture for Granite 4.1, abandoning Granite 4's hybrid mamba-attention design. On modest hardware (8GB VRAM), Granite 4 delivered 128k context at ~1000 tok/s ingestion, while Granite 4.1 caps at 14k context and ~300 tok/s. The user asks whether IBM will continue offering the mamba architecture.

Open source Reasoning

Reddit r/MachineLearning·May 28

Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems [R]

AgingBench, a new longitudinal deployment benchmark, shows that swapping Claude Sonnet 4.6 for Opus 4.7 in the Claude Code CLI agent drops PyTest pass rate by ~15%. Memory policy alone drives a 4.5x spread in agent half-life across scenarios, larger than any model swap tested.

AI Agents Claude Claude Code

Hacker News (AI)·May 28

Dynamic Workflows in Claude Code

Claude Code now supports dynamic workflows, enabling users to create adaptive task sequences. The feature enhances automation and flexibility in AI-assisted coding processes.

Claude Code AI Agents

Reddit r/MachineLearning·May 28

Wall-OSS-0.5: 4B VLA with open training code and zero-shot real-robot evaluation [D]

Wall-OSS-0.5 is a 4B VLA from X Square Robot with open training code. Zero-shot evaluation on 17 real-robot tasks: 4 tasks >80% progress, including Rope Tightening (82%). Post fine-tuning: 60.5% average task progress (+17.5pp vs pi0.5). Mixture-of-Transformers architecture with vision-aligned RVQ tokenizer and distributed DMuon optimizer.

Robotics Vision Code generation

Reddit r/LocalLLaMA·May 28

LiquidAI/LFM2.5-8B-A1B · Hugging Face

LiquidAI releases LFM2.5-8B-A1B, a hybrid 8B model optimized for on-device inference (CPU/GPU). Extended architecture with reinforcement learning, compatible with llama.cpp/MLX/vLLM/SGLang. Performance competitive with larger models on agentic tasks and complex instruction following.

Open source AI Agents Code generation

Le Big Data·May 28

Windows on ARM for $300? Qualcomm may have just woken up the PC market

Qualcomm announces a Windows on ARM laptop at $300, positioning an affordable alternative to traditional laptops. The manufacturer promises a functional device for essential use cases.

Infrastructure

Reddit r/LocalLLaMA·May 28

Qwen3.6 35B - TXT vs Markdown vs HTML vs HTML+CSS

Comparative test of Qwen 3.6 35B across output formats (raw text, markdown, HTML, HTML+CSS). Markdown achieves best quality (78/100 per ChatGPT-4o) with 1,496 output tokens in 23s. HTML+CSS generates 10,290 tokens in 82s but lower quality score (58/100). Measurements include reasoning tokens, throughput, and total time.

Qwen Code generation Prompt engineering
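
The token counts and timings quoted in this item imply quite different throughput for the two formats. The short calculation below uses only the figures from the summary; it is a rough division, since the post says its measurements include reasoning tokens and the summary does not break those out.

```python
# Figures from the summary: (output tokens, seconds, quality score out of 100).
runs = {
    "Markdown": (1496, 23, 78),
    "HTML+CSS": (10290, 82, 58),
}


def tokens_per_second(tokens, seconds):
    """Average generation throughput over the whole run."""
    return tokens / seconds


for name, (tokens, seconds, score) in runs.items():
    rate = tokens_per_second(tokens, seconds)
    print(f"{name}: {rate:.1f} tok/s over {seconds}s, score {score}/100")
```

The HTML+CSS run generates tokens faster per second but still takes more than three times as long and scores lower, which is the trade-off the post highlights.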

The Decoder·May 28

Google Cloud responds to AI-accelerated cyberattacks with a platform that aims to close security gaps in minutes

Google Cloud launches 'AI Threat Defense', a platform automating detection, assessment, and patching of security flaws in enterprise systems. It integrates technologies from acquisitions.

DeepMind AI safety Business

Reddit r/LocalLLaMA·May 28

I built an enforcement layer for AI coding agents using a local knowledge graph and hybrid RAG

Writ is an enforcement layer for AI coding agents using a local Neo4j knowledge graph and hybrid RAG. A 5-stage retrieval pipeline (BM25, HNSW vector similarity, graph traversal, reciprocal rank fusion) surfaces only relevant rules per task. 30 bash hook scripts enforce execution: no code without approved plan, mandatory tests, static analysis required.

AI Agents Code generation RAG
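
Reciprocal rank fusion, one of the retrieval stages named in this item, merges several ranked lists by giving each item the sum of 1 / (k + rank) across lists. A minimal sketch of the standard formula, assuming the commonly used constant k = 60; the rule IDs, the three retriever outputs, and the function name are hypothetical and not taken from Writ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each item scores sum(1 / (k + rank)), with rank starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by item name for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical rule IDs returned by three retrievers (illustrative only).
bm25_hits = ["rule-tests", "rule-plan", "rule-lint"]
vector_hits = ["rule-plan", "rule-tests", "rule-docs"]
graph_hits = ["rule-plan", "rule-lint"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits, graph_hits])
print(fused)
```

Because only ranks are used, the fusion needs no score normalization across BM25, vector similarity, and graph traversal, which is the usual reason for choosing it.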

Hacker News (AI)·May 28

Show HN: Ktx – Open-source executable context layer for data agents

Ktx is an open-source executable context layer for data agents. Enables agents to access and manipulate data in real-time through a standardized interface.

AI Agents Open source Tools

Reddit r/MachineLearning·May 28

Compared Reddit data collection options for an ML project, here's what I found [P]

Comparison of Reddit data collection options for ML projects. Official API (100 req/min, 500-comment truncation) inadequate. Pushshift defunct. Author recommends Sylvia: 480 free req/min, $0.0005/request thereafter, full recursive comment resolution, historical archive access.

RAG Tools

The Decoder·May 28

Google launches a tiny board that runs Gemma 3 locally

Google unveils Coral Board at Google I/O, a compact single-board computer designed to run Gemma 3 locally on-device.

Gemini Open source

Reddit r/LocalLLaMA·May 28

losing my mind fine-tuning jina-v5 for a legal corpus

User has been fine-tuning Jina-v5 on a Slovak legal corpus for a month without success. The model fails to capture Slovak syntactic nuances, especially on ambiguous cases ("krádež" vs "prepadnutie"). Tested multiple approaches: LLM-generated queries, similar chunk injection, logit mining with Qwen 3.5-397B, but the fine-tunes consistently underperform the base model.

Embeddings Fine-tuning RAG

Reddit r/MachineLearning·May 28

Kept context-switching between arxiv, OpenReview, GitHub, and HuggingFace for every paper, so I built this. Chrome extension + website with everything inline, plus citation graph + SPECTER2 neighbors. 3M papers, free, feedback welcome [P]

Tomesphere: Chrome extension + website indexing 3M arxiv papers with LLM-curated summaries, OpenReview reviews, GitHub repos, HuggingFace models, citation graphs and SPECTER2 semantic neighbors. Free, no signup.

Papers Tools Open source

Reddit r/LocalLLaMA·May 28

Built a config sweep CLI for llama.cpp and vLLM and found out Q4_K_M beat Q8_0 by 230ms TTFT on Qwen2.5-7B

Sigilant-sweep, an open-source CLI for llama.cpp and vLLM, benchmarks 16 configurations (quantizations, KV cache, context). On Qwen2.5-7B, Q4_K_M beats Q8_0 by 230ms TTFT and +10.7 TPS. Tool measures TPS, TTFT, PPL with p50/p95 and weighted scoring (latency/quality/balanced).

Llama Benchmarks Open source
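
Reporting p50 and p95 together, as this item describes, separates typical latency from tail latency. The sketch below uses the nearest-rank definition of a percentile, which is only one of several conventions; the summary does not say which method the tool uses, and the sample values are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical time-to-first-token samples in milliseconds, with one slow outlier.
ttft_ms = [212, 198, 240, 205, 610, 201, 219, 230, 207, 215]

p50 = percentile(ttft_ms, 50)
p95 = percentile(ttft_ms, 95)
print(f"p50={p50} ms, p95={p95} ms")
```

A single slow request barely moves the median but dominates p95, so a configuration that wins on p50 alone can still feel worse in practice.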

Reddit r/LocalLLaMA·May 28

Reachy Mini goes fully local!

Hugging Face has developed a fully local experience for Reachy Mini, a conversational robot. A blog post details setup and customization for various use cases, including building voice agents without cloud dependency.

Voice AI Agents Open source

Le Big Data·May 28

YouTube: you can finally tell the AI what you want to watch

YouTube is testing an AI-powered feature that lets users create a fully personalized video feed by dictating their preferences. The system generates recommendations based on user voice or text instructions.

Voice RAG

Reddit r/LocalLLaMA·May 28

Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild

Zai replaced the network architecture on a 1000-GPU cluster running GLM-5.1 from ROFT to ZCube (developed with Tsinghua and HarnetsAI). Results: switch/optical costs down 33%, GPU throughput up 15%, P99 first-token latency down 40.6%. ZCube removes the Spine layer for full bipartite interconnect, eliminating asymmetric traffic hotspots inherent to Prefill-Decode disaggregated inference.

Infrastructure Reasoning

Reddit r/MachineLearning·May 28

Built a richer reading layer for arxiv (Chrome extension + web): OpenReview reviews, GitHub/HuggingFace links, citation graph, SPECTER2 neighbors, TLDRs. 3M papers, free, looking for feedback [P]

Tomesphere: Chrome extension + web layer enriching arxiv with LLM-curated TLDRs, OpenReview reviews, GitHub/HuggingFace links, citation graphs, SPECTER2 semantic neighbors. 3M papers indexed, free, no signup.

Papers Tools Embeddings

Reddit r/MachineLearning·May 28

A new dataset with more than 100M high-quality, curated images, with captions and metadata! [P]

MONET, an Apache 2.0 dataset of 104.9M high-quality images with captions and metadata, released on Hugging Face. Built from 2.9B images and refined. Includes paper, UMAP visualization, text/image retrieval tool, and codebase for training T2I models.

Image generation Embeddings Open source

Reddit r/LocalLLaMA·May 28

I'm seeing low draft acceptance when using Qwen3.x MTP, what am I doing wrong?

User reports low draft acceptance (40-60%) with Qwen3.5-122B and Qwen3.6-27B in speculative decoding via llama.cpp, versus ~80% expected. Detailed configuration provided with MTP draft, Q6_K_L quantization, batch 2048.

Qwen Open source Tools
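
To see why the gap between 40-60% and roughly 80% draft acceptance matters, the sketch below estimates how many tokens each verification pass yields. It assumes every draft token is accepted independently with the same probability, which is a simplification, and a draft length of 2, which is a hypothetical value not stated in the post.

```python
def expected_tokens_per_step(acceptance, draft_len):
    """Expected tokens emitted per target-model pass.

    Assumes each of the `draft_len` draft tokens is accepted independently with
    probability `acceptance`; the pass always yields one token from the target
    model in addition to the accepted prefix.
    """
    if acceptance >= 1.0:
        return float(draft_len + 1)
    return (1 - acceptance ** (draft_len + 1)) / (1 - acceptance)


for a in (0.4, 0.6, 0.8):
    tokens = expected_tokens_per_step(a, draft_len=2)
    print(f"acceptance {a:.0%}: {tokens:.2f} tokens per pass")
```

Under these assumptions the expected yield grows faster than linearly with the acceptance rate, so a drop from 80% to 40% costs much more than half of the benefit of drafting.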

Reddit r/LocalLLaMA·May 28

HF models page now has a "Base only" toggle to filter out finetunes/quants/etc

Hugging Face adds a "Base only" toggle on its models page to show only base models and exclude fine-tunes and quantizations. A feature long requested by the community.

Open source Tools

Reddit r/LocalLLaMA·May 28

Distributed ML Checkpoint Storage System

Distributed checkpoint storage system on Raspberry Pi 4B cluster (4× workers + Mac mini M4 coordinator). Handles 942 MB checkpoints in safetensors format with automatic replication, mDNS discovery, and Prometheus/Grafana/Loki monitoring. Addresses non-atomic writes, SD card backpressure, and silent corruption bugs.

Infrastructure Open source Tools
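
Non-atomic writes and silent corruption, two of the problems this item lists, are commonly handled by writing to a temporary file, syncing it, renaming it into place, and checking a digest on read. The sketch below shows that general pattern with the Python standard library; it is not the project's implementation, and the file name and payload are hypothetical.

```python
import hashlib
import os
import tempfile


def atomic_write(path, data):
    """Write bytes so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to storage before renaming
        os.replace(tmp_path, path)  # rename within the same directory and filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # do not leave a partial temp file behind
        raise
    return hashlib.sha256(data).hexdigest()


def verify(path, expected_digest):
    """Re-read the file and compare digests to detect silent corruption."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == expected_digest


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "checkpoint.safetensors")  # hypothetical name
    digest = atomic_write(target, b"placeholder checkpoint bytes")
    ok = verify(target, digest)
    corrupted = verify(target, "0" * 64)

print(ok, corrupted)
```

For checkpoints near the 942 MB mentioned above, hashing in fixed-size chunks instead of reading the whole file at once would keep memory use flat on a Raspberry Pi.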

The Decoder·May 28

Mistral rebrands LeChat as Vibe, betting its chatbot's future is as a full-blown work agent

Mistral rebrands Le Chat as Vibe and integrates it into a multiplatform work agent. Work Mode connects to Google Workspace, Outlook, Slack and GitHub to handle emails, reports and pull requests. Pro subscription drops from €17.99 to €14.99. Mistral positions itself against agent offerings from OpenAI, Google and Anthropic.

Mistral AI Agents Code generation

Reddit r/LocalLLaMA·May 28

PaddlePaddle/PaddleOCR-VL-1.6

PaddleOCR-VL 1.6 is an update to PaddlePaddle's multimodal optical character recognition system. Improves vision and text processing capabilities for image-based content.

Vision Open source Tools

OpenAI Blog·May 28

How Endava builds an agentic organization with Codex

Endava uses Codex to build an agentic organization, accelerating software delivery and reducing requirements analysis from weeks to hours.

AI Agents Code generation Business

GitHub Trending·May 28

unclecode / crawl4ai

Crawl4AI is an open-source web crawler and scraper optimized for LLM integration. The project is trending on GitHub.

Open source Tools RAG

GitHub Trending·May 28

revfactory / harness

Harness is a framework that designs domain-specific agent teams, defines specialized agents, and generates the skills they use.

AI Agents Multi-agent Code generation

GitHub Trending·May 28

OpenMOSS / MOSS-TTS

MOSS-TTS is an open-source speech and sound generation model family from MOSI.AI and OpenMOSS. It covers stable long-form speech, multi-speaker dialogue, voice design, sound effects, and real-time streaming TTS.

Voice Open source Tools

GitHub Trending·May 28

iOfficeAI / AionUi

AionUi is a free, local, open-source app compatible with Claude Code, Hermes Agent, Gemini CLI and 20+ other CLIs. Enables customization of AI assistants.

Claude Code AI Agents Open source

GitHub Trending·May 28

mastra-ai / mastra

Mastra is a TypeScript framework for building AI-powered applications and agents, created by the team behind Gatsby. Available as open source on GitHub.

AI Agents Open source Tools

GitHub Trending·May 28

Sync-in / server

Sync-in Server is a secure, open-source platform for file storage, sharing, collaboration, and file syncing.

Open source Tools

GitHub Trending·May 28

firecrawl / firecrawl

Firecrawl is an open-source tool to search, scrape, and clean web data for AI agents. It automates web scraping and content preparation for model training or inference.

AI Agents Tools Open source

GitHub Trending·May 28

ariadng / metatrader-mcp-server

MCP server for MetaTrader enabling LLMs to execute trades on the MetaTrader platform. Direct integration between AI agents and financial markets.

MCP AI Agents Tools

GitHub Trending·May 28

apurvsinghgautam / robin

Robin is an AI-powered OSINT tool for dark web exploration. Available on GitHub, it automates data collection and analysis on illicit marketplaces.

Open source Tools

GitHub Trending·May 28

microsoft / RAMPART

Microsoft releases RAMPART, a pytest-native safety and security testing framework for agentic AI applications. Enables evaluation of security and safety risks in multi-agent systems.

AI Agents Multi-agent AI safety

SIG

HYP

GitHub Trending·May 28

adithya-s-k / omniparse

OmniParse: open-source tool to ingest, parse, and optimize any data format (documents, multimedia) for enhanced compatibility with GenAI frameworks.

RAG Tools Open source

SIG

HYP

GitHub Trending·May 28

anthropics / claude-code

Claude Code is an agentic coding tool in the terminal that understands your codebase and executes routine tasks, explains complex code, and handles git workflows through natural language commands.

Claude Claude Code AI Agents

SIG

HYP

GitHub Trending·May 28

OpenMOSS / MOSS-TTS

MOSS-TTS is an open-source speech and sound generation model family from MOSI.AI and OpenMOSS team. It covers stable long-form speech, multi-speaker dialogue, voice/character design, environmental sound effects, and real-time streaming TTS.

Open source Voice Tools

SIG

HYP

Reddit r/LocalLLaMA·May 28

Qwen3.6-35B-A3B-APEX / 128K ctx on RTX 3060 12GB — 37 t/s gen with 72k ctx filled, PPL 3.25, offloading 17GB model

Qwen3.6-35B-A3B-APEX quantized by mudler achieves 37 t/s generation with 72K filled context on RTX 3060 12GB via 17.3GB offloading. Spiritbuun's CUDA optimizations (fused MMA, TurboQuant, fattn) + APEX I-Compact quantization yield PPL 3.25. 128K context supported, degrades to 28 t/s @129K.

Qwen Code generation Open source

SIG

HYP

Hacker News (AI)·May 28

AMD pulls a bait-and-switch on Linux users with Vivado licensing changes

AMD restricts free Vivado FPGA design tool access on Linux, forcing users to paid licenses or open-source alternatives. The licensing change removes previously available free tier for Linux users.

Open source Regulation

SIG

HYP

Hacker News (AI)·May 28

AI sticker shock hits corporate America

US corporations face unexpected AI costs. Spending on infrastructure, tokens, and cloud services exceeds initial budgets, forcing organizations to reconsider deployment strategies.

Business

SIG

HYP

Le Big Data·May 28

AR glasses for $299: Xreal finally tries an almost reasonable price

Xreal launches AR glasses A01 at $299, aiming to make AR technology more accessible. The model seeks to democratize augmented reality glasses against traditionally high market prices.

Vision

SIG

HYP

The Decoder·May 28

Meta One: Zuckerberg finally puts a price tag on all that AI spending

Meta rolls out paid add-ons for Instagram, Facebook, and WhatsApp globally while building a separate paid AI offering. Zuckerberg finally monetizes massive AI spending.

Meta AI Business

SIG

HYP

Reddit r/LocalLLaMA·May 28

Krasis update: Qwen3.6-35B-A3B (Q4) at reading speed, 1x 8GB 3070 Mobile laptop (32GB RAM)

Krasis v1.0, LLM runtime for models exceeding VRAM, achieves 12.48 tokens/s on RTX 3070 Mobile 8GB with Qwen3.6-35B-A3B (Q4). Full Rust implementation (no Python in hot path) and separate prefill/decode optimizations. Benchmarks: 222 pp, 12.48 tg on laptop; 10,030 pp, 124.9 tg on RTX 5090 32GB.

Qwen Infrastructure Open source

SIG

HYP

GitHub Trending·May 28

yossTheDev / removerized

Removerized is an AI image toolkit running fully in the browser. Free, private, and offline-first with no server dependency.

Image generation Open source Tools

SIG

HYP

Le Big Data·May 28

AI orchestration: a new organizational paradigm

Ofelia, a Grenoble-based SME specializing in business process management, exemplifies a new AI orchestration paradigm in enterprises. The article examines how to structure AI system integration within organizational workflows.

AI Agents Business

SIG

HYP

The Decoder·May 28

Amazon builds its own AI production platform and greenlights three AI animated series for Prime Video

Amazon MGM Studios and AWS launch a creators' fund and in-house AI platform called 'Project Nara'. Three animated series are in production with five-week timelines for pilots. Amazon claims the only end-to-end AI content ecosystem in the industry.

Image generation Video generation Business

SIG

HYP

Reddit r/LocalLLaMA·May 28

Qwen/Qwen-Image-Bench · Hugging Face

Qwen releases Q-Judger, a vision-language model based on Qwen3.6-27B for automated evaluation of AI-generated images. The model assesses 5 dimensions (quality, aesthetics, alignment, real-world fidelity, creative generation) using chain-of-thought reasoning and outputs structured JSON scores.

Qwen Vision Evals

SIG

HYP

The Decoder·May 28

ElevenLabs Music v2 promises opera-to-metal transitions without losing musical coherence

ElevenLabs releases Music v2, an AI music generation model enabling seamless genre transitions (opera, heavy metal, rap) within single compositions. New inpainting feature allows regenerating specific sections independently.

Tools

SIG

HYP

Latent Space·May 28

[AINews] Cognition raises $1B in $26B Series D

Cognition raises $1B in a Series D at a $26B valuation. The company behind Devin, an AI coding agent, positions coding as a market with an uncapped TAM.

AI Agents Code generation Funding

SIG

HYP

Reddit r/LocalLLaMA·May 28

Question: Llama cpp, whats good right now for: MTP, KV cache quant, Long context.

Discussion on llama.cpp optimizations for long context: comparison of MTP (Multi-Token Prediction), KV cache quantization, and performance. User reports 60 tokens/s with long context on 3090, degradation to 20 tokens/s when cache fills. Qwen 27B Q4 tested.

Llama Open source Infrastructure

SIG

HYP

Vercel AI Blog·May 28

Opus 4.8 on AI Gateway

Claude Opus 4.8 is now available on Vercel AI Gateway. The model excels at long-horizon agentic execution and complex multi-step coding tasks. AI Gateway provides unified API access with usage tracking, performance optimizations, and transparent pricing with no markup.

Claude AI Agents Code generation

SIG

HYP

Reddit r/LocalLLaMA·May 28

Nvidia LocateAnything - Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding. (10x faster than Qwen3-VL)

Nvidia releases LocateAnything, a 3B vision-language grounding model. Uses parallel box decoding, 10x faster than Qwen3-VL. Code and demo available on HuggingFace.

Vision Open source Benchmarks

SIG

HYP

Hacker News (AI)·May 28

A Eureka machine that thinks like nature and explores what AI cannot

Researchers develop a 'Eureka' machine that discovers physical laws through autonomous exploration, mimicking natural processes. The system outperforms current AI exploration capabilities by generating equations and strategies without direct human supervision.

Reasoning Papers Benchmarks

SIG

HYP

Hacker News (AI)·May 28

AI Is Starting to Hit Power Grid Limits

AI data centers consume growing amounts of electricity, threatening power grid stability. Existing infrastructure struggles to provide the power required by large-scale models and massive deployments.

Infrastructure Business

SIG

HYP

Reddit r/LocalLLaMA·May 28

The frontier reasoning race is starting to look like a crowded subway station

The frontier reasoning race intensifies: Hy3 preview scores 87.8 on CHSBO 2025, outpacing Gemini 3.1 Pro and GPT-5.4 xhigh. Users question whether these gains reflect real improvements in coding and math or benchmark overfitting.

Benchmarks Reasoning

SIG

HYP

Le Big Data·May 28

No more templates? CapCut launches Design Studio 2.0, the AI that plays art director

CapCut launches Design Studio 2.0, an AI-powered platform for graphic creation that replaces traditional templates. The tool offers automated artistic direction for visual design.

Image generation Tools Business

SIG

HYP

Reddit r/LocalLLaMA·May 28

Heterogeneous GPU Weighting & Layer Splitting

Heterogeneous GPU load balancing optimization for Ollama (RTX 5090 + 3090). Custom implementation weights layer distribution by compute power (SMCount × ClockMHz) instead of free memory alone. Result: faster than RTX 5090 standalone, leverages 3090 VRAM without bottlenecking the 5090.
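The weighting idea can be sketched roughly as follows. This is not the poster's code or Ollama's scheduler, only a minimal Python illustration of splitting layers in proportion to SMCount × ClockMHz while still respecting free VRAM. The `Gpu` class, the `split_layers` function, and all hardware numbers below are hypothetical placeholders, not real RTX 5090 or 3090 specifications.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    sm_count: int        # number of streaming multiprocessors (assumed input)
    clock_mhz: int       # clock in MHz (assumed input)
    free_vram_gb: float  # free memory available for layers


def split_layers(gpus, n_layers, layer_gb):
    """Assign layers in proportion to sm_count * clock_mhz, capped by free VRAM."""
    scores = [g.sm_count * g.clock_mhz for g in gpus]
    caps = [int(g.free_vram_gb // layer_gb) for g in gpus]
    if sum(caps) < n_layers:
        raise ValueError("model does not fit in combined free VRAM")

    # Ideal fractional share per GPU, rounded down.
    total = sum(scores)
    shares = [n_layers * s / total for s in scores]
    counts = [int(x) for x in shares]

    # Hand out layers lost to rounding, largest fractional remainder first.
    leftover = n_layers - sum(counts)
    by_remainder = sorted(range(len(gpus)),
                          key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1

    # Clamp to what each GPU can hold and collect the overflow.
    overflow = 0
    for i, cap in enumerate(caps):
        if counts[i] > cap:
            overflow += counts[i] - cap
            counts[i] = cap

    # Spill the overflow onto GPUs with spare room, fastest first.
    for i in sorted(range(len(gpus)), key=lambda i: scores[i], reverse=True):
        take = min(caps[i] - counts[i], overflow)
        counts[i] += take
        overflow -= take

    return {g.name: c for g, c in zip(gpus, counts)}


# Hypothetical cards: "fast" has 4x the compute score of "slow".
gpus = [Gpu("fast", 100, 2000, 32.0), Gpu("slow", 50, 1000, 24.0)]
print(split_layers(gpus, n_layers=60, layer_gb=0.5))
```

With these placeholder values the faster card gets four times as many layers as the slower one. When a card's proportional share would exceed its free VRAM, the excess spills onto the other card, which is one plausible way to use the second GPU's memory without letting it dominate the split.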

Open source Infrastructure Llama

SIG

HYP

arXiv cs.LG·May 28

Evaluating Local Explainability Metrics for Machine Learning Models on Tabular Data

Comparative study of local explainability techniques (LIME, SHAP, Feature Ablation) reliability across 32 tabular datasets. Results show explanation quality does not systematically correlate with model predictive performance, but depends instead on dataset complexity and feature distributions.

Evals RAG

SIG

HYP

arXiv cs.CL·May 28

Learning to Translate from Soft to Hard LLM Prompts

Method to translate soft prompts into natural language prompts using a dedicated translation model. Translations outperform InSPEcT across multiple benchmarks. Application: soft prompts optimized on small open-source models convert to portable text prompts that exceed original performance when deployed on closed-API models.

Prompt engineering Fine-tuning Papers

SIG

HYP

arXiv cs.AI·May 28

Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems

arXiv study on privacy in multi-agent systems. Platform simulates thousands of LLM agents interacting over one month. Privacy violations increase from 19.95% (single-turn) to 45.30% (multi-turn). Agents 8× more likely to disclose sensitive info after observing peer behavior. Explicit privacy instructions reduce but don't eliminate leakage (37.8% minimum).

AI Agents Multi-agent AI safety

SIG

HYP

arXiv cs.CL·May 28

Debate Helps Weak Judges Reward Stronger Models

Debate between models improves weak-judge oversight, but only when the critic exceeds the judge's classification ability. Of 5 pairings tested on code/logic tasks, 3 show statistically significant gains. A single critique suffices; rebuttal rounds add nothing. The authors propose a pre-deployment audit.

Reasoning Evals Alignment

SIG

HYP

arXiv cs.CL·May 28

Can Hallucinations Be Useful? Solving Multi-Hop Questions With SLMs By Chaining System-I/II Reasoning

Small Language Models (SLMs) hallucinate more than LLMs but can solve multi-step questions by inverting the standard strategy: answer first (System-I), then reason deeply (System-II) with evidence retrieval. Initial hallucinations help refine the final answer.

Reasoning RAG Benchmarks

SIG

HYP

arXiv cs.CL·May 28

Cultural Fidelity in English-to-Hindi Translation: A Preservation-Fluency Frontier for Gender Recoverability

Study on gender preservation in English-to-Hindi translation. Benchmark of 37,345 instances shows GPT-4o-mini and Sarvam frequently erase gender via ergative constructions. Two rerankers (SAR and PAR) improve gender recoverability: PAR increases accuracy from 11-16% to 49-54%, but reduces fluency (4.36→3.37). Reveals preservation-fluency tradeoff.

Benchmarks Vision Alignment

SIG

HYP