AI news, scored.

Xiaomi just claimed 1,000+ tps on a 1T model using a standard 8-GPU server

Xiaomi announces MiMo-V2.5-Pro UltraSpeed, claiming 1,000+ tokens/sec on a 1-trillion-parameter MoE model using a standard 8-GPU server, without custom hardware.

Open source Benchmarks Infrastructure

SIG

35

HYP

72

Reddit r/LocalLLaMA·Jun 8

Nex N2 has a funny "few words do trick" reasoning

Nex N2 Pro (Qwen 3.5 397B finetune) exhibits a distinctive reasoning pattern using repeated simple words ("need", "maybe") to reduce token usage. The user notes this approach makes parsing reasoning harder despite lower linguistic complexity.

Qwen Reasoning Open source

SIG

35

HYP

25

Hacker News (AI)·Jun 8

Terry Tao Became an Evangelist for AI in Math

Terence Tao, world-renowned mathematician, advocates for AI adoption in mathematical research. He explores how AI tools can enhance discovery and proof capabilities in mathematics.

Reasoning Papers

SIG

35

HYP

25

Reddit r/LocalLLaMA·Jun 8

Gemma 4 QAT + MTP: max 33% speed increase in token generation, any ideas?

User with 2x RTX 3060 Ti tests Gemma 4 QAT with an MTP assistant model on llama.cpp. Achieves 100 t/s (a 33% speedup) with an 80%+ draft acceptance rate, and asks for tuning advice to push past that gain.

Llama Code generation Open source

SIG

35

HYP

15
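A rough way to reason about the ceiling described in this post: the sketch below uses the textbook speculative-decoding estimate, not llama.cpp's actual MTP implementation. It assumes every drafted token is accepted independently with the same probability. `draft_len` and `draft_cost_ratio` (the cost of one drafted token relative to one pass of the main model) are hypothetical parameters that the post does not report.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass of the main model.

    Simplified model: each drafted token is accepted independently with
    probability accept_rate, and the main model always adds one token.
    """
    if accept_rate >= 1.0:
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


def estimated_speedup(accept_rate: float, draft_len: int,
                      draft_cost_ratio: float) -> float:
    """Throughput relative to plain decoding under the same simplified model.

    draft_cost_ratio is an assumed value: time for one drafted token
    divided by time for one main-model pass.
    """
    step_cost = 1.0 + draft_len * draft_cost_ratio
    return expected_tokens_per_step(accept_rate, draft_len) / step_cost


def implied_baseline_tps(observed_tps: float, speedup: float) -> float:
    """Throughput without drafting that a reported speedup implies."""
    return observed_tps / speedup


if __name__ == "__main__":
    # Figures from the post: 100 t/s at a 33% speedup.
    print(round(implied_baseline_tps(100.0, 1.33), 1))
    # Hypothetical draft settings, for illustration only.
    print(round(estimated_speedup(0.8, 4, 0.38), 2))
```

With the figures from the post, the implied throughput without MTP is about 75 t/s. Under this model, a high acceptance rate still produces only a modest net gain when drafting is expensive relative to the main model.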

Reddit r/LocalLLaMA·Jun 8

Looking for a local "NotebookLM for lawyers" setup – what am I doing wrong?

A lawyer seeks to build a local RAG system to analyze case files (correspondence, contracts, court decisions) with citations. After testing Qwen 3.5 9B and gpt-oss-20b via LM Studio + Big RAG, he encounters two issues: insufficient speed (~2.2 tok/s) and model refusal to cite his own documents, generating generic explanations instead of context-grounded analysis.

RAG Qwen Open source

SIG

35

HYP

15

Le Big Data·Jun 8

He drops his AI subscriptions for a Mac Mini and saves $2,500 a year

A developer drops monthly AI subscriptions ($210/month) for a Mac Mini, saving $2,500 annually. Cost-benefit analysis between cloud services and local infrastructure.

Business Tools Infrastructure

SIG

35

HYP

65

Le Big Data·Jun 8

Fake drama and clickbait: Meta AI's news feed goes completely off the rails

Meta AI's news feed generates problematic content: false dramas, clickbait, and misleading posts. Users are confused about the assistant's actual nature and reliability.

Meta AI

SIG

35

HYP

65

Reddit r/MachineLearning·Jun 8

LLM Relational Intelligence: A 4-Month Research Experiment on Multi-Model Behavioral Alignment with Human Communication [R]

A 4-month experiment testing whether context windows can be engineered so that frontier models (GPT, Claude, Gemini, Grok) interact in a way indistinguishable from human-to-human conversation. The author reports that Gemini shows the highest relational intelligence. The author treats the context window as a behavioral environment rather than a query interface, using modeling, accountability, humor, and social correction.

Prompt engineering GPT Claude

SIG

35

HYP

65

Le Big Data·Jun 8

Anthropic wants to freeze the AI race: genuine fear or strategy?

Anthropic calls for a global pause in the AI race, warning of risks from self-improving AI. The demand is striking but raises questions about its strategic intent.

Anthropic AI safety Alignment

SIG

35

HYP

72

Hacker News (AI)·Jun 8

Blaise v0.10.0: Native Back End, Threads and Incremental Compilation

Blaise v0.10.0 introduces native backend, thread support, and incremental compilation. Technical update to a programming language with performance and concurrency improvements.

Open source Infrastructure

SIG

35

HYP

15

Le Big Data·Jun 8

How to boost engagement with the AI customer support agent in HubSpot's Marketing Hub?

HubSpot integrates an AI customer support agent into its Marketing Hub to improve engagement. The tool aims to deliver fast and accurate responses to customers on the Web.

AI Agents Business

SIG

35

HYP

55

GitHub Trending·Jun 8

phuryn / pm-skills

PM Skills Marketplace offers 100+ agentic skills, commands, and plugins spanning discovery, strategy, execution, launch, and growth phases.

AI Agents Tools Open source

SIG

35

HYP

55

GitHub Trending·Jun 8

idootop / open-xiaoai

Open-source project enabling advanced voice listening capabilities for Xiaoai Speaker. Unlocks unlimited voice features on Xiaomi's smart speaker.

Open source Voice

SIG

35

HYP

45

Le Big Data·Jun 8

ZoomMate connects conversations to workflows

Zoom launches ZoomMate and AI Productivity Suite to integrate conversations with workflows. The company continues expanding into collaborative tools.

Tools Business

SIG

35

HYP

45

Hacker News (AI)·Jun 8

Painting the Internet: A Different Kind of Warhol Worm [pdf]

Academic paper on a new class of worms capable of visually modifying web content in real-time, inspired by digital art techniques. Theoretical security approach exploring client-side rendering vulnerabilities.

AI safety

SIG

35

HYP

15

OpenAI Blog·Jun 8

Built to benefit everyone: our plan

OpenAI outlines its vision for AI's future, focusing on access, safety, and shared prosperity. The company commits to ensuring AGI benefits everyone.

OpenAI Alignment AI safety

SIG

35

HYP

65

Reddit r/LocalLLaMA·Jun 7

Hear Me Out, Pi Fans Lurking Here

An r/LocalLLaMA user criticizes Pi, Mario Zechner's agentic framework, for not being optimized for local LLMs. Pi uses a short system prompt and minimal tools, and is designed for API users (Claude). The author tests Pi on Nemotron and Qwen: local models fail to execute reliable tool calls unless reasoning is enabled, revealing a fundamental mismatch.

AI Agents Open source Tools

SIG

35

HYP

45

Hacker News (AI)·Jun 7

Show HN: Nightwatch, The open-source, read-only AI SRE

Nightwatch is an open-source AI-powered SRE tool operating in read-only mode. Presented on Hacker News with modest engagement (4 points, 2 comments), it offers automation without direct system modifications.

AI Agents Open source Tools

SIG

35

HYP

25

Reddit r/LocalLLaMA·Jun 7

QAT variant of Gemma4 26B A4B is not working well for me

User reports that QAT variant of Gemma-4 26B A4B (google/gemma-4-26B-A4B-it-qat-q4_0-gguf and unsloth/gemma-4-26B-A4B-it-qat-GGUF:Q4_K_XL) produces degraded results on a chessboard SVG test with llama.cpp b9549, compared to the older non-QAT version which performs correctly.

Gemini Open source Tools

SIG

35

HYP

15

Reddit r/LocalLLaMA·Jun 7

GMKtec Crams OCuLink, Wi-Fi 7 and Dual PCIe 4.0 Into the EVO-X3, With a 192GB Ryzen AI MAX+ 495 Monster Following Later This Year

GMKtec announces the EVO-X3 with OCuLink, Wi-Fi 7, and dual PCIe 4.0. A variant with a Ryzen AI MAX+ 495 and 192GB of RAM is planned for later this year. First known hardware announcement for this processor.

Infrastructure

SIG

35

HYP

45

Hacker News (AI)·Jun 7

The ROI of AI coding looks different when you are a bootstrapped founder

A bootstrapped founder examines the ROI of AI coding tools. The calculation differs for unfunded startups: API costs, actual productivity gains, and development velocity impact follow different economics than venture-backed companies.

Code generation Business

SIG

35

HYP

25

Reddit r/LocalLLaMA·Jun 7

A handy llama-server launcher with easy model and configuration customisation

Open-source utility to launch llama-server with centralized configuration and model management. Supports multiple llama-server binaries, per-model overrides, and command-line overrides. Available on GitHub.

Llama Tools Open source

SIG

35

HYP

15

Hacker News (AI)·Jun 7

Anthropic/OpenAI may be spending more than $1000 for every $100 you pay them

Analysis of inference costs: Anthropic and OpenAI may spend 10x more per user request than revenue generated. Operating margins appear negative at scale, raising questions about the economic viability of current models.

Anthropic OpenAI Business

SIG

35

HYP

65

GitHub Trending·Jun 7

AstrBotDevs / AstrBot

AstrBot is an AI agent framework integrating multiple IM platforms, LLMs, and plugins. Open-source alternative to OpenClaw for building AI assistants.

AI Agents Open source Tools

SIG

35

HYP

45

GitHub Trending·Jun 7

ashishpatel26 / 500-AI-Agents-Projects

Curated collection of 500 AI agent projects across healthcare, finance, education, retail and more. Practical use cases with links to open-source implementations.

AI Agents Open source

SIG

35

HYP

45

Reddit r/LocalLLaMA·Jun 7

DeskDash - a free Windows tool to easily manage your GGUF files

DeskDash is a free Windows tool to easily manage GGUF files. Community-developed, it simplifies organizing and using locally quantized models.

Open source Tools Infrastructure

SIG

35

HYP

25

Hugging Face Blog·Jun 7

Sponsors especially OPENAI CODEX voucher usage for codex - openAI challange

OpenAI offers Codex vouchers to Hugging Face sponsors to test the code generation model. Partnership initiative between OpenAI and the community platform.

OpenAI Code generation Business

SIG

35

HYP

45

Hacker News (AI)·Jun 7

Show HN: Lathe – Use LLMs to learn a new domain, not skip past it

Lathe is a tool that leverages LLMs to deepen learning in a new domain rather than bypass it. Shared on Hacker News, the project offers a pedagogical approach where language models facilitate progressive understanding.

Tools Prompt engineering

SIG

35

HYP

25

Hacker News (AI)·Jun 7

Efficient and Training-Free Single-Image Diffusion Models

New approach for single-image diffusion models that generates images without additional training. The method is computationally efficient and memory-optimized.

Image generation Papers

SIG

35

HYP

15

Reddit r/MachineLearning·Jun 7

Research collection of Arxiv whitepapers [R]

Researcher shares a collection of 1,700 arXiv papers, organized into 90 categories, published since ChatGPT's launch. Migrated from Obsidian to the web with 6,000 'Inquiring Lines' (cross-cutting syntheses) and wiki links between papers. Includes prompts to discover related recent research.

Papers RAG

SIG

35

HYP

25

Reddit r/LocalLLaMA·Jun 7

How to compare Original vs QAT Gemma 4 31B Q4 quants

Discussion on methodology for comparing Gemma 4 31B original vs QAT-retrained Q4 quantizations. Author proposes benchmarking unquantized versions first (SuperGPQA, HLE, MMLU) then measuring divergence of each Q4 against its own reference, rather than direct cross-variant comparison.

Gemini Benchmarks Evals

SIG

35

HYP

15
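The comparison proposed in this post can be sketched as follows. This is a minimal illustration, not the poster's code: the function name, the `Divergence` fields, and the toy answer lists are all hypothetical, and a real run would use per-question outputs from the benchmarks named in the post.

```python
from dataclasses import dataclass


@dataclass
class Divergence:
    accuracy_drop: float  # reference accuracy minus quantized accuracy
    flip_rate: float      # share of questions where the two answers differ


def divergence_from_reference(reference, quantized, gold):
    """Compare one Q4 quant against its OWN unquantized reference model."""
    if not gold or not (len(reference) == len(quantized) == len(gold)):
        raise ValueError("answer lists must be non-empty and of equal length")
    n = len(gold)
    ref_acc = sum(r == g for r, g in zip(reference, gold)) / n
    quant_acc = sum(q == g for q, g in zip(quantized, gold)) / n
    flips = sum(r != q for r, q in zip(reference, quantized)) / n
    return Divergence(accuracy_drop=ref_acc - quant_acc, flip_rate=flips)


if __name__ == "__main__":
    # Toy multiple-choice answers, invented purely to exercise the function.
    gold = ["A", "B", "C", "D"]
    original_ref, original_q4 = ["A", "B", "C", "A"], ["A", "B", "D", "A"]
    qat_ref, qat_q4 = ["A", "B", "C", "D"], ["A", "B", "C", "D"]

    # Each quant is scored only against its own reference, never cross-variant.
    print("original:", divergence_from_reference(original_ref, original_q4, gold))
    print("QAT:     ", divergence_from_reference(qat_ref, qat_q4, gold))
```

Reporting the flip rate alongside the accuracy drop matters because a quant can change many answers while its overall accuracy barely moves.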

Reddit r/LocalLLaMA·Jun 7

You don't need a GPU to run gemma-4-26B-A4B

User runs Gemma-4-26B-A4B on an old i5-8500 CPU with 32GB RAM and no GPU, achieving ~7 T/s via Koboldcpp. The poster argues that recent compressed models make GPUs less essential for local inference.

Gemini Open source

SIG

35

HYP

55

Reddit r/MachineLearning·Jun 6

Looking for critical review of an NN architecture (possible evaluation bias?) [D]

Amateur student seeks critical review of a custom neural network architecture (Directional Neural Network) he designed. The architecture outperforms standard MLPs on simple tasks, but the author suspects potential evaluation bias in his comparisons (initialization, optimizer, datasets). Shares a repository with reproducible code.

Papers Evals

SIG

35

HYP

15

Hacker News (AI)·Jun 6

Universal Memory Protocol – a shared format for agent memory

Universal Memory Protocol proposed to standardize memory storage and access format across AI agents. Aims to enable interoperability and reusability in multi-agent systems.

AI Agents Multi-agent Infrastructure

SIG

35

HYP

25

Hacker News (AI)·Jun 6

Computex 2026: Are We Heading for the Agentic PC Era Yet? – EE Times

Computex 2026 explores the emergence of agentic PCs. The industry debates whether personal computers will finally integrate autonomous AI agents capable of executing tasks without constant human intervention.

AI Agents Business

SIG

35

HYP

55

Reddit r/LocalLLaMA·Jun 6

Fuck, sucessfully ran minecraft server on GLM AI's Agent lol.

A user asked GLM AI (Zhipu AI's agent) to host a playable Minecraft server. The agent generated the server, created a dashboard, and hosted it in Hong Kong. The post is offered as a demonstration of complex task execution.

AI Agents Code generation

SIG

35

HYP

65

Reddit r/LocalLLaMA·Jun 6

Gemma 4 QAT Unquantized Heretic is here

User releases an unofficial Heretic build of the Gemma 4 26B MoE QAT model, which the title describes as unquantized. The model intentionally departs from the original Gemma 4 in its refusal behavior.

Gemini Open source

SIG

35

HYP

45

Reddit r/LocalLLaMA·Jun 6

Gemma 4 QAT accuracy inconsistencies

Analysis of accuracy inconsistencies in Gemma 4 quantization-aware training (QAT). The 12B model shows larger deviations from FP16 compared to MoE variants (E2B/E4B), contradicting theoretical expectations. Requests clarification on methodology and comparisons with non-QAT variants.

Gemini Benchmarks

SIG

35

HYP

15

GitHub Trending·Jun 6

danielmiessler / Personal_AI_Infrastructure

GitHub repository offering agentic AI infrastructure designed to magnify human capabilities. Focuses on integrating AI agents into personal workflows.

AI Agents Infrastructure

SIG

35

HYP

45

GitHub Trending·Jun 6

supabase / supabase

Supabase is a Postgres development platform providing a dedicated database for building web, mobile, and AI applications.

Infrastructure Tools

SIG

35

HYP

25
