Reddit r/LocalLLaMA

Quick thoughts on GLM-5.2 (Bonus: Censorship question answers)

GLM-5.2 shows excellent coherence over very long contexts and adapts its reasoning depth without excessive verbosity. The poster reports performance close to GPT-4.5 on heavy analysis and deep research, with faster inference than GLM-5.1. The model also has a distinctive conversational style of its own.

Reasoning Open source

Reddit r/LocalLLaMA·Jun 18

CEOs of Anthropic and Google DeepMind call for U.S.-led AI coalition in meeting at G7

Dario Amodei (Anthropic) and Demis Hassabis (Google DeepMind) called for a U.S.-led AI coalition at a G7 meeting. Both executives advocated for international coordination amid geopolitical AI challenges.

Anthropic DeepMind Regulation

Reddit r/LocalLLaMA·Jun 17

llama.cpp now supports model management (downloading etc) via API

llama.cpp has merged PR #23976, which adds model management through the API: models in a directory can be downloaded, loaded, and unloaded on demand. A UI is coming soon. The full model lifecycle can now be deployed and managed through the API alone.

Llama Open source Infrastructure

Reddit r/LocalLLaMA·Jun 17

I released Inflect-Nano, an ultra-extreme tiny 4.63m parameter TTS model.

Inflect-Nano-v1 is a 4.63M-parameter TTS model that its author describes as the second-smallest publicly released speech synthesis model. It consists of an acoustic model (3.46M parameters) and a vocoder (1.17M) and generates 24 kHz English audio. It is ~17x smaller than Kokoro and ~108x smaller than Chatterbox. It runs locally via PyTorch, which suits embedded devices and offline voice assistants.
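The size comparisons can be checked with a little arithmetic. The sketch below assumes reference sizes of about 82M parameters for Kokoro (Kokoro-82M) and about 500M for Chatterbox. The post does not give either figure.

```python
# Parameter counts in millions; the two component sizes come from the post.
ACOUSTIC_M = 3.46
VOCODER_M = 1.17
TOTAL_M = ACOUSTIC_M + VOCODER_M  # 4.63M

# Assumed reference sizes (not stated in the post).
REFERENCE_MODELS_M = {
    "Kokoro": 82.0,
    "Chatterbox": 500.0,
}


def size_ratio(reference_m: float, model_m: float = TOTAL_M) -> float:
    """How many times larger the reference model is than Inflect-Nano."""
    return reference_m / model_m


if __name__ == "__main__":
    print(f"Inflect-Nano total: {TOTAL_M:.2f}M parameters")
    for name, size in REFERENCE_MODELS_M.items():
        print(f"{name}: ~{size_ratio(size):.1f}x larger")
```

Under these assumptions the ratios come out to about 17.7x and 108x, consistent with the post's rounded figures.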

Voice Open source Tools

Reddit r/LocalLLaMA·Jun 17

Lin Junyang AI Lab Closes Round at $2B Valuation

Lin Junyang, who led the Qwen model line, has launched a new AI lab, and it has closed a funding round at a $2B valuation. The open-source community expects significant contributions from it.

Qwen Open source Funding

Reddit r/LocalLLaMA·Jun 17

GLM 5.2 Release Video [Made with GLM 5.2]

The release video was made by GLM 5.2 using Remotion. The poster rates the result as comparable to Fable but below Gemini 3.1 Pro. OpenRouter servers were overloaded, with timeouts on long outputs.

Video generation Gemini

Reddit r/LocalLLaMA·Jun 17

US holds off blacklisting China's DeepSeek, more than 100 firms deemed security risks, sources say

According to sources, the US has refrained from blacklisting DeepSeek but has designated more than 100 Chinese firms as security risks. The decision comes amid US-China tech and trade tensions.

DeepSeek Regulation Business

Reddit r/LocalLLaMA·Jun 17

PSA: unsloth/GLM-5.2-GGUF is uploading

Unsloth created a Hugging Face repository for GLM-5.2 GGUF about 30 minutes before the post. Only the README is available so far; the GGUF files are presumably still uploading.

Open source Tools

Reddit r/LocalLLaMA·Jun 17

i post-trained a model to reliably roll a die

A user post-trained a model to simulate a fair die roll (each face with probability ~1/6). The post also shows that frontier LLMs (Claude, GPT, Kimi) consistently answer '4'. The author uses this toy problem to explore exploration vs. exploitation in RL and model behavior.
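One simple way to check whether sampled rolls land on each face about 1/6 of the time is a Pearson chi-square goodness-of-fit test. This is a generic sketch, not the poster's evaluation, and the sample data are made up for illustration.

```python
from collections import Counter

# Critical value of the chi-square distribution with 5 degrees of freedom
# (six faces minus one) at the 0.05 significance level.
CHI2_CRIT_DF5_P05 = 11.070


def chi_square_uniform(rolls, faces=range(1, 7)):
    """Pearson chi-square statistic of observed rolls against a fair die."""
    faces = list(faces)
    counts = Counter(rolls)
    expected = len(rolls) / len(faces)
    return sum((counts.get(f, 0) - expected) ** 2 / expected for f in faces)


def looks_fair(rolls) -> bool:
    """True if fairness cannot be rejected at the 5% level."""
    return chi_square_uniform(rolls) < CHI2_CRIT_DF5_P05


if __name__ == "__main__":
    always_four = [4] * 60              # the behavior reported for frontier LLMs
    balanced = [1, 2, 3, 4, 5, 6] * 10  # a perfectly even sample
    print(chi_square_uniform(always_four), looks_fair(always_four))
    print(chi_square_uniform(balanced), looks_fair(balanced))
```

With 60 answers of "4", each face expects 10 hits, so the statistic is 250 + 5 x 10 = 300, far above the critical value.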

Reinforcement learning Claude GPT

Reddit r/LocalLLaMA·Jun 17

llama.cpp - how to free up even more space on your GPU

The post collects llama.cpp options for freeing GPU memory. --no-mmproj-offload keeps the multimodal projector off the GPU, which frees ~1 GB with vision models. Quantizing the KV cache with --cache-type-k/--cache-type-v shrinks it by 50-75%. --spec-draft-n-max=2 tunes speculative decoding. Flash attention is enabled by default. Tested with Qwen 3.6-27B at 150k context on an RTX 3090.
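The 50-75% range follows from the storage cost per element. llama.cpp's q8_0 format packs 32 values into 34 bytes (8-bit values plus one f16 scale), and q4_0 packs 32 values into 18 bytes, compared with 2 bytes per value for f16. The sketch below computes cache sizes for hypothetical attention dimensions. These are placeholders, not Qwen 3.6-27B's real configuration.

```python
# Bytes per stored element for a few llama.cpp KV cache types.
# q8_0 packs 32 values into 34 bytes; q4_0 packs 32 values into 18 bytes.
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,
    "q4_0": 18 / 32,
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type="f16"):
    """Size of the K and V caches for n_ctx tokens (both tensors counted)."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elements * BYTES_PER_ELEMENT[cache_type]


def reduction(cache_type):
    """Fractional saving relative to an f16 cache."""
    return 1 - BYTES_PER_ELEMENT[cache_type] / BYTES_PER_ELEMENT["f16"]


if __name__ == "__main__":
    # Hypothetical attention dimensions, NOT the real Qwen 3.6-27B config.
    dims = dict(n_layers=16, n_kv_heads=4, head_dim=256, n_ctx=150_000)
    for t in BYTES_PER_ELEMENT:
        gb = kv_cache_bytes(cache_type=t, **dims) / 1e9
        print(f"{t:>5}: {gb:6.2f} GB  ({reduction(t):.0%} smaller than f16)")
```

The savings are about 47% for q8_0 and 72% for q4_0 regardless of the model's dimensions, which matches the post's 50-75% range.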

Llama Open source Infrastructure

Reddit r/LocalLLaMA·Jun 17

We built an open source UI kit for document RAG/agents

Extend has released an MIT-licensed open-source UI kit for document RAG and agents. It has 15 components, including PDF, DOCX, and XLSX viewers with bounding-box citations, file upload, and e-signature. The kit was built for internal use, has been tested on millions of pages per day, and is actively maintained.

RAG AI Agents Open source

Reddit r/LocalLLaMA·Jun 17

My GLM-5.2-FP8 HGX-H200 SGLang docker deploy config

A Docker deployment config for GLM-5.2-FP8 on an HGX-H200 server using SGLang. It reaches 70 tokens/s with a 262k context by disabling data parallelism (DP) and the DeepEP MoE all-to-all backend (moe-a2a-backend deepep), with mem-fraction-static set to 0.83. According to the poster, the official vLLM recipes do not work on H200.
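In SGLang, --mem-fraction-static sets the fraction of GPU memory reserved for model weights plus the KV cache pool. The rest is left for activations and other runtime buffers. The sketch below estimates what 0.83 leaves for the KV cache. Its assumptions are not from this post: 8 GPUs per HGX H200 node with 141 GB each, the 744B parameter count mentioned elsewhere in this feed, and about 1 byte per FP8 weight (ignoring scales and any higher-precision layers).

```python
# Rough static-memory budget for GLM-5.2-FP8 under SGLang.
# All hardware and size figures below are assumptions, except the 0.83 fraction.
NUM_GPUS = 8                # assumed 8-GPU HGX H200 node
GPU_MEM_GB = 141            # H200 memory per GPU
PARAMS_B = 744              # total parameters, in billions (assumed)
BYTES_PER_PARAM = 1.0       # FP8 weights, ignoring scales
MEM_FRACTION_STATIC = 0.83  # value used in the post


def static_pool_gb(fraction=MEM_FRACTION_STATIC):
    """Memory reserved across all GPUs for weights plus the KV cache pool."""
    return NUM_GPUS * GPU_MEM_GB * fraction


def kv_pool_gb(fraction=MEM_FRACTION_STATIC):
    """What remains of the static pool for the KV cache after loading weights."""
    weights_gb = PARAMS_B * BYTES_PER_PARAM
    return static_pool_gb(fraction) - weights_gb


if __name__ == "__main__":
    print(f"static pool: {static_pool_gb():.1f} GB")
    print(f"left for KV cache: {kv_pool_gb():.1f} GB")
```

Under these assumptions, about 936 GB is reserved and about 192 GB of it remains for the KV cache. Raising the fraction grows the cache but leaves less headroom for activations.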

Code generation Infrastructure

Reddit r/LocalLLaMA·Jun 17

Multilingual-Multimodal-NLP/LoopCoder-V2 · Hugging Face

LoopCoder-V2 is a 7B code model built on the Parallel Loop Transformer (PLT), which improves test-time performance by running two passes through the same shared Transformer blocks. Trained on 18T tokens of mixed text and code, it scores 64.4 on SWE-bench Verified (vs. 43.0 for the baseline). Two loops gave the best trade-off between gain and cost.
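The summary describes PLT only at a high level, so the sketch below shows just the generic idea behind looped Transformers. The same blocks, with the same weights, are applied several times, so compute per token grows with the loop count while the parameter count stays fixed. It does not model PLT's parallel scheme. The toy residual block is a hypothetical stand-in for a real Transformer block.

```python
from typing import Callable, List

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_block(weight: float) -> Block:
    """Hypothetical stand-in for a Transformer block: a residual update x + f(x).

    `weight` plays the role of the block's parameters.
    """
    def block(x: Vector) -> Vector:
        return [xi + weight * xi * (1.0 - xi) for xi in x]
    return block


def looped_forward(x: Vector, blocks: List[Block], loops: int) -> Vector:
    """Run the whole stack `loops` times, reusing the same blocks (and weights)."""
    for _ in range(loops):
        for block in blocks:
            x = block(x)
    return x


if __name__ == "__main__":
    stack = [make_block(0.1), make_block(0.2)]  # two blocks, two "parameters"
    x0 = [0.1, 0.5, 0.9]
    for loops in (1, 2):
        applications = loops * len(stack)  # compute grows with loops
        print(loops, applications, looped_forward(x0, stack, loops))
```

Two loops cost twice the block applications of one loop but add no parameters. That is the gain-versus-cost trade-off the post measures.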

Code generation Reasoning Benchmarks

Reddit r/LocalLLaMA·Jun 17

Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5

Gemma 4 E2B runs in the browser at 255 tokens/s using WebGPU kernels written by Fable 5. The demo and kernels are available on Hugging Face.

Gemini Code generation Open source

Reddit r/LocalLLaMA·Jun 17

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

GameCraft-Bench evaluates whether AI agents can build playable games end-to-end in a real game engine. The benchmark tests Opus-4.7, GPT-5.5, Kimi-K2.6, DeepSeek-V4-Pro, and others. No results are reported for medium-sized models (27B-31B).

AI Agents Benchmarks Code generation

Reddit r/LocalLLaMA·Jun 17

TRELLIS.2 now runs natively on MLX (Image to 3d object model)

A native MLX port of Microsoft's TRELLIS.2 for Apple Silicon. On an M4 Max, image-to-3D object generation takes ~70 s at 512×512 and ~300-700 s at 1024×1024. The code is released on GitHub.

Open source Tools Infrastructure

Reddit r/LocalLLaMA·Jun 17

Making budget models punch above their weight with a smart Rust harness

A developer describes a Rust harness that improves how small, inexpensive language models perform through efficient system design rather than changes to model weights. The result lets budget models compete with larger ones.

Open source Infrastructure Tools

Reddit r/LocalLLaMA·Jun 17

GLM-5.2 is a win for local AI

GLM-5.2 (744B parameters, MIT license) is progress for local AI despite its massive footprint. The community could distill its reasoning capabilities into 8B or 70B models, significantly improving local setups.

Open source Fine-tuning Reasoning

Reddit r/LocalLLaMA·Jun 17

Headless screenshot loops let a local 30B agent finish a raytraced FPS demo in pure C

A local Qwen 27B agent completed a raytraced FPS demo in pure C, using headless screenshot loops for visual debugging. Adding a headless mode with keyboard/mouse injection and frame capture changed the approach: the model began automating recursive visual debugging loops on its own.

Qwen AI Agents Code generation

Reddit r/LocalLLaMA·Jun 17

I released a local LLM-powered RPG where generated NPCs, locations, items, and quests persist as in-game objects

Developer releases local LLM-powered RPG where generated NPCs, locations, items, and quests persist as in-game objects. LLM handles dialogue, narration, and quest progression; game system manages inventory, combat, and saves. Generated elements are stored and reusable.

Open source Tools AI Agents

Reddit r/LocalLLaMA·Jun 17

SIQ-1 Qwen3.6 for autoresearch and autonomous agency

SIQ-1 Qwen3.6 is a PPO fine-tune of Qwen-35B-A3. The author reports that it outperforms GLM-5.2 and Qwen-350B on autoresearch (Karpathy's benchmark) and bullshit-bench. The model, GGUF files, and a demo agent are available on Hugging Face.

Qwen Reinforcement learning AI Agents

Reddit r/LocalLLaMA·Jun 17

Local models went from mostly useless to actually useful really fast. What changed?

Local models shifted from marginal tools to viable solutions in one year. Gemma, Qwen, GLM, Kimi now replace some API calls for coding, private documents, and local workflows, though gaps remain on complex tasks requiring planning and error correction.

Llama Open source Qwen

Reddit r/LocalLLaMA·Jun 17

A Year Building a Fully Local Home Voice Assistant · Fulloch

Developer shares 12-month journey building a fully local home voice assistant using open-source models as Alexa alternative. Documents what worked and what didn't throughout the project.

Open source Voice AI Agents

Reddit r/LocalLLaMA·Jun 17

It looks like Rio 3.5 397B could've simply been a semi-failed embezzling of funding

Rio 3.5 397B, funded with ~$100K USD, appears to be a simple merge of existing models (Nex N2 Pro) with no additional training, contrary to initial claims of improvements over Qwen 3.5 397B. After the discovery, the team changed its documentation and claimed the trained model had been lost, which raised suspicions of embezzlement.

Open source Qwen

Reddit r/LocalLLaMA·Jun 17

Elias in the Lighthouse, Again? Diagnosing Low Diversity in LLM Stories

Analysis of low narrative diversity in LLM-generated stories. The author examines why models produce repetitive tales with similar characters and structures despite varied prompts.

Llama Prompt engineering Evals

Reddit r/LocalLLaMA·Jun 17

Benchmarks from the latest eBay special: W6800 (modded V620)

Benchmarks of modded AMD Radeon Pro W6800 (V620 with W6800 firmware) tested with Qwen 3.6 27B Q6_K on llama.cpp. Vulkan performance: 297.94 t/s (pp1024), 20.35 t/s (tg256). Firmware enables mini-displayport but disables some compute cores.

Benchmarks Open source Infrastructure

Reddit r/LocalLLaMA·Jun 16

VibeThinker-3B: what is this witchcraft? Killing it at MathQA like it has ~30B parameters

VibeThinker-3B, a 3B model, achieves exceptional MathQA results comparable to ~30B models. Reddit users report abnormally high performance for its size.

Benchmarks Open source

Reddit r/LocalLLaMA·Jun 16

I didn't know it was possible to compile llamacpp to run cuda + vulkan at the same time..

A user compiled llama.cpp with the CUDA and Vulkan backends enabled simultaneously, in a dual-GPU setup that includes a W7800. The W7800 is an AMD card, so it runs through Vulkan rather than CUDA. Decoding with MiniMax-M3-UD-IQ2_M improved by about 10% in tokens/s.

Open source Infrastructure

Reddit r/LocalLLaMA·Jun 16

GLM-5.2 is now 1st on Design Arena — ahead of the now unavailable Claude Fable 5.

GLM-5.2 reaches 1st place on Design Arena benchmark, surpassing the now-unavailable Claude Fable 5. Zhipu AI's model leads the design evaluation leaderboard.

Benchmarks

Reddit r/LocalLLaMA·Jun 16

Minimax M3 (4 bit MLX) Initial Benchmark on Mac Studio M3u 512gb

Minimax M3 4-bit MLX benchmark on a Mac Studio M3 Ultra with 512 GB. Results: TTFT 3.1 s (pp1024/tg128), throughput 147.7 tok/s, peak memory 226.6 GB. Continuous batching gives a 1.83x speedup at 4 parallel requests (49.9 tok/s).

Benchmarks Open source Infrastructure

Reddit r/LocalLLaMA·Jun 16

GLM-5.2 just dropped open weights and it already looks weirdly strong for coding

GLM-5.2 is released with open weights under the MIT license. It has a 1M-token context window, two reasoning-effort modes, and strong coding-arena performance. Unlike API-only alternatives, its weights are openly available.

Open source Code generation

Reddit r/LocalLLaMA·Jun 16

GLM 5.2 API is live, weights are on HF, and ollama has it already

The GLM-5.2 API is live at $1.4/M input tokens and $4.4/M output tokens. Weights are released under the MIT license on Hugging Face, and Ollama support is available. Benchmarks: 81.0 Terminal-Bench 2.1, 62.1 SWE-bench Pro, 74.4 FrontierSWE. 1M context window, two thinking modes (High/Max).
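At the listed prices, the cost of a request is simple arithmetic. The 200k-in / 20k-out job below is a made-up example, not a figure from the post.

```python
# Prices quoted in the post, in US dollars per million tokens.
INPUT_PRICE_PER_M = 1.4
OUTPUT_PRICE_PER_M = 4.4


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed GLM-5.2 API prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


if __name__ == "__main__":
    # Hypothetical long-context job: 200k tokens in, 20k tokens out.
    print(f"${request_cost(200_000, 20_000):.3f}")
```

For long-context workloads the input side usually dominates. Here $0.28 of the $0.368 total comes from the prompt.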

Open source Code generation Benchmarks

Reddit r/LocalLLaMA·Jun 16

Get in here: Community model build thread

A Reddit thread proposes building a community model through distributed compute using a Mixture-of-Experts (MoE) approach. The 'Branch-Train-Stitch' strategy distributes a dense prototype model to participants who train it independently on their hardware, then merge the submodels into an MoE. Key decisions include prototype size (2B or 7B) based on available VRAM.
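The summary does not say how the submodels are stitched together. One common recipe, similar in spirit to Branch-Train-MiX, treats each independently trained copy as an expert and adds a learned router that sends each token to its top-k experts. The toy below uses scalar functions as experts and a hypothetical linear router. It shows only the routing and mixing step, not training.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[float], float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: float, experts: List[Expert], router_weights: List[float],
                top_k: int = 2) -> float:
    """Route x to the top_k experts and mix their outputs by gate weight.

    The router here is a hypothetical linear gate: logit_i = router_weights[i] * x.
    """
    logits = [w * x for w in router_weights]
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in chosen])
    return sum(g * experts[i](x) for g, i in zip(gates, chosen))


if __name__ == "__main__":
    # Each "expert" stands in for a prototype copy trained by a different participant.
    experts = [lambda x: x + 1, lambda x: 10 * x, lambda x: -x]
    print(moe_forward(3.0, experts, router_weights=[0.1, 1.0, -0.5], top_k=1))
    print(moe_forward(3.0, experts, router_weights=[0.1, 1.0, -0.5], top_k=2))
```

With top_k=1 the router picks a single expert per input. A larger top_k blends several experts at the cost of more compute per token.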

Open source Fine-tuning

Reddit r/LocalLLaMA·Jun 16

GLM-5.2 is the first open-weights model to cross 80% on Terminal-Bench and beats every other open model available

GLM-5.2 becomes the first open-weights model to exceed 80% on Terminal-Bench, outperforming all other open models and Gemini. Frontier-level performance at reduced cost.

Benchmarks Open source

Reddit r/LocalLLaMA·Jun 16

GLM-5.2 Takes #2 Spot on WebDev Arena

GLM-5.2 reaches #2 on the WebDev Arena leaderboard. Zhipu AI's model ranks highly against major competitors.

Benchmarks

Reddit r/LocalLLaMA·Jun 16

GLM-5.2 is available on HuggingChat

GLM-5.2, Zhipu AI's model, is now available on HuggingChat. No technical details provided in the announcement.

Open source

Reddit r/LocalLLaMA·Jun 16

A benchmark for tiny LLMs based on a real world problem: natural language file search (using monkeSearch)

Benchmark for small LLMs (<3B parameters) evaluating natural language parsing into structured JSON for file search. 9 models tested (Gemma-3 270M to DeepSeek R1 Distill 1.5B) on 80 queries covering file types, temporal context, and specificity. Results: 0.8B–1.5B models significantly outperform sub-0.5B.
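A natural way to score this kind of task is to parse each model output as JSON and compare its fields with a gold answer. The sketch below is a hypothetical scorer, not monkeSearch's evaluation code. The field names `file_types`, `time_range`, and `keywords` are assumptions standing in for the benchmark's actual schema (file types, temporal context, specificity).

```python
import json
from typing import Any, Dict

# Hypothetical schema fields; the benchmark's real schema is not given in the post.
FIELDS = ("file_types", "time_range", "keywords")


def _normalize(value: Any) -> Any:
    """Treat lists as unordered so ["pdf", "docx"] matches ["docx", "pdf"]."""
    if isinstance(value, list):
        return sorted(json.dumps(v, sort_keys=True) for v in value)
    return value


def score_prediction(raw_output: str, gold: Dict[str, Any]) -> float:
    """Fraction of schema fields predicted exactly; 0.0 for invalid JSON."""
    try:
        pred = json.loads(raw_output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(pred, dict):
        return 0.0
    hits = sum(_normalize(pred.get(f)) == _normalize(gold.get(f)) for f in FIELDS)
    return hits / len(FIELDS)


if __name__ == "__main__":
    gold = {"file_types": ["pdf"], "time_range": "last_week", "keywords": ["invoice"]}
    print(score_prediction('{"file_types": ["pdf"], "time_range": "last_week", "keywords": ["invoice"]}', gold))
    print(score_prediction('{"file_types": ["pdf"], "time_range": null, "keywords": ["invoice"]}', gold))
    print(score_prediction("Sure! Here is the JSON:", gold))
```

Scoring invalid JSON as zero matters for tiny models, which often wrap or truncate their output instead of emitting a clean object.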

Benchmarks Open source Code generation

Reddit r/LocalLLaMA·Jun 16

Mistral - New family of open-weight models @ July

Mistral announces a new family of open-weight models coming in July. A tweet from CEO Arthur Mensch confirms the release; the excerpt gives no further technical details.

Mistral Open source

Reddit r/LocalLLaMA·Jun 16

Glimmer 1 - Glint Research. A foundational 10,000 parameter language model

Glint Research introduces Glimmer 1, a foundational 10k parameter language model trained on 500K tokens of FineWeb-Edu. Standard Llama architecture with 16 hidden dims, 2 layers, 4 attention heads, 512 token context window. Benchmarks: arc_easy 25.46%, wikitext-2 byte perplexity 14.73.
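The post lists hidden size, depth, and head count but not vocabulary size or MLP width, so a parameter count can only be sketched under assumptions. The function below counts weights for a standard Llama-style stack: bias-free attention projections, a SwiGLU MLP with gate/up/down matrices, and RMSNorm. The vocabulary of 256, MLP width of 43 (about 8/3 of 16), 4 KV heads, and tied input/output embeddings are hypothetical choices. Under them the total lands near 10k.

```python
def llama_param_count(vocab: int, d_model: int, n_layers: int, n_heads: int,
                      n_kv_heads: int, d_ff: int, tied_embeddings: bool = True) -> int:
    """Count weights in a bias-free Llama-style decoder."""
    head_dim = d_model // n_heads
    kv_dim = n_kv_heads * head_dim
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim  # q/o plus k/v projections
    mlp = 3 * d_model * d_ff                            # SwiGLU gate, up, down
    norms = 2 * d_model                                 # RMSNorm before attention and MLP
    embeddings = vocab * d_model
    lm_head = 0 if tied_embeddings else vocab * d_model
    final_norm = d_model
    return embeddings + n_layers * (attn + mlp + norms) + final_norm + lm_head


if __name__ == "__main__":
    # d_model, n_layers and n_heads come from the post; vocab, d_ff, n_kv_heads
    # and tied embeddings are hypothetical.
    total = llama_param_count(vocab=256, d_model=16, n_layers=2, n_heads=4,
                              n_kv_heads=4, d_ff=43)
    print(total)
```

With these choices the count is 10,352, and the embedding table alone accounts for 4,096 of them. At this scale the vocabulary size is a major lever on the total.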

Llama Open source Benchmarks

Reddit r/LocalLLaMA·Jun 16

zai-org/GLM-5.2 is here!

GLM-5.2 is now available. The zai-org model improves reasoning and comprehension capabilities compared to previous versions.

Open source

Reddit r/LocalLLaMA·Jun 16

bartowski/command-a-plus-05-2026-GGUF · Hugging Face

GGUF version of Command-A-Plus-05-2026 model released on Hugging Face. Author invites users to test with latest llama.cpp and share token/second benchmarks and feedback.

Open source Tools Benchmarks

Reddit r/LocalLLaMA·Jun 16

[Article] The Case For Open-Weight Models And Why We Can't Trust Frontier Labs | provos.org

Article arguing for open-weight models against frontier labs. Criticizes power concentration among few companies and advocates for accessibility and transparency of AI model weights.

Open source Llama Alignment

Reddit r/LocalLLaMA·Jun 16

Anthropic going back on `claude -p` 3rd party usage

Anthropic reverses its ban on third-party wrappers that use `claude -p` (Claude Code's non-interactive print mode). Commenters suspect a PR move rather than a lasting policy shift, noting that the earlier OpenClaw and Hermes bans are a separate matter.

Claude Open source

Reddit r/LocalLLaMA·Jun 16

Scaling former VibeThinker-1.5B to 3B — now it reaches frontier math & coding performance

VibeThinker-3B achieves 94.3 on AIME'26, 80.2 on LiveCodeBench v6, and 96.1% pass rate on unseen LeetCode contests. The model demonstrates small models can reach frontier-level reasoning performance in math and coding through clear verification signals.

Reasoning Benchmarks Code generation

Reddit r/LocalLLaMA·Jun 16

Qwen Robot Suite

Alibaba announces Qwen Robot Suite, a robotics software suite based on Qwen models. Technical details and capabilities not specified in excerpt.

Qwen Robotics

Reddit r/LocalLLaMA·Jun 16

Why might DiffusionGemma be better at tool calls than its benchmark quality suggests

DiffusionGemma generates 256 tokens in parallel with bidirectional attention, which lets it correct itself before finalizing the output. Autoregressive models cannot revise a token once it is emitted. This architecture could therefore produce better-structured tool calls despite lower base quality than Gemma 4. Testing is needed to confirm whether bidirectional correction compensates for the lower quality.

Gemini Code generation Reasoning

Reddit r/LocalLLaMA·Jun 16

Qwen3.6 27B quants

A user benchmarks an extreme quantization of Qwen3.6 27B (IQ3 XXS with turbo4) against Q8 on a code-review task. IQ3 XXS took 5 min (1230 t/s prompt processing, 50 t/s generation) and produced recommendations comparable to Q8, which took 1 h 56 min (306 t/s pp, 3 t/s tg). Conclusion: with good prompting, aggressive quantization is adequate for coding tasks.
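A rough size calculation puts the two runs in context. The sketch assumes a nominal 27B parameters and that every tensor uses the named type. Real GGUF files keep some tensors at higher precision, so actual files are somewhat larger. It also ignores the turbo4 setting, which the post does not describe.

```python
# Approximate bits per weight of two llama.cpp quantization types:
# IQ3_XXS stores 256 weights in 98 bytes, Q8_0 stores 32 weights in 34 bytes.
BITS_PER_WEIGHT = {
    "IQ3_XXS": 98 * 8 / 256,  # 3.0625
    "Q8_0": 34 * 8 / 32,      # 8.5
}


def weights_gb(n_params: float, quant: str) -> float:
    """Weight storage in GB if every tensor used `quant` (real GGUFs mix types)."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


if __name__ == "__main__":
    n_params = 27e9  # nominal size from the model name
    for quant in BITS_PER_WEIGHT:
        print(f"{quant}: ~{weights_gb(n_params, quant):.1f} GB")
    # Wall-clock times from the post: 1 h 56 min (Q8) vs 5 min (IQ3_XXS).
    print(f"wall-clock ratio: {(60 + 56) / 5:.1f}x")
```

Q8_0 needs roughly 2.8x the memory of IQ3_XXS (about 28.7 GB vs 10.3 GB). That may be one reason the Q8 run was about 23x slower, for example if it did not fit entirely in VRAM, but the post does not say.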

Qwen Code generation Fine-tuning

Reddit r/LocalLLaMA·Jun 16

Gemma 12b - Reasoning hardening instructions

A user shares a system instruction to improve reasoning in Gemma 12b QAT. The technique aims to reduce cognitive bias and adapt reasoning depth to context. It works well on trick questions but partially fails on certain problems depending on framing.

Gemini Prompt engineering Reasoning

Reddit r/LocalLLaMA·Jun 16

Be wary of Qwen/Claude distillations - they're often worse than the base model

Qwen/Claude distillations circulating on r/LocalLLaMA (e.g., Qwopus, Fable 5 on Qwen 3.6) are trained on only 4k-10k samples, too few to improve performance. The official DeepSeek-R1 distillations used ~800k samples. Despite their different reasoning style, these models don't beat base Qwen and slightly degrade quality.

Qwen Claude Fine-tuning

Reddit r/LocalLLaMA·Jun 16

Donate your coding sessions to an open CC-BY-4.0 dataset to help train open-weight and open source models

Trace Commons initiative: collecting coding session traces under CC-BY-4.0 license to train open-source and open-weight models. Goal: counterbalance Anthropic and OpenAI's competitive advantage from proprietary data accumulated via Claude Code and Codex.

Open source Code generation AI Agents

Reddit r/LocalLLaMA — AI feed · Signal IA
