Simon Willison·17 June 2026

GLM-5.2 is probably the most powerful text-only open weights LLM

Signal

Hype

In three linesZ.ai released GLM-5.2 (753B parameters, 40 active via MoE) under MIT license on June 16th. Text-only model with 1M token context window. Ranks 1st on Artificial Analysis Intelligence Index v4.1 (score 51) ahead of DeepSeek V4 Pro and Kimi K2.6. 2nd on Code Arena WebDev behind Claude Fable 5.

## GLM-5.2: First Open Weights Model to Surpass Proprietary Benchmarks on Text Tasks

### 1. What Just Changed

Z.ai released GLM-5.2 on June 16 under an MIT license — full weights, unrestricted commercial use. It's a 753B-parameter MoE with 40B active parameters, 1.51TB on disk. Context window jumps from 200K (GLM-5.1) to 1M tokens. On the Artificial Analysis Intelligence Index v4.1, it scores 51, versus 44 for MiniMax-M3, 44 for DeepSeek V4 Pro (max), and 43 for Kimi K2.6. This is the first time an open weights model has topped this index ahead of the comparable closed-model field.

On Code Arena WebDev — a leaderboard measuring front-end development tasks including agentic coding workflows — GLM-5.2 ranks 2nd, behind only Claude Fable 5. This is counterintuitive: the model is text-only, no vision, while the prevailing assumption was that strong front-end coding required image understanding to interpret mockups and screenshots.

### 2. The Numbers That Matter

**Inference cost**: OpenRouter offers it through 9 providers at $1.40/M input tokens and $4.40/M output tokens. For reference: GPT-5.5 runs $5/$30, Claude Opus 4.5-4.8 at $5/$25. The performance-to-cost ratio is structurally favorable — roughly 3.5× cheaper on input than comparable proprietary models.

**Token consumption**: GLM-5.2 generates an average of 43K output tokens per Intelligence Index task, up from 26K for GLM-5.1, versus 24K for MiniMax-M3, 35K for Kimi K2.6, and 37K for DeepSeek V4 Pro (max). This verbosity is the primary operational friction point: at $4.40/M output tokens, a complex task costs ~$0.19 in output alone. On high-volume agentic pipelines, the pricing advantage erodes quickly if the model systematically over-generates.

**1M token context**: The jump from 200K is meaningful for long-document RAG, full codebase analysis, or long transcript ingestion. No other text-only open weights model combines this context length with this benchmark score.

### 3. Who Loses Ground

**DeepSeek** has been the open weights reference since V3/R1, but GLM-5.2 beats it by 7 points on the Intelligence Index (51 vs. 44). DeepSeek V4 Pro was the effective ceiling for high-performance open weights — that ceiling just moved up.

**Kimi K2.6** (Moonshot AI) sits 8 points back (43 vs. 51), which is substantial on a normalized index.

**Proprietary API providers positioned in the mid-market** (GPT-4o-level pricing) see their value proposition weakened: GLM-5.2 self-hosted or via OpenRouter delivers superior performance at lower cost for pure-text workloads.

**Teams that had ruled out vision-free models for front-end work** need to revisit their decision matrix. The Code Arena WebDev ranking empirically invalidates the assumption that vision is required to excel at UI coding.

### 4. What to Watch

The 43K tokens/task verbosity is not trivial. It suggests either an extended chain-of-thought reasoning style baked into the model, or a systematic over-generation tendency that will degrade latency and cost in production. Teams deploying self-hosted will need to tune generation parameters (max_tokens, stop sequences) to prevent over-generation on simple tasks.

The MIT license is unambiguous for commercial use, which contrasts with the "open" licenses with usage restrictions from some competitors (Llama 4's monthly active user thresholds, for instance).

The vision family (GLM-5V-Turbo) remains closed. Z.ai maintains a clear segmentation: open weights on text, proprietary on vision. If GLM-5.2 gains adoption, pressure to open the vision branch will increase — or conversely, Z.ai will use vision as an API monetization lever.

Finally, 1.51TB of weights implies significant GPU infrastructure for self-hosting (minimum 8× H100 80GB in FP8, likely more in BF16). Real accessibility remains conditional on OpenRouter or cloud providers — which qualifies the "open" framing in contexts without dedicated infrastructure.

Read source

Your take?

Open source Benchmarks Code generation Reasoning

Summary generated by Claude — human-verified

GLM-5.2 is probably the most powerful text-only open weights LLM

Other angles on this story