OpenAI Blog·13 May 2024

Spring Update

Signal

Hype

In three linesOpenAI releases GPT-4o and expands free ChatGPT access with additional capabilities. The model improves multimodal performance and processing speed.

## GPT-4o: What the Announcement Actually Means

### 1. Immediate Context

OpenAI is releasing GPT-4o (the "o" stands for *omni*) while simultaneously pushing more capabilities down to the free ChatGPT tier. Before this announcement, the boundary was clear: GPT-4 was gated behind Plus subscriptions ($20/month), with free users limited to GPT-3.5. GPT-4o partially erases that line by becoming available on the free tier, subject to rate limits not yet fully disclosed.

The model is presented as natively multimodal — text, audio, and image processed through a unified pipeline rather than separate modules as was the case with GPT-4V + Whisper + TTS. OpenAI reports average audio latency of approximately 232 ms (versus 2.8 seconds for the previous GPT-4 + Whisper pipeline), placing voice response within the range of natural human conversation.

### 2. What Changes Technically

The "omni" architecture means the model ingests and generates text, audio, and images within the same neural network, without intermediate transcoding. In practice: the model can detect emotion in voice, adapt its tone in real time, read a facial expression via camera, and respond coherently to all three streams simultaneously.

On benchmarks published by OpenAI: - MMLU (text): GPT-4o scores 88.7%, vs. 86.4% for GPT-4 Turbo - HumanEval (code): 90.2% vs. 87.1% for GPT-4 Turbo - Token generation speed: approximately 2× faster than GPT-4 Turbo - API pricing: $5/million input tokens, $15/million output — a 50% reduction vs. GPT-4 Turbo rates ($10/M and $30/M)

These figures position GPT-4o as strictly superior to GPT-4 Turbo across all three classic axes: quality, speed, cost.

### 3. Winners and Losers

**Direct winners:** - API developers see inference costs cut in half overnight, with no code migration required if already integrated on the GPT-4 Turbo endpoint. - Free-tier users access a GPT-4-level model for the first time, making the Plus subscription value proposition harder to justify short-term (OpenAI compensates by reserving higher quotas and features like persistent memory for subscribers). - Voice/real-time use cases (embedded assistants, call centers, accessible interfaces) become economically viable where 2.8 s latency was previously prohibitive.

**Potential losers:** - **Anthropic**: Claude 3 Opus was until now the only credible GPT-4 competitor on reasoning benchmarks. GPT-4o surpasses it on MMLU and HumanEval while costing less (Opus: $15/M input, $75/M output). - **Google DeepMind**: Gemini 1.5 Pro had positioned its long context window (1M tokens) as a key differentiator. GPT-4o doesn't close that gap (128k tokens), but native multimodal superiority and aggressive pricing reduce Gemini's appeal for new projects. - **ElevenLabs, AssemblyAI, and audio specialists**: if GPT-4o natively handles transcription, synthesis, and emotional voice understanding in a single API call, the value proposition of specialized audio layers erodes significantly. - **ChatGPT Plus subscribers**: paying $20/month becomes less obvious if the free tier receives GPT-4o. OpenAI will need to accelerate differentiation (advanced plugins, memory, agent capabilities) to maintain conversion rates.

### 4. What to Watch

The real question is not model quality — benchmarks are clear — but the deployment speed of the real-time voice layer in third-party products. OpenAI announced a Voice API in limited access; until that opens broadly, the 232 ms latency remains a demonstration rather than production infrastructure.

Additionally, the decision to put GPT-4o on the free tier is as much a distribution move as a product decision: with Gemini natively integrated into Android and Google Search, OpenAI must maximize its model's surface area. Lowering the entry barrier is the direct response to Google's captive distribution. The cost of this strategy — partial cannibalization of Plus subscriptions — is evidently deemed acceptable.

Read source

Your take?

GPT OpenAI

Summary generated by Claude — human-verified

Spring Update

Other angles on this story