OpenAI Blog·18 July 2024

GPT-4o mini: advancing cost-efficient intelligence

In three lines: OpenAI releases GPT-4o mini, a smaller and cheaper model than GPT-4o. It delivers comparable performance on many tasks while reducing inference costs. The model supports text and vision, with audio support to follow.

## GPT-4o mini: what the launch actually means for practitioners

### 1. Pricing mechanics, hard numbers

GPT-4o mini is priced at **$0.15 per million input tokens and $0.60 per million output tokens**, roughly **17× cheaper than GPT-4o** at the $2.50 / $10 per million input/output tokens quoted here. It also undercuts GPT-3.5 Turbo on price while outperforming it on standard benchmarks. The strategic signal is clear: OpenAI is no longer positioning GPT-3.5 Turbo as the cost-efficient default, and there is little reason to choose it for a new production architecture.
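To make the pricing arithmetic concrete, here is a minimal sketch that computes per-request cost from the list prices quoted above. The `PRICES` table, the function names, and the example workload are illustrative assumptions, not part of any OpenAI SDK, and list prices change over time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per one million tokens (figures taken from the text above)."""
    input_per_million: float
    output_per_million: float


# Hypothetical price table; check the current price list before relying on it.
PRICES = {
    "gpt-4o-mini": Price(input_per_million=0.15, output_per_million=0.60),
    "gpt-4o": Price(input_per_million=2.50, output_per_million=10.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request with the given token counts."""
    price = PRICES[model]
    return (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000


def cost_ratio(expensive: str, cheap: str, input_tokens: int, output_tokens: int) -> float:
    """Return how many times more the expensive model costs for the same request."""
    return request_cost(expensive, input_tokens, output_tokens) / request_cost(
        cheap, input_tokens, output_tokens
    )
```

Because both the input and the output price differ by the same factor at these figures, `cost_ratio("gpt-4o", "gpt-4o-mini", ...)` comes out at about 16.7 regardless of the input/output mix.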

### 2. Benchmark performance: what the numbers actually show

On **MMLU** (broad academic reasoning), GPT-4o mini scores **82%** versus 70% for GPT-3.5 Turbo. On coding benchmarks (**HumanEval**), it also surpasses GPT-3.5 Turbo. It remains below GPT-4o on complex multi-step reasoning and long-context coherence tasks — but for the majority of production use cases (classification, extraction, short-form generation, RAG over short chunks), the performance gap no longer justifies the cost premium of GPT-4o.

The context window is **128,000 tokens** with a **16,000-token output limit**, matching GPT-4o. Native multimodal support (text and vision, audio forthcoming) puts it categorically above GPT-3.5 Turbo, which was text-only.
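As a rough illustration of how these two limits interact, the sketch below checks a request against the figures quoted above. It assumes that prompt and completion tokens share the same context window and that token counts are already known (for example, from a tokenizer); the constants and function names are hypothetical.

```python
# Limits as quoted in the text above; treat them as assumptions, not API constants.
CONTEXT_WINDOW_TOKENS = 128_000
MAX_OUTPUT_TOKENS = 16_000


def max_output_for(prompt_tokens: int) -> int:
    """Return the largest completion budget available for a prompt of this size."""
    remaining = CONTEXT_WINDOW_TOKENS - prompt_tokens
    return max(0, min(MAX_OUTPUT_TOKENS, remaining))


def fits(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if the prompt plus the requested completion stays within both limits."""
    return 0 <= requested_output_tokens <= max_output_for(prompt_tokens)
```

In a RAG pipeline, a check like `fits(prompt_tokens, 1_000)` could run before dispatch to decide whether retrieved chunks need trimming.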

### 3. Who loses in this configuration

**Anthropic Claude Haiku** is the most directly exposed competitor. Haiku was the practitioner benchmark for "small, capable, cheap" until now. GPT-4o mini arrives with comparable or lower pricing at scale, native integration into the OpenAI ecosystem (Assistants API, announced fine-tuning, function calling, structured outputs), and the familiarity advantage for teams already running on GPT-3.5 Turbo.

**Mistral** (Mistral 7B, Mistral Small via API) and **Google Gemini 1.5 Flash** are also under pressure. Gemini Flash was Google's economic multimodal option — GPT-4o mini now occupies the same slot with the added weight of OpenAI's established enterprise trust and ecosystem depth.

**Third-party inference providers** (Together AI, Fireworks, Anyscale) monetizing access to equivalent open-source models (Llama 3 8B, Mistral 7B) face a shrinking value proposition: OpenAI's API convenience at now-competitive pricing reduces the incentive to manage inference infrastructure independently or route through intermediaries.

### 4. Practical implications for application architectures

For teams running GPT-3.5 Turbo in production: migration to GPT-4o mini is close to mechanical. Lower price, better benchmark performance, multimodal included. The only real friction is validating outputs on domain-specific edge cases.
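One way to approach that validation step is a small side-by-side harness run over a set of known edge-case prompts. The sketch below is an assumption about how such a check might look: the models are passed in as plain callables (in practice, thin wrappers around whatever client the team already uses), and `find_disagreements` and `normalized_equal` are hypothetical names.

```python
from typing import Callable, Iterable, List, Tuple

# A "model" here is any function mapping a prompt to a text response.
Model = Callable[[str], str]


def normalized_equal(a: str, b: str) -> bool:
    """Default agreement check: equal after trimming whitespace and ignoring case."""
    return a.strip().casefold() == b.strip().casefold()


def find_disagreements(
    prompts: Iterable[str],
    old_model: Model,
    new_model: Model,
    agree: Callable[[str, str], bool] = normalized_equal,
) -> List[Tuple[str, str, str]]:
    """Return (prompt, old_output, new_output) for every prompt where outputs differ."""
    disagreements = []
    for prompt in prompts:
        old_output = old_model(prompt)
        new_output = new_model(prompt)
        if not agree(old_output, new_output):
            disagreements.append((prompt, old_output, new_output))
    return disagreements
```

Exact-match comparison suits classification and extraction tasks. For free-form generation, the `agree` callback would need a task-specific check, such as schema validation or a rubric score.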

For **multi-model routing architectures** (an increasingly common pattern: route simple queries to a lightweight model and complex ones to GPT-4o or Claude Opus), GPT-4o mini becomes the natural low-tier candidate. When the high tier is also an OpenAI model, everything stays within a single vendor, which simplifies key management, quota tracking, and billing.
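The routing pattern described above can be sketched with a deliberately crude heuristic. Everything here is an illustrative assumption: the keyword list, the word-count threshold, and the function name are hypothetical, and production routers often use a trained classifier or the cheap model's own confidence rather than string matching.

```python
LOW_TIER_MODEL = "gpt-4o-mini"
HIGH_TIER_MODEL = "gpt-4o"

# Hypothetical phrases suggesting multi-step reasoning; tune for the actual workload.
COMPLEX_HINTS = ("step by step", "prove", "refactor", "analyze")


def pick_model(query: str, max_simple_words: int = 200) -> str:
    """Return the model name a query should be routed to.

    Long queries, or queries containing a complexity hint, go to the high tier;
    everything else goes to the low tier.
    """
    text = query.lower()
    if len(text.split()) > max_simple_words:
        return HIGH_TIER_MODEL
    if any(hint in text for hint in COMPLEX_HINTS):
        return HIGH_TIER_MODEL
    return LOW_TIER_MODEL
```

Substring matching is intentionally simple and will misfire (for instance, "improve" contains "prove"), so a router like this would need evaluation against real traffic before use.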

**Announced fine-tuning** on GPT-4o mini is a meaningful lever. Starting from a base model priced at $0.15/M input tokens enables specialized production models at very low inference cost, though fine-tuned models may be billed at a different rate than the base model. GPT-3.5 Turbo fine-tuning already covered this use case, but with a weaker base model.

Critical caveat: GPT-4o mini is **not open-weight**. Teams that require full control (on-premise deployment, air-gapped environments, data sovereignty) remain on Llama 3 8B or Mistral 7B. OpenAI is not competing in that segment and shows no intent to do so.

Summary generated by Claude — human-verified
