Hugging Face Blog

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

Signal: 85 | Hype: 25
In three lines: Meta releases Llama 3.1 in three sizes (405B, 70B, 8B) with multilingual support and extended context. Models support 128k tokens and cover 8 languages. Available open-source via Hugging Face.

## Llama 3.1: What 405B Parameters in Open-Source Actually Means

### 1. The Quantitative Leap

Meta is simultaneously releasing three models (8B, 70B, and 405B parameters) under an open-source license, all with a 128,000-token context window. That's the structural break: before this release, openly available dense models at this scale did not come with a context window this long. The 405B is the first frontier-class model accessible without a closed API or restrictive commercial agreement.

Multilingual support covers 8 languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. This isn't cosmetic — Meta's internal benchmarks show competitive performance on MGSM (multilingual math reasoning) against GPT-4o and Claude 3.5 Sonnet across these target languages.

### 2. Why 128k Tokens Changes Real-World Usage

The 128k context window (versus 8k in the original Llama 3) unlocks use cases that were structurally impossible on prior versions: full codebase ingestion, long-contract analysis, RAG without aggressive chunking, multi-step agents with extended history. For the 70B and 8B, this extended context is especially significant because these sizes are deployable on standard infrastructure: the fp16 weights of a 70B (about 140 GB) fit on 2× A100 80GB, and an 8B fits on a single consumer GPU with 4-bit quantization. Long prompts also need memory for the KV cache on top of the weights, so filling the full window takes more headroom than these minimums suggest.
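To see what the 128k window costs in memory, here is a rough sketch of KV-cache sizing. The layer counts, key/value head counts, and head dimension are assumptions taken from the published model configurations rather than from this summary, and the 2-byte cache precision is also an assumption; real serving stacks add their own overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttnShape:
    layers: int    # number of transformer layers
    kv_heads: int  # key/value heads (grouped-query attention)
    head_dim: int  # dimension of each attention head


# Assumed shapes (from the published configs); verify before relying on them.
SHAPES = {
    "8B": AttnShape(layers=32, kv_heads=8, head_dim=128),
    "70B": AttnShape(layers=80, kv_heads=8, head_dim=128),
}


def kv_cache_bytes(shape: AttnShape, tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence of `tokens` tokens."""
    # Two tensors (K and V) per layer, each of size tokens x kv_heads x head_dim.
    per_token = 2 * shape.layers * shape.kv_heads * shape.head_dim * bytes_per_value
    return per_token * tokens


def to_gib(n_bytes: int) -> float:
    return n_bytes / 2**30


# "128k" taken here as 128 * 1024 tokens; the text rounds this to 128,000.
CONTEXT_TOKENS = 128 * 1024

for name, shape in SHAPES.items():
    size = to_gib(kv_cache_bytes(shape, CONTEXT_TOKENS))
    print(f"{name}: {size:.1f} GiB of KV cache for one full-length sequence")
```

Under these assumptions, a single full-length sequence costs about 16 GiB of cache on the 8B and about 40 GiB on the 70B, in addition to the weights.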

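The 405B requires heavier infrastructure: at 2 bytes per parameter, the BF16 weights alone take about 810 GB, more than the 640 GB available on a single node of 8× A100 or H100 80GB, so 16-bit inference has to be spread across multiple nodes. Meta has released the weights in BF16 and in an FP8-quantized version (about 405 GB), which brings deployment within reach of a single 8× H100 node. Hugging Face distributes the weights directly, with native integration into transformers and TGI (Text Generation Inference).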
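Hardware sizing of this kind comes down to parameter count times bytes per parameter. A minimal sketch, assuming nominal parameter counts (8, 70, and 405 billion), decimal gigabytes, and 80 GB GPUs; it counts weights only, so the GPU numbers are lower bounds rather than deployment recommendations:

```python
import math

# Nominal parameter counts (assumption: rounded marketing sizes, not exact counts).
PARAMS = {"8B": 8e9, "70B": 70e9, "405B": 405e9}

# Storage cost per parameter for each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb(params: float, precision: str) -> float:
    """Size of the weights in decimal gigabytes."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def min_gpus(params: float, precision: str, gpu_gb: float = 80.0) -> int:
    """Fewest GPUs whose combined memory holds the weights (no KV cache, no overhead)."""
    return math.ceil(weight_gb(params, precision) / gpu_gb)


for name, params in PARAMS.items():
    for precision in BYTES_PER_PARAM:
        print(
            f"{name:>4} {precision:>4}: {weight_gb(params, precision):6.1f} GB, "
            f"at least {min_gpus(params, precision)} x 80 GB GPU(s)"
        )
```

For the 405B this gives 810 GB in bf16 (at least 11 GPUs of 80 GB) and 405 GB in fp8 (at least 6), which is why the FP8 weights matter for single-node serving.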

### 3. Benchmark Positioning and What the Numbers Hide

On MMLU, the 405B reaches scores comparable to GPT-4 (version 0613), and the 70B outperforms Llama 2 70B by ~15 absolute points. On HumanEval (code), the 405B scores around 89%, placing it at GPT-4o level according to Meta's published evaluations. Two caveats apply:

First, Meta is evaluating its own models — independent third-party benchmarks (notably LMSYS Chatbot Arena) will provide external validation. Second, benchmarks are saturating: MMLU at 88%+ no longer finely discriminates real capabilities on complex production tasks. Practitioners will need to run their own evaluations on domain-specific workloads.
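A domain-specific evaluation does not need much machinery to get started. The sketch below scores any prompt-to-text callable by exact match; `toy_model` and the three examples are hypothetical placeholders standing in for a real model client and a real labelled set.

```python
from typing import Callable, Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count."""
    return " ".join(text.lower().split())


def exact_match_accuracy(
    model: Callable[[str], str],
    examples: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of (prompt, expected) pairs where the model's answer matches exactly."""
    pairs = list(examples)
    if not pairs:
        raise ValueError("need at least one example")
    correct = sum(normalize(model(prompt)) == normalize(expected) for prompt, expected in pairs)
    return correct / len(pairs)


# Hypothetical stand-in for a real model call (e.g. a local server or hosted endpoint).
def toy_model(prompt: str) -> str:
    canned = {
        "Currency of the invoice?": "EUR",
        "Is the contract signed?": "yes",
    }
    return canned.get(prompt, "unknown")


# Hypothetical labelled examples from the target domain.
EXAMPLES = [
    ("Currency of the invoice?", "eur"),
    ("Is the contract signed?", "Yes"),
    ("Payment term in days?", "30"),
]

print(f"exact-match accuracy: {exact_match_accuracy(toy_model, EXAMPLES):.2f}")
```

Exact match suits extraction and classification; open-ended generation usually needs a task-specific scorer in place of `normalize` plus equality.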

The 8B is the most economically interesting model: it outperforms Llama 2 70B on most benchmarks with roughly 9× fewer parameters, which to a first approximation makes inference about 9× cheaper. For high-volume applications (classification, extraction, short-form generation), it's the rational entry point.
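The roughly 9× figure follows from the parameter counts. A back-of-the-envelope sketch, assuming the common rule of thumb that a dense transformer spends about 2 FLOPs per parameter per generated token (an approximation that ignores attention over long contexts, batching, and memory-bandwidth effects):

```python
def flops_per_token(params: float) -> float:
    """Approximate forward-pass FLOPs per token for a dense transformer (rule of thumb)."""
    return 2.0 * params


LLAMA2_70B_PARAMS = 70e9  # nominal size
LLAMA31_8B_PARAMS = 8e9   # nominal size

compute_ratio = flops_per_token(LLAMA2_70B_PARAMS) / flops_per_token(LLAMA31_8B_PARAMS)
print(f"70B vs 8B compute per token: {compute_ratio:.2f}x")
```

The ratio comes out at 8.75, which the text rounds to 9×; actual cost differences depend on hardware, quantization, and serving configuration.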

### 4. Losers and Structural Tensions

**Mistral AI** sees its 'best open-source' positioning directly challenged. Mixtral 8×22B, previously the open-source reference for complex tasks, now faces pressure on the performance-to-cost ratio. **Cohere** and **AI21 Labs**, which monetize mid-size models for enterprise, face free competition in their core segment.

For API providers (Together AI, Fireworks, Replicate), the immediate availability of weights creates margin pressure: customers can self-host rather than pay per call. This is structurally different from GPT-4 or Claude, which remain behind closed APIs.
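The self-host versus pay-per-call trade-off can be framed as a break-even volume. In this sketch every price is a hypothetical placeholder, not a quote from any provider, and the model ignores throughput limits, utilization, and engineering cost:

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_self_host_cost(gpu_hourly_usd: float, gpus: int) -> float:
    """Cost of keeping `gpus` GPUs running for a full month."""
    return gpu_hourly_usd * gpus * HOURS_PER_MONTH


def break_even_tokens(gpu_hourly_usd: float, gpus: int, api_usd_per_million_tokens: float) -> float:
    """Monthly token volume above which self-hosting costs less than the API."""
    if api_usd_per_million_tokens <= 0:
        raise ValueError("API price must be positive")
    monthly_cost = monthly_self_host_cost(gpu_hourly_usd, gpus)
    return monthly_cost / api_usd_per_million_tokens * 1_000_000


# Placeholder inputs for illustration only.
tokens = break_even_tokens(gpu_hourly_usd=2.0, gpus=8, api_usd_per_million_tokens=1.0)
print(f"break-even volume: {tokens / 1e9:.2f} billion tokens per month")
```

Below the break-even volume the per-call API is cheaper; above it, self-hosting wins only if the cluster can actually serve that many tokens.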

The Llama 3.1 license permits commercial use and — a new addition — allows using model outputs to train other models, including competing ones. This is a change from Llama 2, which explicitly prohibited this. It means players like Mistral or startups can legally distill Llama 3.1 405B to create smaller, more efficient models.

The real question six months out: can the open-source community maintain the pace of fine-tuning, instruction-tuning, and alignment that Meta has industrialized? The base weights are available, but the engineering gap between a base model and a production-ready assistant remains substantial.


Summary generated by Claude — human-verified