Simon Willison

llm-anthropic 0.25.1

Signal: 85 · Hype: 15
In three lines: llm-anthropic 0.25.1 adds the Claude Opus 4.8 model, an `-o fast 1` option for fast mode (only for organizations that have it enabled), and a default `max_tokens` that now matches each model's maximum output instead of 8192.

## llm-anthropic 0.25.1: three changes, one that actually matters

### What changes in practice

Simon Willison has released llm-anthropic 0.25.1, the unofficial plugin connecting his `llm` CLI tool to the Anthropic API. The changelog has three items of very unequal significance.

**1. Claude Opus 4.8** (`claude-opus-4.8`) joins the available model list. Willison used this exact model version to generate the examples in his release notes, a signal that the model works in the plugin on day one. Opus 4.8 is positioned as an incremental update to the Opus 4 line, with no announced architectural break. Anthropic's published benchmarks place it above Opus 4 on extended reasoning tasks, but below Claude Sonnet 4.5 on cost/performance ratio for common workloads.

**2. `-o fast 1` option** for fast mode. This feature is restricted to organizations that have the option enabled on their Anthropic account; it is not available by default. Fast mode reduces latency at the potential cost of lower quality on complex tasks. For automated CLI pipelines where speed matters more than depth, it's useful. For individual users without organizational access, the flag is effectively unavailable.
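For scripted pipelines, the flag can be passed like any other `llm` option. The following is a minimal sketch, assuming the `llm` CLI is installed with this plugin and that your organization has fast mode enabled. `build_llm_command` is a hypothetical helper, not part of `llm`; it only assembles the argument list and does not call the API.

```python
import shlex
from typing import List


def build_llm_command(prompt: str, model: str = "claude-opus-4.8",
                      fast: bool = False) -> List[str]:
    """Assemble an argv list for the `llm` CLI (hypothetical helper)."""
    argv = ["llm", "-m", model]
    if fast:
        # `-o fast 1` is the option named in the 0.25.1 release notes.
        argv += ["-o", "fast", "1"]
    argv.append(prompt)
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is sent to the API.
    print(shlex.join(build_llm_command("Summarize this changelog", fast=True)))
```

The resulting list can be handed to `subprocess.run(argv, capture_output=True, text=True)` once you have confirmed the flag is accepted for your account.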

**3. Default max_tokens aligned to each model's maximum** — this is the most structurally significant change, and it deserves attention.

### The real change: end of the arbitrary 8,192-token ceiling

Before 0.25.1, every call via llm-anthropic was capped by default at 8,192 output tokens, regardless of the model's actual capability. This was not an Anthropic limitation; it was a hardcoded default in the plugin. Actual ceilings vary by model: Claude 3 Opus caps at 4,096 tokens and Claude 3.5 Sonnet tops out at 8,192, while recent Claude 3.5/4 family models support up to 16,000 or even 32,000 tokens depending on configuration.

In practice: if you asked Claude Opus 4.8 to generate a long report, extended code analysis, or document translation, the plugin silently truncated output at 8,192 tokens even when the model could go further. GitHub issue #72 documents this behavior as a de facto bug.
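One way to catch this kind of truncation is to inspect the stop reason on the raw response. The sketch below assumes you have an Anthropic Messages API response available as a dictionary (for example, from a logged JSON response); the API sets `stop_reason` to `"max_tokens"` when generation stopped at the token limit. `was_truncated` is a hypothetical helper name, and the payloads are illustrative rather than real API output.

```python
from typing import Any, Mapping


def was_truncated(response: Mapping[str, Any]) -> bool:
    """Return True if the response stopped because it hit max_tokens."""
    return response.get("stop_reason") == "max_tokens"


# Illustrative payloads, not real API output.
complete = {"stop_reason": "end_turn", "usage": {"output_tokens": 512}}
cut_off = {"stop_reason": "max_tokens", "usage": {"output_tokens": 8192}}

for name, payload in [("complete", complete), ("cut_off", cut_off)]:
    print(name, "truncated" if was_truncated(payload) else "ok")
```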

With 0.25.1, `max_tokens` is now resolved dynamically per model. Two direct implications:

- Long outputs are no longer silently truncated
- Per-call cost may increase if your prompts naturally generate long responses the model was previously forced to cut short
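Per-model resolution can be pictured as a lookup with a fallback. This is an illustrative sketch, not the plugin's actual code: the table keys, the function name `resolve_max_tokens`, and the fallback behavior are assumptions, and the two limits shown are the figures quoted earlier in this article.

```python
from typing import Optional

# Illustrative keys; the limits are the figures quoted in this article.
MODEL_MAX_OUTPUT = {
    "claude-3-opus": 4096,
    "claude-3.5-sonnet": 8192,
}
LEGACY_DEFAULT = 8192  # the pre-0.25.1 default described above


def resolve_max_tokens(model_id: str, requested: Optional[int] = None) -> int:
    """Pick max_tokens: an explicit request wins, else the model's ceiling."""
    ceiling = MODEL_MAX_OUTPUT.get(model_id, LEGACY_DEFAULT)
    if requested is None:
        return ceiling
    if requested <= 0:
        raise ValueError("max_tokens must be positive")
    # Never ask for more than the model can produce.
    return min(requested, ceiling)
```

Passing an explicit value, such as `resolve_max_tokens("claude-3.5-sonnet", 2048)`, mirrors what a script does when it pins `max_tokens` because downstream steps assume a bounded output.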

For developers using `llm` in content generation scripts, automated documentation, or file analysis pipelines, this is a behavioral change worth explicitly testing before deploying to production.

### Who loses in this update

Users with workflows calibrated to the 8,192-token limit — for example, pipelines that chunked tasks assuming each call would never exceed that size — may see behavior change. A script that called the model 5 times to process a document might now do it in 2 calls, but each call will be more expensive in tokens.
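The arithmetic behind that example can be checked with a quick calculation. The 40,000-token total and the 32,000-token ceiling below are hypothetical inputs chosen for illustration. In this simplified model the total number of output tokens is unchanged; it is the size of each call that grows.

```python
import math


def calls_needed(total_output_tokens: int, max_tokens: int) -> int:
    """Number of calls needed if each call can emit at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return math.ceil(total_output_tokens / max_tokens)


total = 40_000  # hypothetical total output size, in tokens

for limit in (8_192, 32_000):
    n = calls_needed(total, limit)
    print(f"limit={limit}: {n} calls, up to {min(limit, total)} tokens per call")
```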

Organizations without fast mode enabled don't benefit from the second new feature. Anthropic has not published criteria for enabling this functionality, making it a de facto enterprise or partner-tier feature.

### llm ecosystem context

Willison's `llm` has become a reference tool for developers wanting to interact with multiple LLMs through a unified CLI. The `llm-anthropic` plugin is one of the most actively maintained in the ecosystem. This update closely follows Anthropic's Opus 4.8 release, confirming that Willison keeps a fast synchronization cadence with Anthropic releases, typically under 48 hours between model announcement and plugin support.

For practitioners using `llm` in production, upgrading to 0.25.1 is recommended, with a prior audit of any scripts that depend on predictable output length.


Summary generated by Claude — human-verified