Claude Opus 4.8: The Production Default Before You Reach for Fable 5
If you ship production software with LLMs, Claude Opus 4.8 is probably the model you should default to before reaching for anything more expensive. Released on May 28, 2026, claude-opus-4-8 is Anthropic's flagship hybrid-reasoning model for complex agentic coding, computer use, and professional knowledge work — and unlike the newer Claude Fable 5 tier, it ships at the same price as Opus 4.7. Here is what actually changed, what it costs, and how to use it without breaking your integration.
What Claude Opus 4.8 is
Claude Opus 4.8 builds on Opus 4.7 with measurable gains across coding, agentic workflows, and long-running tasks. Anthropic positions it as the model to start with for "complex agentic coding and enterprise work" — with Fable 5 reserved for workloads that need the absolute highest capability at double the price.
It is multimodal (text and image input), supports tool use, structured outputs, computer-use tools, prompt caching, and the Batch API. On the Claude API, Amazon Bedrock, Google Cloud, and Microsoft Foundry it ships with a 1-million-token context window by default and up to 128,000 output tokens per request — a meaningful shift from Opus 4.7, where the 1M window required a beta header.
The specs that matter
| Property | Claude Opus 4.8 |
|---|---|
| API model ID | claude-opus-4-8 |
| Context window | 1,000,000 tokens (default) |
| Max output | 128,000 tokens |
| Input price (regular) | $5 / million tokens |
| Output price (regular) | $25 / million tokens |
| Fast mode (research preview) | $10 input / $50 output (~2.5× faster) |
| Thinking | Adaptive, always on |
| Modality | Text + image |
| Knowledge cutoff | Jan 2026 |
Two pricing details easy to miss: long-context input pricing kicks in above 200k tokens, so dumping an entire monorepo into one call needs a cost check. And prompt caching now has a 1,024-token minimum on Opus 4.8 (down from 2,048 on 4.7), so shorter repeated prompts can cache when they could not before — with up to 90% savings on cached input.
The integration change: adaptive thinking only
If you are migrating from Opus 4.6 or earlier, the breaking change is thinking configuration. Opus 4.8 rejects manual budget_tokens thinking and requires adaptive thinking instead. You steer depth with the effort parameter (low, medium, high default, xhigh, max) in output_config, not a fixed token budget.
Anthropic recommends high as the default balance of quality and cost. For difficult coding and long-running agent workflows, use xhigh ("extra" in Claude Code). Temperature, top_p, and top_k sampling controls are also constrained — behavior is meant to be shaped through prompts, adaptive thinking, and effort.
If your stack still sends the old thinking payload, requests will fail with a 400. Pin the model ID, update the thinking block, and test before you cut over production traffic.
Where it actually wins
Anthropic's launch benchmarks paint a consistent picture — Opus 4.8 is strong where agents and code meet the real world:
| Benchmark | Opus 4.8 | Notes |
|---|---|---|
| SWE-bench Verified | 88.6% | Up from 87.6% (4.7) |
| SWE-bench Pro | 69.2% | Real-world agentic coding |
| Terminal-Bench 2.1 | 74.6% | GPT-5.5 still leads here (~78%) |
| OSWorld-Verified | 83.4% | Computer-use agents |
| Online-Mind2Web | 84% | Browser-agent tasks |
| GDPval-AA | 1890 Elo | Knowledge work; beats GPT-5.5's 1769 |
The practical read: Opus 4.8 is an excellent default for production coding agents, browser/computer-use automation, and document-heavy knowledge work. For terminal-heavy, long CLI sessions, competitors still have an edge on some evals — route by task, not by brand loyalty.
Anthropic also reports Opus 4.8 is roughly 4× less likely to let code flaws pass unremarked compared with earlier Opus versions — relevant if you use it for review, not just generation.
Opus 4.8 vs Fable 5 vs Sonnet 5
| Model | Input / Output | Best for |
|---|---|---|
| Claude Sonnet 5 | $3 / $15 | Speed + intelligence balance, high-volume agents |
| Claude Opus 4.8 | $5 / $25 | Production coding, enterprise agents, multimodal work |
| Claude Fable 5 | $10 / $50 | Long-horizon async work, maximum frontier capability |
Opus 4.8 is the sweet spot for teams that need frontier-quality coding daily without paying Fable-tier prices. Use Sonnet 5 for the inner loops and bulk calls; escalate to Opus 4.8 for hard problems; reach for Fable 5 only when the task genuinely needs the top tier (and design for classifier refusals and fallback).
That routing pattern is much simpler behind one OpenAI-compatible gateway: call claude-opus-4-8 as your default, fall back to claude-sonnet-5 for cost-sensitive paths, and escalate to claude-fable-5 when needed — one key, one SDK. Set that up here.
New platform features worth knowing
Several capabilities launched alongside Opus 4.8 that affect how you build:
- Fast mode (research preview): set
speed: "fast"with thefast-mode-2026-02-01beta header for ~2.5× output speed at premium pricing — useful for latency-sensitive demos, not for batch jobs. - Mid-conversation system messages: inject instructions mid-task without breaking prompt cache or routing through a user turn (GA on Opus 4.8).
- Dynamic Workflows (Claude Code, research preview): one agent plans, fans out into parallel subagents, and merges results in a single session — relevant if your team lives in Claude Code rather than raw API calls.
How to adopt without surprises
- Migrate thinking config first — adaptive + effort, not budget_tokens.
- Default effort to
high, raise toxhighonly where evals prove the gain. - Turn on prompt caching — the lower 1,024-token minimum makes it worthwhile on more workloads.
- Watch the 200k-token line for long-context pricing on huge prompts.
- Use Batch API where latency allows — 50% savings on eligible requests.
- Route by task: Opus 4.8 for quality-critical paths, Sonnet 5 for volume, Fable 5 for the hardest async work.
Bottom line
Claude Opus 4.8 is Anthropic's best generally available model for daily production work: strong agentic coding, multimodal input, 1M context by default, and unchanged $5/$25 pricing. The migration cost is real (adaptive thinking, effort tuning), but the capability jump over 4.7 is meaningful — especially on SWE-bench Pro, computer use, and knowledge-work evals.
Want to run claude-opus-4-8 alongside Sonnet 5 and Fable 5 with automatic fallback and unified billing behind one OpenAI-compatible key? Create a key and start building.