Home

Claude Opus 4.8: The Production Default Before You Reach for Fable 5

Claude Opus 4.8: The Production Default Before You Reach for Fable 5

If you ship production software with LLMs, Claude Opus 4.8 is probably the model you should default to before reaching for anything more expensive. Released on May 28, 2026, claude-opus-4-8 is Anthropic's flagship hybrid-reasoning model for complex agentic coding, computer use, and professional knowledge work — and unlike the newer Claude Fable 5 tier, it ships at the same price as Opus 4.7. Here is what actually changed, what it costs, and how to use it without breaking your integration.

What Claude Opus 4.8 is

Claude Opus 4.8 builds on Opus 4.7 with measurable gains across coding, agentic workflows, and long-running tasks. Anthropic positions it as the model to start with for "complex agentic coding and enterprise work" — with Fable 5 reserved for workloads that need the absolute highest capability at double the price.

It is multimodal (text and image input), supports tool use, structured outputs, computer-use tools, prompt caching, and the Batch API. On the Claude API, Amazon Bedrock, Google Cloud, and Microsoft Foundry it ships with a 1-million-token context window by default and up to 128,000 output tokens per request — a meaningful shift from Opus 4.7, where the 1M window required a beta header.

The specs that matter

Property Claude Opus 4.8
API model ID claude-opus-4-8
Context window 1,000,000 tokens (default)
Max output 128,000 tokens
Input price (regular) $5 / million tokens
Output price (regular) $25 / million tokens
Fast mode (research preview) $10 input / $50 output (~2.5× faster)
Thinking Adaptive, always on
Modality Text + image
Knowledge cutoff Jan 2026

Two pricing details easy to miss: long-context input pricing kicks in above 200k tokens, so dumping an entire monorepo into one call needs a cost check. And prompt caching now has a 1,024-token minimum on Opus 4.8 (down from 2,048 on 4.7), so shorter repeated prompts can cache when they could not before — with up to 90% savings on cached input.

The integration change: adaptive thinking only

If you are migrating from Opus 4.6 or earlier, the breaking change is thinking configuration. Opus 4.8 rejects manual budget_tokens thinking and requires adaptive thinking instead. You steer depth with the effort parameter (low, medium, high default, xhigh, max) in output_config, not a fixed token budget.

Anthropic recommends high as the default balance of quality and cost. For difficult coding and long-running agent workflows, use xhigh ("extra" in Claude Code). Temperature, top_p, and top_k sampling controls are also constrained — behavior is meant to be shaped through prompts, adaptive thinking, and effort.

If your stack still sends the old thinking payload, requests will fail with a 400. Pin the model ID, update the thinking block, and test before you cut over production traffic.

Where it actually wins

Anthropic's launch benchmarks paint a consistent picture — Opus 4.8 is strong where agents and code meet the real world:

Benchmark Opus 4.8 Notes
SWE-bench Verified 88.6% Up from 87.6% (4.7)
SWE-bench Pro 69.2% Real-world agentic coding
Terminal-Bench 2.1 74.6% GPT-5.5 still leads here (~78%)
OSWorld-Verified 83.4% Computer-use agents
Online-Mind2Web 84% Browser-agent tasks
GDPval-AA 1890 Elo Knowledge work; beats GPT-5.5's 1769

The practical read: Opus 4.8 is an excellent default for production coding agents, browser/computer-use automation, and document-heavy knowledge work. For terminal-heavy, long CLI sessions, competitors still have an edge on some evals — route by task, not by brand loyalty.

Anthropic also reports Opus 4.8 is roughly 4× less likely to let code flaws pass unremarked compared with earlier Opus versions — relevant if you use it for review, not just generation.

Opus 4.8 vs Fable 5 vs Sonnet 5

Model Input / Output Best for
Claude Sonnet 5 $3 / $15 Speed + intelligence balance, high-volume agents
Claude Opus 4.8 $5 / $25 Production coding, enterprise agents, multimodal work
Claude Fable 5 $10 / $50 Long-horizon async work, maximum frontier capability

Opus 4.8 is the sweet spot for teams that need frontier-quality coding daily without paying Fable-tier prices. Use Sonnet 5 for the inner loops and bulk calls; escalate to Opus 4.8 for hard problems; reach for Fable 5 only when the task genuinely needs the top tier (and design for classifier refusals and fallback).

That routing pattern is much simpler behind one OpenAI-compatible gateway: call claude-opus-4-8 as your default, fall back to claude-sonnet-5 for cost-sensitive paths, and escalate to claude-fable-5 when needed — one key, one SDK. Set that up here.

New platform features worth knowing

Several capabilities launched alongside Opus 4.8 that affect how you build:

  • Fast mode (research preview): set speed: "fast" with the fast-mode-2026-02-01 beta header for ~2.5× output speed at premium pricing — useful for latency-sensitive demos, not for batch jobs.
  • Mid-conversation system messages: inject instructions mid-task without breaking prompt cache or routing through a user turn (GA on Opus 4.8).
  • Dynamic Workflows (Claude Code, research preview): one agent plans, fans out into parallel subagents, and merges results in a single session — relevant if your team lives in Claude Code rather than raw API calls.

How to adopt without surprises

  • Migrate thinking config first — adaptive + effort, not budget_tokens.
  • Default effort to high, raise to xhigh only where evals prove the gain.
  • Turn on prompt caching — the lower 1,024-token minimum makes it worthwhile on more workloads.
  • Watch the 200k-token line for long-context pricing on huge prompts.
  • Use Batch API where latency allows — 50% savings on eligible requests.
  • Route by task: Opus 4.8 for quality-critical paths, Sonnet 5 for volume, Fable 5 for the hardest async work.

Bottom line

Claude Opus 4.8 is Anthropic's best generally available model for daily production work: strong agentic coding, multimodal input, 1M context by default, and unchanged $5/$25 pricing. The migration cost is real (adaptive thinking, effort tuning), but the capability jump over 4.7 is meaningful — especially on SWE-bench Pro, computer use, and knowledge-work evals.

Want to run claude-opus-4-8 alongside Sonnet 5 and Fable 5 with automatic fallback and unified billing behind one OpenAI-compatible key? Create a key and start building.