Anthropic has shipped a new top-of-the-line model, and for the first time since 2023 it does not slot into the Haiku, Sonnet, Opus ladder. It sits above it. Claude Fable 5 (model ID claude-fable-5) is a new tier with a new name, a doubled price tag, and a noticeably stricter API surface. The launch announcement calls it the most intelligent Claude yet, which is what every launch announcement says, so in this post I want to do something more useful: lay out exactly what changed against the Opus 4.x family, show you the code that breaks and the code that replaces it, and give you a small harness so you can measure whether the upgrade earns its price on your workload rather than on someone else's benchmark slide.
The short version
- Fable 5 is a new tier above Opus, priced at $10 per million input tokens and $50 per million output tokens, exactly double Opus 4.8
- It keeps the 1M token context window and 128K max output of recent Opus models, at standard pricing with no long-context premium
- The API surface is the strictest yet: no temperature, no top_p, no manual thinking budgets, and even an explicit thinking disabled now returns a 400
- Knowledge runs up to a January 2026 cutoff, so it natively knows late-2025 events that older snapshots have to search for
- For plenty of workloads Opus 4.8 or Sonnet 4.6 remain the rational choice; benchmark before you migrate
Anthropic's Claude family now spans four names: Haiku, Sonnet, Opus, and Fable.
A fourth name, and why it sounds different
Anthropic's model names have always been a quiet size joke. A haiku is seventeen syllables, a sonnet is fourteen lines, an opus is a full-scale composed work: small, medium, large, told through literary forms. Fable continues the literary theme but changes what the name signals. A fable is not defined by length at all; it is defined by carrying a point. Whether that was the intent or just a nice accident, the name lands differently. Haiku, Sonnet, and Opus sound like formats; Fable sounds like a storyteller. Say it out loud and it is soft and rounded where Opus is punchy, which fits a model Anthropic describes as having a warmer, clearer writing voice than the deliberately clipped Opus 4.7 era.
The version number is the other tell. This is not Opus 4.9 or Opus 5. By jumping to a fresh name with a 5 attached, Anthropic is signalling a new top tier rather than an increment, the same way the jump from Claude 2 to the three-name Claude 3 family signalled a strategy change back in early 2024. Opus does not go away. Opus 4.8 remains in the lineup at its existing price, which makes this less of a replacement and more of a new ceiling, and that framing matters when we get to cost.
The headline numbers, side by side
Specs first, because most upgrade decisions die or survive on this table. Context window and output ceiling are unchanged from Opus 4.8. The two things that actually move are the price, which doubles, and the API surface, which tightens.
| Model | Input $/1M | Output $/1M | Context | Max output | Thinking |
|---|---|---|---|---|---|
| Claude Fable 5 (claude-fable-5) | $10.00 | $50.00 | 1M tokens | 128K | Adaptive only (explicit off is rejected) |
| Claude Opus 4.8 (claude-opus-4-8) | $5.00 | $25.00 | 1M tokens | 128K | Adaptive, or omit to run without |
| Claude Opus 4.7 (claude-opus-4-7) | $5.00 | $25.00 | 1M tokens | 128K | Adaptive, or omit to run without |
| Claude Sonnet 4.6 (claude-sonnet-4-6) | $3.00 | $15.00 | 1M tokens | 64K | Adaptive (manual budgets deprecated) |
| Claude Haiku 4.5 (claude-haiku-4-5) | $1.00 | $5.00 | 200K | 64K | Manual budget_tokens |
Fable 5 costs exactly double Opus 4.8 per token, a steeper jump than the roughly 1.67x step from Sonnet 4.6 to Opus 4.8
One detail worth pausing on: the 1M token context window comes at standard pricing. There is no long-context surcharge, which used to be the catch with very large windows. If your pipeline regularly stuffs hundreds of thousands of tokens of code or documents into a single request, the bill scales linearly with the tokens you send, so the gap versus Opus 4.8, which also runs its 1M window at standard rates, stays a predictable 2x instead of widening at large request sizes. You pay for capability you actually use rather than a premium for the privilege.
Your first Fable 5 request
Getting a response out of Fable 5 looks almost identical to Opus 4.8, which is deliberate: the request surface is the same one Opus 4.7 introduced. The two habits worth building from day one are turning adaptive thinking on explicitly (it is not on when the field is omitted) and streaming anything with a large max_tokens, because a 128K output ceiling will blow through HTTP timeouts on a plain request.
```python
# first_request.py
# pip install anthropic
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with client.messages.stream(
    model="claude-fable-5",
    max_tokens=64000,
    thinking={"type": "adaptive"},  # not on by default; set it explicitly
    output_config={"effort": "high"},  # low | medium | high | xhigh | max
    system="You are a senior engineer doing a design review. "
           "Be specific and name the trade-offs.",
    messages=[{
        "role": "user",
        "content": "Review this plan: we want to move our monolith's reporting "
                   "queries to a read replica while writes stay on the primary. "
                   "What breaks first, and how do we detect it before users do?",
    }],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    message = stream.get_final_message()

print(f"\n\n[stop: {message.stop_reason}, "
      f"output tokens: {message.usage.output_tokens}]")
```
The effort parameter is doing more work here than it looks. It controls how deeply the model thinks and how many tokens it is willing to spend across a task, from low for quick scoped answers up to max for problems where correctness beats cost. On this model generation, effort is the dial that replaced the old token budgets, and Anthropic's own guidance is to start at high and sweep up or down on your own evals instead of reflexively maxing it out.
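One way to act on that advice: score each effort level on your own eval set, record the average cost per task, and pick the cheapest level that clears your quality bar. A minimal sketch of that last step, assuming you already have those numbers; the function name and the example scores and costs are hypothetical, and nothing here calls the API.

```python
# pick_effort.py
# Choose the cheapest effort level whose eval score clears a quality bar.
# Level names mirror the effort values listed above; the scores and costs
# are hypothetical inputs that would come from your own eval runs.
from typing import Optional

EFFORT_LEVELS = ["low", "medium", "high", "xhigh", "max"]


def pick_effort(results: dict[str, tuple[float, float]],
                min_score: float) -> Optional[str]:
    """results maps effort level -> (eval score, average USD per task)."""
    passing = [
        (cost, EFFORT_LEVELS.index(level), level)
        for level, (score, cost) in results.items()
        if score >= min_score
    ]
    if not passing:
        return None  # no level is good enough; revisit the prompt or the model
    # Lowest cost wins; on a cost tie, the lower effort level wins.
    return min(passing)[2]


measured = {  # placeholder numbers, replace with your own eval output
    "medium": (0.71, 0.04),
    "high": (0.86, 0.07),
    "max": (0.88, 0.19),
}
print(pick_effort(measured, min_score=0.85))  # high
```

Raising min_score is how you tell the sketch that correctness matters more than cost; it will only reach for max when nothing cheaper passes.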
The API got stricter, and your old code will 400
This is the part that bites during migration. The Opus 4.7 generation already removed sampling parameters and manual thinking budgets, and Fable 5 adds one more restriction of its own: you can no longer send an explicit thinking: {"type": "disabled"}. On Opus 4.8 that is accepted; on Fable 5 it returns a 400, and the only way to run without thinking is to omit the field entirely. The philosophy is clear: Anthropic wants prompting to be the steering wheel and is removing every numeric knob that used to compete with it. Whether you find that liberating or annoying probably depends on how much tooling you built around temperature.
```python
# migration_check.py
# Three request shapes that worked on older Claude models and now fail.
import anthropic

client = anthropic.Anthropic()

broken = [
    {"temperature": 0.7},  # removed entirely
    {"thinking": {"type": "enabled", "budget_tokens": 8000}},  # manual budgets gone
    {"thinking": {"type": "disabled"}},  # fine on Opus 4.8, rejected on Fable 5
]

for params in broken:
    try:
        client.messages.create(
            model="claude-fable-5",
            max_tokens=16000,  # above the 8,000 budget, so only the model rejects it
            messages=[{"role": "user", "content": "Say hi"}],
            **params,
        )
    except anthropic.BadRequestError as err:
        print(f"rejected {list(params)[0]!r}: {err.message[:80]}")

# The shape that works: adaptive thinking on, every legacy knob deleted.
response = client.messages.create(
    model="claude-fable-5",
    max_tokens=1024,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "Say hi"}],
)
# With thinking on, the first content block can be a thinking block,
# so pick out the text block instead of assuming content[0].
print(next(block.text for block in response.content if block.type == "text"))
```
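If request kwargs are assembled in many places, normalizing them once before the call can be easier than fixing every call site. A minimal sketch, assuming your code builds a plain dict of kwargs for messages.create; the helper is hypothetical and its rules simply encode the restrictions described above (sampling parameters dropped, manual budgets swapped for adaptive, an explicit disabled removed).

```python
# normalize_request.py
# Hypothetical helper: rewrite legacy request kwargs into the shape that the
# restrictions described in this post allow on claude-fable-5.
import copy

SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def normalize_for_fable(kwargs: dict) -> dict:
    fixed = copy.deepcopy(kwargs)  # never mutate the caller's dict
    for key in SAMPLING_KEYS:
        fixed.pop(key, None)  # hard 400s; steer through the prompt instead
    thinking = fixed.get("thinking")
    if thinking is not None:
        kind = thinking.get("type")
        if kind == "disabled":
            del fixed["thinking"]  # explicit "disabled" is rejected; omit the field
        elif kind == "enabled":
            fixed["thinking"] = {"type": "adaptive"}  # manual budgets are gone
    return fixed


legacy = {
    "model": "claude-opus-4-8",
    "max_tokens": 16000,
    "temperature": 0.7,
    "thinking": {"type": "enabled", "budget_tokens": 8000},
}
print(normalize_for_fable({**legacy, "model": "claude-fable-5"}))
# {'model': 'claude-fable-5', 'max_tokens': 16000, 'thinking': {'type': 'adaptive'}}
```

Assistant-turn prefills are deliberately left alone: replacing them with structured outputs or a prompt instruction changes what the request means, so that deserves a human decision rather than a silent rewrite.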
Warning
Audit before you swap the model string
Grep your codebase for temperature, top_p, top_k, budget_tokens, and assistant-turn prefills before pointing anything at claude-fable-5. Every one of them is a hard 400, not a deprecation warning. If you relied on temperature for output variety, the replacement is an explicit prompt instruction; if you relied on prefills to force JSON, the replacement is structured outputs via output_config.format.
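The grep can be scripted so it runs in CI. A minimal sketch, assuming a Python codebase; it is a plain regex scan over source text, so it will also flag comments and dictionary keys that never reach the API, and it cannot find prefills, which look like ordinary assistant messages and need a manual review. The function names are hypothetical.

```python
# audit_legacy_params.py
# Flag lines that mention request options this post says Fable 5 rejects.
# A plain-text scan: expect some false positives, and review prefills by hand.
import re
from pathlib import Path

LEGACY = re.compile(r"\b(temperature|top_p|top_k|budget_tokens)\b")


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, matched option) pairs for one file's contents."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in LEGACY.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def scan_tree(root: str) -> int:
    """Print every hit under root and return how many were found."""
    count = 0
    for path in Path(root).rglob("*.py"):
        source = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, name in scan_text(source):
            print(f"{path}:{lineno}: {name}")
            count += 1
    return count


sample = ('client.messages.create(model=m, temperature=0.2,\n'
          '    thinking={"type": "enabled", "budget_tokens": 4000})')
print(scan_text(sample))  # [(1, 'temperature'), (2, 'budget_tokens')]
```

Calling scan_tree(".") and failing the build whenever it returns a non-zero count turns the one-off audit into a guard against regressions.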
Is it actually better? Measure it on your own work
Here is where I want to take the middle ground seriously. Anthropic positions Fable 5 as its most capable model, and given the lineage, that is plausible: each Opus 4.x release genuinely did improve long-horizon agentic work, bug-finding, and instruction following. But "most capable" and "worth 2x per token for your use case" are different claims, and at launch you will mostly be reading first-party numbers. The honest move is to run your own prompts through the tiers and look at three columns: quality you can judge, latency you can feel, and cost you can calculate. This little harness does the boring parts.
```python
# compare_models.py
# Same prompt, three tiers, real latency and cost from your own workload.
import time

from anthropic import Anthropic

client = Anthropic()

PRICES = {  # USD per million tokens: (input, output)
    "claude-fable-5": (10.00, 50.00),
    "claude-opus-4-8": (5.00, 25.00),
    "claude-sonnet-4-6": (3.00, 15.00),
}

PROMPT = """You are reviewing a pull request that changes our retry logic.
Find the bug in this snippet and propose a minimal fix:
async def fetch_with_retry(url, retries=3):
    for attempt in range(retries):
        try:
            return await http.get(url)
        except TimeoutError:
            await asyncio.sleep(2 ** attempt)
    raise RuntimeError("unreachable")
"""


def run(model: str) -> None:
    start = time.monotonic()
    response = client.messages.create(
        model=model,
        max_tokens=16000,
        thinking={"type": "adaptive"},
        output_config={"effort": "high"},
        messages=[{"role": "user", "content": PROMPT}],
    )
    elapsed = time.monotonic() - start
    u = response.usage
    in_price, out_price = PRICES[model]
    cost = (u.input_tokens * in_price + u.output_tokens * out_price) / 1_000_000
    print(f"{model:22} {elapsed:6.1f}s in={u.input_tokens:<6} "
          f"out={u.output_tokens:<6} cost=${cost:.4f}")
    # Skip thinking blocks and show the start of the visible answer.
    text = "".join(b.text for b in response.content if b.type == "text")
    print(text[:200], "\n")


for model in PRICES:
    run(model)
```
Run it with ten of your real prompts instead of my toy one and you will learn more than any launch post can tell you. A pattern worth watching for: on hard agentic tasks, a smarter model at higher effort sometimes finishes in fewer turns and fewer total tokens, which can partially or fully cancel the per-token premium. On simple extraction and classification, it almost never will, and Sonnet 4.6 at under a third of the output price will quietly embarrass the flagship on cost per result.
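To check whether fewer turns really cancel the premium, compare tiers on cost per successful result rather than cost per call. A minimal sketch; the class and function are hypothetical, and the example uses ten runs per tier at $3.50 and $1.75 per run with made-up success counts, purely to show the arithmetic.

```python
# cost_per_success.py
# Rank models by cost per successful result rather than cost per call.
# Success counts come from your own judgement of the outputs; the numbers
# below are hypothetical placeholders, not measurements.
from dataclasses import dataclass


@dataclass
class TierResult:
    model: str
    successes: int     # tasks you judged correct
    total_cost: float  # USD summed over all attempted tasks

    def cost_per_success(self) -> float:
        if self.successes == 0:
            return float("inf")  # nothing usable came back at any price
        return self.total_cost / self.successes


def rank(results: list[TierResult]) -> list[TierResult]:
    """Cheapest cost per success first."""
    return sorted(results, key=lambda r: r.cost_per_success())


results = [
    TierResult("claude-opus-4-8", successes=4, total_cost=17.50),  # 10 runs x $1.75
    TierResult("claude-fable-5", successes=9, total_cost=35.00),   # 10 runs x $3.50
]
for r in rank(results):
    print(f"{r.model:18} ${r.cost_per_success():.3f} per success")
```

With these invented counts Fable 5 comes out ahead (about $3.89 per success against $4.375); with a smaller success gap it would not, which is exactly why the counts need to come from your own prompts.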
What it knows, and the cutoff question
Fable 5 ships with a knowledge cutoff of January 2026, which means its training data includes events, libraries, and releases up to the start of this year. That is a practical difference, not a cosmetic one. Models with older snapshots have to burn a web search round-trip to know about a framework released last autumn; a model whose training already includes late 2025 answers those questions natively, faster, and without the failure modes of search. The flip side deserves equal weight: a cutoff is a hard wall, the model has no idea it is standing next to one unless asked carefully, and anything that happened after January is invisible to it. For current-events workloads you still want the web search tool wired in, whatever the cutoff says.
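Wiring search in is a tools entry on the request. A minimal sketch that adds Anthropic's server-side web search tool to an existing set of messages.create kwargs; the type string web_search_20250305 is the version named in Anthropic's web search documentation as I understand it, so check the current docs for newer versions, and whether claude-fable-5 accepts that version is an assumption. The helper name is hypothetical.

```python
# web_search.py
# Hypothetical helper: add Anthropic's server-side web search tool to a set of
# messages.create kwargs. Verify the tool "type" string against the current
# web search docs before relying on it.
import copy

WEB_SEARCH_TOOL = {
    "type": "web_search_20250305",
    "name": "web_search",
    "max_uses": 5,  # cap on searches the model may run for this request
}


def with_web_search(kwargs: dict) -> dict:
    """Return a copy of the kwargs with the web search tool attached once."""
    fixed = copy.deepcopy(kwargs)  # leave the caller's dict untouched
    tools = fixed.setdefault("tools", [])
    if not any(tool.get("name") == "web_search" for tool in tools):
        tools.append(dict(WEB_SEARCH_TOOL))
    return fixed


request = with_web_search({
    "model": "claude-fable-5",
    "max_tokens": 4096,
    "thinking": {"type": "adaptive"},
    "messages": [{"role": "user",
                  "content": "What changed in this month's release of our framework?"}],
})
print([tool["name"] for tool in request["tools"]])  # ['web_search']
# client.messages.create(**request) would then send it (API key required).
```

Attaching the tool does not force a search; the model still decides per request whether its training data is enough, which keeps the cost of questions it can answer natively low.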
You do not have to memorise any of these specs either, because the Models API will tell you what a model supports at runtime. This is the right way to build capability checks into a pipeline that might run against several Claude versions.
```python
# capabilities.py
# Ask the API what claude-fable-5 can do instead of hardcoding assumptions.
from anthropic import Anthropic

client = Anthropic()

m = client.models.retrieve("claude-fable-5")
print(m.display_name)      # Claude Fable 5
print(m.max_input_tokens)  # 1,000,000 token context window
print(m.max_tokens)        # 128,000 max output tokens

caps = m.capabilities
print(caps["thinking"]["types"]["adaptive"]["supported"])  # True
print(caps["thinking"]["types"]["enabled"]["supported"])   # False: no budgets
print(caps["effort"]["max"]["supported"])                  # True
print(caps["structured_outputs"]["supported"])             # True

# Which models in your org support adaptive thinking with a 1M window?
for model in client.models.list():
    if (model.capabilities["thinking"]["types"]["adaptive"]["supported"]
            and model.max_input_tokens >= 1_000_000):
        print(model.id)
```
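The capabilities data can also drive the request itself. A minimal sketch that picks a thinking setting from a dict shaped like the one read above; the dict shape is taken from that snippet, the two example dicts are hypothetical stand-ins for a Fable-style and a Haiku-style model, and spending half of max_tokens on the fallback budget is an arbitrary choice.

```python
# choose_thinking.py
# Pick a thinking setting from a capabilities dict shaped like the one the
# Models API snippet reads. Example dicts below are hypothetical stand-ins.
from typing import Optional


def thinking_for(caps: dict, max_tokens: int) -> Optional[dict]:
    types = caps.get("thinking", {}).get("types", {})
    if types.get("adaptive", {}).get("supported"):
        return {"type": "adaptive"}
    # Manual budgets must be at least 1,024 tokens and below max_tokens,
    # so only fall back to them when half of max_tokens clears that floor.
    if types.get("enabled", {}).get("supported") and max_tokens > 2048:
        return {"type": "enabled", "budget_tokens": max_tokens // 2}
    return None  # caller should leave the thinking field out entirely


fable_like = {"thinking": {"types": {"adaptive": {"supported": True},
                                     "enabled": {"supported": False}}}}
haiku_like = {"thinking": {"types": {"adaptive": {"supported": False},
                                     "enabled": {"supported": True}}}}
print(thinking_for(fable_like, 16000))  # {'type': 'adaptive'}
print(thinking_for(haiku_like, 16000))  # {'type': 'enabled', 'budget_tokens': 8000}
```

When it returns None, omit thinking from the request rather than translating None into {"type": "disabled"}; as described earlier, an explicit disabled is exactly what Fable 5 rejects.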
The pricing question nobody can dodge
At $10 in and $50 out, Fable 5 is the most expensive model in the current Claude lineup, and the gap to its own sibling is the story: double Opus 4.8 on both sides of the meter. To make that concrete, an agent run that consumes 200K input tokens and produces 30K output tokens costs about $3.50 on Fable 5 versus $1.75 on Opus 4.8 and $1.05 on Sonnet 4.6. Multiply by a few thousand runs a month and the model choice becomes a budget line, not a dropdown.
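The arithmetic is simple enough to script, which makes it easy to rerun with your own token counts. A minimal sketch using the list prices from the table; the 3,000 runs a month is an assumption standing in for "a few thousand".

```python
# run_cost.py
# Per-run and monthly cost from list prices (USD per million tokens).
PRICES = {  # (input, output), from the pricing table above
    "claude-fable-5": (10.00, 50.00),
    "claude-opus-4-8": (5.00, 25.00),
    "claude-sonnet-4-6": (3.00, 15.00),
}


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


RUNS_PER_MONTH = 3_000  # assumed volume; replace with your own

for model in PRICES:
    per_run = run_cost(model, 200_000, 30_000)
    print(f"{model:18} ${per_run:.2f}/run  ${per_run * RUNS_PER_MONTH:,.0f}/month")
```

It prints $3.50, $1.75, and $1.05 per run, matching the figures above, or about $10,500, $5,250, and $3,150 a month at that volume.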
✓ Pros
- Highest-capability option for long-horizon agentic work and hard reasoning
- 1M context and 128K output at standard pricing, no long-context premium
- Knowledge into January 2026 reduces dependence on search for recent topics
- Same request surface as Opus 4.7/4.8, so migration from those is mostly a string swap
- Higher effort can mean fewer turns, which claws back some of the premium on complex tasks
✕ Cons
- Exactly 2x the per-token price of Opus 4.8, with first-party benchmarks the only evidence at launch
- Strictest API surface yet: no temperature, no top_p, no manual thinking budgets
- Explicit thinking disabled returns a 400, an extra trap Opus 4.8 does not have
- Overkill on cost for classification, extraction, and routine chat workloads
- Opus 4.8 remains available and very close in capability for many everyday tasks
My honest read: this launch is aimed at workloads where model quality is the bottleneck, such as overnight coding agents, deep research, and complex multi-step pipelines where a single wrong turn wastes an hour of compute. If that is you, the premium probably pays for itself and you should test seriously. If your traffic is summaries, lookups, and structured extraction, nothing about this release changes the advice that Sonnet 4.6 is the value king and Haiku 4.5 the throughput king. A flagship existing does not obligate you to fly it.
Pro tip
Prompt caching softens the price considerably, and Fable 5 has a friendlier cache floor than you might expect: its minimum cacheable prefix is 2,048 tokens, half of the 4,096 required on Opus 4.8. Cache reads bill at roughly a tenth of the input price, so a stable system prompt plus tool definitions in front of every request turns $10 input tokens into about $1 ones on repeat traffic.
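```python
# caching.py
# A stable prefix makes repeat requests ~90% cheaper on the cached portion.
from pathlib import Path

from anthropic import Anthropic

client = Anthropic()

# Hypothetical inputs: any large, stable text can serve as the cached prefix,
# as long as it exceeds the model's minimum cacheable length.
LARGE_STYLE_GUIDE = Path("style_guide.md").read_text(encoding="utf-8")
question = "Does this changelog entry follow the style guide?"

response = client.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    system=[{
        "type": "text",
        "text": LARGE_STYLE_GUIDE,  # stable content first, volatile last
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": question}],
)

u = response.usage
print(u.cache_creation_input_tokens)  # first call: pays a ~1.25x write premium
print(u.cache_read_input_tokens)      # repeat calls: ~0.1x of the input price
print(u.input_tokens)                 # only the uncached remainder bills full
```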
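To see what caching does to the input bill on repeat traffic, the multipliers quoted above can be plugged into a small estimate. A minimal sketch, assuming a hypothetical traffic shape (a 20,000-token stable prefix, 500 fresh tokens per request, 1,000 requests) and that requests arrive often enough that the cache never expires between them; output tokens are left out because caching does not change them.

```python
# cache_savings.py
# Estimate input-side cost for repeat traffic with and without prompt caching,
# using the multipliers quoted above: ~1.25x to write the cache, ~0.1x to read.
# Assumes every request after the first hits the cache (no expiry in between).
INPUT_PRICE = 10.00  # USD per million input tokens on claude-fable-5
WRITE_MULT = 1.25
READ_MULT = 0.10


def input_cost(prefix_tokens: int, fresh_tokens: int, requests: int,
               cached: bool) -> float:
    per_token = INPUT_PRICE / 1_000_000
    if not cached:
        return requests * (prefix_tokens + fresh_tokens) * per_token
    first = prefix_tokens * WRITE_MULT * per_token            # one cache write
    repeats = (requests - 1) * prefix_tokens * READ_MULT * per_token
    fresh = requests * fresh_tokens * per_token               # never cached
    return first + repeats + fresh


prefix, fresh, n = 20_000, 500, 1_000  # hypothetical traffic shape
without = input_cost(prefix, fresh, n, cached=False)
with_cache = input_cost(prefix, fresh, n, cached=True)
print(f"uncached ${without:.2f}  cached ${with_cache:.2f}")
```

For this shape it comes to about $205 uncached versus about $25 cached, in line with the roughly 90% reduction on the cached portion; the fresh tokens are what keep it from being a full 90%.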
Common mistakes to avoid
- ✕ Swapping the model string to claude-fable-5 without removing temperature, top_p, top_k, or budget_tokens from existing request code.
  - ✓ All of them return a hard 400 on Fable 5. Strip the sampling parameters, replace manual budgets with thinking: {"type": "adaptive"}, and steer behaviour through the prompt and the effort setting.
- ✕ Sending thinking: {"type": "disabled"} because it worked on Opus 4.8.
  - ✓ Fable 5 rejects an explicit "disabled" with a 400. If you genuinely want no thinking, omit the thinking field entirely; otherwise leave adaptive on and let the model decide per task.
- ✕ Requesting 100K+ output tokens on a plain non-streaming call and blaming the API for timeouts.
  - ✓ Anything beyond roughly 16K output should stream. Use messages.stream() with get_final_message() and the 128K ceiling becomes usable instead of theoretical.
- ✕ Moving every workload to the flagship because it is the newest thing on the pricing page.
  - ✓ Route by task. Keep extraction and classification on Sonnet 4.6 or Haiku 4.5, keep solid everyday agent work on Opus 4.8, and reserve Fable 5 for the jobs where quality measurably moves your outcome.
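Routing by task can start as a lookup table. A minimal sketch; the task labels are hypothetical, and which tier each label maps to is one reading of the advice above that your own evals should confirm or overturn. The escalate helper covers the case where a cheaper tier's answer fails your checks and you retry one tier up.

```python
# route_model.py
# Hypothetical task-to-model routing table plus one-step escalation.
ROUTES = {
    "classification": "claude-haiku-4-5",
    "extraction": "claude-sonnet-4-6",
    "summary": "claude-sonnet-4-6",
    "agent": "claude-opus-4-8",
    "hard_agent": "claude-fable-5",  # only where evals show quality moves outcomes
}
DEFAULT_MODEL = "claude-sonnet-4-6"

# Cheapest to most expensive, matching the pricing table.
TIERS = ["claude-haiku-4-5", "claude-sonnet-4-6", "claude-opus-4-8", "claude-fable-5"]


def route(task_type: str) -> str:
    return ROUTES.get(task_type, DEFAULT_MODEL)


def escalate(model: str) -> str:
    """Next tier up after a failed check; the top tier stays where it is."""
    idx = TIERS.index(model)
    return TIERS[min(idx + 1, len(TIERS) - 1)]


print(route("extraction"))            # claude-sonnet-4-6
print(escalate(route("agent")))       # claude-fable-5
```

Escalating only on failure means the flagship price is paid for the minority of tasks that actually need it, which is the routing advice above expressed as code.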
Frequently asked questions
Does Fable 5 replace Opus 4.8?
No. Opus 4.8 stays in the lineup at $5/$25 per million tokens. Fable 5 is a new tier above it, the same way Opus has always sat above Sonnet. Nothing forces a migration.
Why can't I set the temperature anymore?
Anthropic removed temperature, top_p, and top_k from this model generation entirely, starting with Opus 4.7. The intended steering mechanism is the prompt itself plus the effort parameter. If you used temperature for variety, ask for variety explicitly in the prompt.
Is the 1M context window really standard price?
Yes. There is no long-context premium on Fable 5, so an 800K-token request bills at the same $10 per million input rate as a 2K-token one. Prompt caching can cut the repeat cost of large stable prefixes by roughly 90% on top of that.
How recent is its knowledge?
The knowledge cutoff is January 2026, so late-2025 releases and events are in the training data. Anything after that needs the web search tool or content you supply in the prompt.
What is the cheapest way to evaluate it?
Take ten real prompts from your production traffic, run them through Fable 5, Opus 4.8, and Sonnet 4.6 with the harness in this post, and compare quality, latency, and cost per result. An afternoon of testing beats a quarter of guessing.
The verdict, such as it is
Fable 5 is a real step: a new tier, a tighter and more opinionated API, fresher knowledge, and by every early indication the strongest reasoning Anthropic has shipped. It is also twice the price of a model that was the flagship until yesterday and remains excellent today. Both things are true at once, and pretending otherwise in either direction is how teams end up overpaying or underperforming. Treat the launch as an invitation to measure, not a command to migrate. Run the comparison harness on your own prompts, check whether higher effort actually reduces your turn counts, cache your prefixes, and let the spreadsheet rather than the announcement decide which model ID goes in your config.
Note
Specs in this post
Pricing, context windows, and API behaviour describe the Claude API at Fable 5 launch and may change. The Models API snippet above is the reliable way to check current capabilities at runtime.