Claude Opus 4.6 Degraded? Critical Evidence Revealed

“Claude Opus 4.6 degraded” is a phrase many developers are now searching for. The reason is simple: some users believe Anthropic’s flagship model became slower, costlier, and rougher in real workflows.
The full answer is more nuanced. As of April 12, 2026, official Anthropic sources confirm that Claude Opus 4.6 exists and remains Anthropic’s flagship broadly available model, but those same sources also confirm several things that help explain the backlash: heavier reasoning by default, repeated incidents, and some corrected public benchmark claims.
This guide uses Anthropic’s official launch post, model docs, platform release notes, status page, engineering post on BrowseComp, Artificial Analysis, LiveBench, and public GitHub issue reports in Anthropic’s Claude Code repository as the main sources.
If you want the short version: “degraded” is a fair description of some real operational pain points in Claude Opus 4.6. It is not strong proof that the model is broadly weaker overall.

Claude Opus 4.6 degraded: short answer

The claim that Claude Opus 4.6 degraded is partly supported by the evidence, but only in a narrow sense.

Anthropic officially says Opus 4.6 often thinks more deeply, and that this can add cost and latency on simpler tasks.
Anthropic’s docs show Opus 4.6 launched with adaptive thinking, high effort as the default, context compaction, and breaking changes like prefill removal.
Anthropic’s status history shows repeated Opus 4.6 incidents, including elevated errors.
Anthropic later corrected some public benchmark numbers, including small downward adjustments to its HLE and BrowseComp scores.
Independent testing from Artificial Analysis suggests Opus 4.6 used 30 to 60 percent more tokens than Opus 4.5 on GDPval-AA and became the most costly model they had tested there.
LiveBench slightly favours Opus 4.6 over Opus 4.5 overall, which argues against a simple “the model got worse everywhere” story.

Why Claude Opus 4.6 degraded matters

When users say Claude Opus 4.6 degraded, they are not only talking about benchmark rank.
They are talking about time, token burn, instruction retention, tool reliability, and how much supervision a workflow suddenly needs. That is why this discussion matters for teams treating AI as operational infrastructure.
If you are tracking how assistants become part of day-to-day execution rather than isolated demos, Progressive Robot’s guide to workflow automation is useful context.

Claude Opus 4.6 degraded: what Anthropic officially confirms

1. Opus 4.6 was built to think harder by default

To understand whether Claude Opus 4.6 degraded is a fair claim, start with Anthropic’s own launch language.
In Anthropic’s official “Introducing Claude Opus 4.6” post, the company says Opus 4.6 “often thinks more deeply and more carefully revisits its reasoning” and warns that this can “add cost and latency” on simpler tasks. Anthropic explicitly recommends dialing effort down from the default high setting to medium if the model is overthinking.
Anthropic’s API doc “What’s new in Claude 4.6” makes the same pattern clearer.
It says adaptive thinking is the recommended mode for Opus 4.6 and that at the default high effort level, Claude almost always thinks. That alone helps explain why some users experienced Opus 4.6 as slower or more verbose even without a true loss of capability.

2. Opus 4.6 changed long-session behaviour in ways that can create friction

Anthropic launched Opus 4.6 alongside the compaction API, which automatically summarizes older context when conversations approach the window limit.
The same docs also note a breaking change: assistant prefills are not supported on Opus 4.6. These are not proof of degradation on their own, but they are real workflow changes that can destabilize established tooling and long-running agent sessions.
Anthropic’s Claude Platform release notes also show that the 1M context window went generally available for Opus 4.6 on March 13, 2026 and that Anthropic removed dedicated 1M rate limits rather than tightening them.
That matters because it weakens one popular theory that the model was quietly rate-limited into feeling worse.

3. Anthropic logged real user-visible incidents

Anthropic’s status history records multiple Opus 4.6 incidents.
These include “Elevated errors on Claude Opus 4.6” on February 28, 2026, another “Elevated errors on Claude Opus 4.6” on March 31, 2026, and an “Opus 4.6 and Sonnet 4.6 error rate elevated” incident later that same day.
Those are not subjective complaints. They are official service incidents that could well have made the model feel degraded during the affected windows.
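
One practical way to use the incident history is to check how many of your own failures landed on a logged incident day before blaming the model itself. The sketch below is only an illustration: the two incident dates come from the status history cited above, while the failure log, the function name, and the day-level matching are assumptions made for this example. The status page may record narrower time windows than a full day.

```python
from datetime import date

# Incident days taken from the status history discussed above.
# Matching at day granularity is an assumption; real incidents
# usually cover only part of a day.
OPUS_46_INCIDENT_DAYS = {
    date(2026, 2, 28),
    date(2026, 3, 31),
}


def split_by_incident(failure_dates):
    """Split failure dates into (on an incident day, on any other day)."""
    during = [d for d in failure_dates if d in OPUS_46_INCIDENT_DAYS]
    outside = [d for d in failure_dates if d not in OPUS_46_INCIDENT_DAYS]
    return during, outside


# Hypothetical failure log, invented purely for illustration.
example_failures = [date(2026, 2, 28), date(2026, 3, 12), date(2026, 3, 31)]

during, outside = split_by_incident(example_failures)
print(f"{len(during)} failures on incident days, {len(outside)} on other days")
```

Failures that fall outside incident days are the ones worth investigating as possible model, prompt, or tooling problems.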

Claude Opus 4.6 degraded in the benchmark data?

1. Anthropic corrected some public performance claims

Anthropic’s Opus 4.6 announcement includes a February 23 note adjusting its Humanity’s Last Exam with tools score from 53.1 percent to 53.0 percent after an improved cheating-detection pipeline flagged additional issues.
In a later Anthropic engineering post, “Eval awareness in Claude Opus 4.6’s BrowseComp performance,” the company says the adjusted Opus 4.6 multi-agent BrowseComp score falls from 86.81 percent to 86.57 percent after contamination and eval-awareness review.
These are small changes, not proof of a model collapse. But they matter because they weaken any narrative that the launch numbers were beyond dispute.

2. Efficiency complaints have stronger support than intelligence complaints

The clearest measurable case behind Claude Opus 4.6 degraded is not that it suddenly became weak.
It is that Opus 4.6 often uses more tokens and becomes more expensive for the same class of work.
Artificial Analysis reported that Opus 4.6 used around 160 million tokens to finish GDPval-AA tasks in adaptive thinking mode and used 30 to 60 percent more tokens than Opus 4.5. Their conclusion was not that Opus 4.6 underperformed. In fact, they say it took the lead. But they also say it became the most costly model they had tested on that benchmark so far.
That matters because many users do not experience a model only as benchmark rank. They experience it as time, tokens, rate limits, and how much babysitting a workflow needs.
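
To make the token overhead concrete, here is a rough back-of-the-envelope sketch. It assumes the 30 to 60 percent overhead applies to the same roughly 160 million token run reported by Artificial Analysis, and that per-token pricing is identical across the two models. Both are assumptions for illustration, as are the function names and the example monthly spend.

```python
def implied_baseline_tokens(new_tokens, overhead_low, overhead_high):
    """Return the (low, high) baseline token counts implied by an overhead range.

    If new = baseline * (1 + overhead), then baseline = new / (1 + overhead).
    The larger overhead implies the smaller baseline.
    """
    return new_tokens / (1 + overhead_high), new_tokens / (1 + overhead_low)


def spend_range(baseline_spend, overhead_low, overhead_high):
    """Scale a baseline spend by the overhead range, assuming equal token prices."""
    return baseline_spend * (1 + overhead_low), baseline_spend * (1 + overhead_high)


OPUS_46_TOKENS = 160_000_000  # "around 160 million" per Artificial Analysis

low, high = implied_baseline_tokens(OPUS_46_TOKENS, 0.30, 0.60)
print(f"Implied Opus 4.5 usage: {low / 1e6:.0f}M to {high / 1e6:.0f}M tokens")

# Hypothetical monthly spend of 1,000 (any currency) on the older model.
spend_low, spend_high = spend_range(1_000, 0.30, 0.60)
print(f"Same workload at 30-60% more tokens: {spend_low:.0f} to {spend_high:.0f}")
```

This covers a single benchmark, so the range is a planning heuristic rather than a prediction for any particular workload.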

3. Independent leaderboards do not show a broad collapse

LiveBench’s leaderboard currently shows Claude 4.6 Opus Thinking High Effort at 76.33 overall and Claude 4.5 Opus Thinking High Effort at 75.96.
That is not a dramatic gap, but it points slightly in Opus 4.6’s favour overall.
So if the claim is “Opus 4.6 is broadly less capable than Opus 4.5,” the independent benchmark evidence is weak.
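
Putting the three score comparisons from this section side by side shows how small the movements are. This is a minimal sketch using only the figures quoted above; the labels and the helper function are illustrative.

```python
# (label, earlier or comparison score, later or Opus 4.6 score), as quoted above.
COMPARISONS = [
    ("HLE with tools (launch vs corrected)", 53.1, 53.0),
    ("BrowseComp multi-agent (launch vs adjusted)", 86.81, 86.57),
    ("LiveBench overall (Opus 4.5 vs Opus 4.6)", 75.96, 76.33),
]


def delta(before, after):
    """Return (change in points, change as a percentage of `before`), rounded to 2 places."""
    points = round(after - before, 2)
    relative_pct = round(100 * (after - before) / before, 2)
    return points, relative_pct


for label, before, after in COMPARISONS:
    points, relative_pct = delta(before, after)
    print(f"{label}: {points:+.2f} points ({relative_pct:+.2f}% relative)")
```

Every change is well under one point, which fits the reading that the corrections are real but minor and that the LiveBench gap is narrow.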

What real users are reporting

The strongest public complaint cluster comes from Anthropic’s own claude-code repository.
That matters because it shows the issue is not just random social chatter.
For many developers, Claude Opus 4.6 degraded describes the day-to-day feeling of the product even if the raw model still scores well on hard evaluations.
A detailed February 25 issue titled “Opus 4.6 comprehensive regression: loops, memory loss, ignored instructions – daily professional user report” describes repeated exploration loops, context or memory loss after compaction, reading instruction files but not following them, repeating failed solutions, and over-engineering simple tasks.
The user says effective productivity dropped by roughly 50 to 60 percent after moving to Opus 4.6 and links several related issues in the same repository.
This evidence matters, but it needs careful framing.
These are user reports inside Claude Code, not a controlled API-only model evaluation. They likely reflect a mix of raw model behaviour, tool orchestration, compaction, prompt structure, and product-level defaults rather than a single clean model-weights regression.

Source-backed evidence table

Source	Date	Type	What it supports
Introducing Claude Opus 4.6	Feb. 5, 2026	Official Anthropic launch post	Confirms Opus 4.6 exists, is the flagship broadly available model, and can add cost and latency on simpler tasks because it thinks more deeply by default
What’s new in Claude 4.6	Current docs	Official Anthropic docs	Confirms adaptive thinking, default high effort, compaction, fast mode, and breaking changes such as prefill removal
Claude Platform release notes	Feb. 5 to Mar. 30, 2026 entries	Official Anthropic release notes	Confirms Opus 4.6 launch details, compaction rollout, and that Anthropic removed dedicated 1M rate limits on Mar. 13 rather than tightening them
Claude status history	Feb. to Apr. 2026	Official Anthropic status page	Confirms multiple Opus 4.6 error incidents and user-visible reliability issues
Eval awareness in Claude Opus 4.6’s BrowseComp performance	Mar. 6, 2026	Official Anthropic engineering note	Confirms benchmark contamination and eval-awareness issues and the adjusted BrowseComp score drop from 86.81% to 86.57%
Opus 4.6 Takes Lead in Agentic Real-World Knowledge Tasks	Feb. 5, 2026	Independent benchmark analysis	Supports the claim that Opus 4.6 used 30–60% more tokens than Opus 4.5 and was the most costly model tested there, even while leading the benchmark
LiveBench leaderboard	Current leaderboard (version 2026-01-08)	Independent benchmark	Suggests Opus 4.6 slightly outperforms Opus 4.5 overall, which weakens the claim of a broad capability regression
Claude Code issue #28469	Feb. 25, 2026	Public user report in Anthropic repo	Supports the existence of detailed user complaints around loops, compaction, instruction loss, and productivity decline, but only as anecdotal evidence

What the data does not strongly prove

Anthropic has not publicly admitted a broad, sustained post-launch quality collapse for Claude Opus 4.6.
The benchmark evidence does not show Opus 4.6 broadly underperforming Opus 4.5 overall.
The public complaints do not prove a pure model-level regression separate from Claude Code orchestration, compaction, or workflow defaults.
The official release notes do not support a simple theory that Anthropic tightened 1M context limits and made Opus 4.6 worse through a hidden rate-limit squeeze.

So is Claude Opus 4.6 actually degraded?

If by degraded you mean slower, more expensive, rougher on simple tasks, or more frustrating inside certain long-running Claude Code workflows, the answer is yes: there is real evidence for that. Anthropic’s own docs, incident history, and public user complaints all support that narrower reading.
If by degraded you mean broadly weaker than Opus 4.5 across independent capability benchmarks, the answer is no: the evidence does not support that claim strongly. The current benchmark picture is much closer to “higher ceiling, rougher defaults, and more operational friction.”
That distinction matters. Claude Opus 4.6 may be a stronger model on hard tasks while still feeling worse in production for people who care about stability, token efficiency, compaction behaviour, or tool discipline.

Final thoughts

Claude Opus 4.6 degraded is an evidence-backed headline only when it is used carefully.
The strongest case is about operational friction, slower simple-task performance, higher token usage, compaction-related complaints, and official incidents.
The weakest case is the broad claim that Anthropic’s flagship is simply less capable overall.
That is why the Claude Opus 4.6 degraded debate has lasted. It is not just about raw intelligence. It is about whether a model that thinks more, uses more tokens, changes context handling, and hits service incidents is actually better in the environments people depend on every day.