Claude Opus 4.6

v20260205

Anthropic

Modelcodingreasoningenterprisehipaa-eligible
92
Exceptional
About This Model

Anthropic's frontier Opus released February 2026 with 80.8% SWE-bench Verified, breakthrough 68.8% ARC-AGI-2 abstract reasoning, adaptive thinking, and a 1M token context window. Now two generations behind Opus 4.8 but still served.

Last Evaluated: June 10, 2026
Official Website

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability
+

Generational leap in abstract reasoning (68.8% ARC-AGI-2, ~2x Opus 4.5). 80.8% SWE-bench with 1M context and 128K output. Introduced adaptive thinking and GA effort parameter including 'max'. Now two generations behind Opus 4.8 but still fully served.

task accuracy code

Industry-standard coding and agentic benchmarks measuring real-world software engineering and computer-use tasks

Evidence
SWE-bench Verified80.8% resolution rate (frontier-class software engineering)
Terminal-Bench 2.065.4% on command-line tasks (up from Opus 4.5's 59.3%)
OSWorld72.7% on computer-use tasks (up from Opus 4.5's 66.3%)
highVerified: 2026-06-10
task accuracy reasoning

Abstract reasoning and multi-step problem solving benchmarks

Evidence
ARC-AGI-268.8% (up from Opus 4.5's 37.6% — a generational leap in abstract reasoning)
Anthropic AnnouncementAdaptive thinking dynamically allocates reasoning depth per request
highVerified: 2026-06-10
task accuracy general

Comprehensive knowledge and multimodal testing across published benchmarks

Evidence
Anthropic Models DocumentationFrontier-class general knowledge and multimodal understanding at launch
highVerified: 2026-06-10
output consistency

Internal testing of output stability across effort levels and adaptive thinking

Evidence
Anthropic AnnouncementEffort parameter GA (low/medium/high/max) enables consistent quality control; adaptive thinking replaces manual budgets
highVerified: 2026-06-10
latency p50

Median latency for API requests with standard prompt sizes

Evidence
Community benchmarkingTypical response time ~2.5s for standard prompts at default effort
mediumVerified: 2026-06-10
latency p95

95th percentile response time across diverse workloads

Evidence
Community benchmarkingp95 latency ~5.5s; higher at max effort
mediumVerified: 2026-06-10
context window

Official specification from provider

Evidence
Anthropic Models Documentation1M token context window (beta at launch, since standard); 128K max output tokens
highVerified: 2026-06-10
uptime

Historical uptime data from official status page

Evidence
Anthropic Status Page99.9%+ uptime (last 90 days)
highVerified: 2026-06-10
🛡️Security
+

Strong safety posture. Removal of last-assistant-turn prefills (400 error) eliminates a common response-manipulation pattern; structured outputs replace it.

prompt injection resistance

Testing against OWASP LLM01 prompt injection attacks

Evidence
Anthropic Safety ResearchImproved resistance to prompt injection in agentic and computer-use settings
highVerified: 2026-06-10
jailbreak resistance

Testing against adversarial prompt datasets

Evidence
Anthropic Constitutional AIConstitutional AI alignment carried forward with enhanced refusal calibration
highVerified: 2026-06-10
data leakage prevention

Analysis of privacy policies and data handling practices

Evidence
Anthropic Privacy StatementNo training on user data without explicit consent
mediumVerified: 2026-06-10
output safety

Comprehensive safety testing across harmful content categories

Evidence
Anthropic Safety EvaluationsReleased with comprehensive safety evaluations under the Responsible Scaling Policy
highVerified: 2026-06-10
api security

Review of API security features and best practices

Evidence
Anthropic API DocumentationAPI key authentication, HTTPS only, rate limiting; removal of last-assistant-turn prefills closes a response-steering vector
highVerified: 2026-06-10
🔒Privacy & Compliance
+

Exceptional privacy posture with ephemeral data handling and strong compliance certifications. HIPAA eligible for healthcare.

data residency

Review of enterprise documentation and privacy policies

Evidence
Anthropic Enterprise DocumentationData residency options for US and EU customers
highVerified: 2026-06-10
training data optout

Analysis of privacy policy and data usage terms

Evidence
Anthropic Privacy PolicyOpt-out available, no training on API data by default
highVerified: 2026-06-10
data retention

Review of terms of service and data retention policies

Evidence
Anthropic Terms of ServiceAPI prompts and outputs not retained (except for trust & safety)
highVerified: 2026-06-10
pii handling

Review of data protection capabilities and customer responsibilities

Evidence
Anthropic Privacy DocumentationCustomer responsible for PII redaction
mediumVerified: 2026-06-10
compliance certifications

Verification of compliance certifications and audit reports

Evidence
Anthropic Trust CenterSOC 2 Type II, GDPR compliant, HIPAA eligible
highVerified: 2026-06-10
zero data retention

Review of data handling practices

Evidence
Anthropic API DocumentationEphemeral data processing, no storage of prompts/outputs
highVerified: 2026-06-10
👁️Trust & Transparency
+

Adaptive thinking improves transparency by making reasoning depth model-driven and observable. Strong instruction following reduces need for aggressive prompt engineering.

explainability

Evaluation of reasoning transparency and explanation capabilities

Evidence
Adaptive Thinking FeatureAdaptive thinking surfaces reasoning depth decisions; effort parameter provides explicit compute transparency
highVerified: 2026-06-10
hallucination rate

Testing on factual QA datasets and real-world usage

Evidence
Anthropic TestingImproved factual calibration over Opus 4.5, especially at high and max effort
mediumVerified: 2026-06-10
bias fairness

Evaluation on bias benchmarks and diverse demographic testing

Evidence
Anthropic Responsible Scaling PolicyRegular bias testing and mitigation
mediumVerified: 2026-06-10
uncertainty quantification

Qualitative assessment of confidence expression in outputs

Evidence
Model BehaviorModel expresses uncertainty appropriately; adaptive thinking scales effort with problem difficulty
mediumVerified: 2026-06-10
model card quality

Review of documentation completeness and clarity

Evidence
Anthropic Model DocumentationComprehensive model cards with capabilities, limitations, benchmarks
highVerified: 2026-06-10
training data transparency

Review of public disclosures about training data

Evidence
Anthropic Public StatementsGeneral description provided, detailed sources not disclosed
mediumVerified: 2026-06-10
guardrails

Analysis of built-in safety mechanisms

Evidence
Constitutional AIConstitutional AI safety guardrails with improved refusal calibration
highVerified: 2026-06-10
⚙️Operational Excellence
+

Mature operational profile with multi-cloud availability. Migration to 4.6 required removing assistant-turn prefills and moving to adaptive thinking — well-documented breaking changes.

api design quality

Review of API design, consistency, and feature completeness

Evidence
Anthropic API DocumentationAdaptive thinking, GA effort parameter (incl. max), structured outputs; prefills removed in favor of output_config.format
highVerified: 2026-06-10
sdk quality

Review of SDK quality, documentation, and maintenance

Evidence
Anthropic SDKsOfficial SDKs for Python, TypeScript, Java, Go, Ruby, C#, PHP — actively maintained
highVerified: 2026-06-10
versioning policy

Review of versioning policy and historical practices

Evidence
Anthropic API VersioningClear versioning with advance deprecation notice; Opus 4.6 remains served two generations behind Opus 4.8
highVerified: 2026-06-10
monitoring observability

Review of available monitoring tools and metrics

Evidence
Anthropic ConsoleUsage dashboard with metrics
mediumVerified: 2026-06-10
support quality

Assessment of documentation, community, and support responsiveness

Evidence
Anthropic SupportEmail support, Discord community, comprehensive docs and migration guides
highVerified: 2026-06-10
ecosystem maturity

Analysis of third-party integrations and tools

Evidence
Cloud ProvidersAvailable on AWS Bedrock, Google Vertex AI, Azure Foundry
highVerified: 2026-06-10
license terms

Review of licensing terms and restrictions

Evidence
Anthropic Terms of ServiceStandard commercial terms, enterprise agreements available
highVerified: 2026-06-10
Strengths
  • +Breakthrough abstract reasoning: 68.8% ARC-AGI-2 (up from Opus 4.5's 37.6%)
  • +Elite coding: 80.8% SWE-bench Verified, 65.4% Terminal-Bench 2.0
  • +Best-in-class computer use at launch: 72.7% OSWorld
  • +1M token context window (beta at launch) with 128K max output
  • +Adaptive thinking replaces manual thinking budgets — no tuning required
  • +Effort parameter GA including new 'max' level for compute control
  • +Same $5/$25 pricing as Opus 4.5 despite major capability gains
Limitations
  • !Two generations behind current Opus 4.8 (still served, but no longer frontier)
  • !Removed last-assistant-turn prefills — code relying on prefills returns 400
  • !Higher latency than Sonnet models (~2.5s p50)
  • !Premium pricing relative to Sonnet 4.6 ($5/$25 vs $3/$15)
  • !No native audio capabilities
  • !Training data transparency limited (industry standard)
Metadata
pricing
input: $5.00 per 1M tokens
output: $25.00 per 1M tokens
notes: Same pricing as Opus 4.5. Batch API 50% discount. Prompt caching up to 90% savings.
last verified: 2026-06-10
context window: 1000000
max output: 128000
languages
0: English
1: Spanish
2: French
3: German
4: Italian
5: Portuguese
6: Japanese
7: Korean
8: Chinese
9: Arabic
10: Hindi
modalities
0: text
1: image (input)
2: document
3: computer-use
api endpoint: https://api.anthropic.com/v1/messages
open source: false
architecture: Transformer-based with Constitutional AI alignment, adaptive thinking, and effort parameter
parameters: Not disclosed
knowledge cutoff: Not disclosed

Use Case Ratings

code generation

80.8% SWE-bench Verified and 65.4% Terminal-Bench 2.0. Excellent for complex software engineering, though Opus 4.7/4.8 now lead the family.

customer support

Strong empathy and natural conversation. Higher latency and cost than Sonnet for routine support volume.

content creation

Excellent long-form, nuanced content. Adaptive thinking allocates more reasoning to complex pieces automatically.

data analysis

Strong analytical capabilities with 1M context for large datasets. Effort 'max' useful for complex interpretation.

research assistant

1M context and 68.8% ARC-AGI-2 abstract reasoning make it exceptional for deep research and synthesis.

legal compliance

Strong privacy posture, HIPAA eligible. 1M context handles entire contract repositories in a single request.

healthcare

HIPAA eligible with strong privacy controls. Good for clinical documentation requiring high accuracy.

financial analysis

Excellent quantitative reasoning. Adaptive thinking scales analysis depth with problem complexity.

education

Excellent tutoring with patient explanations. Effort parameter lets platforms balance quality against cost.

creative writing

Strong creative capabilities with nuanced character development and narrative flow.