Evaluation record · claude-opus-4

Claude Opus 4

v20250514

Anthropic

Model · retired · coding · reasoning · hipaa-eligible

Strong

About This Model

RETIRED: Anthropic deprecated Claude Opus 4 (claude-opus-4-20250514) on 2026-04-14 and retired it on 2026-06-15; API requests now fail. Recommended replacement: Claude Opus 4.8. At its May 2025 release it was Anthropic's most powerful model, with exceptional reasoning, coding (72.5% on SWE-bench Verified in standard mode, 79.4% in high-compute mode), and agentic capabilities.

Last Evaluated: July 9, 2026

Official Website

Trust Vector Analysis

Dimension Breakdown

🚀 Performance & Reliability

Historical evaluation: best-in-class performance at release (79.4% SWE-bench high-compute, 90% AIME high-compute). Model retired 2026-06-15 and is no longer served.

task accuracy code

Industry-standard coding benchmarks

Evidence

SWE-bench Verified — 72.5% standard, 79.4% in high-compute mode

high · Verified: 2026-07-09

task accuracy reasoning

PhD-level reasoning benchmarks

Evidence

GPQA Diamond — 79.6% (83.3% high-compute)

AIME Math — 75.5% (90.0% high-compute)

high · Verified: 2026-07-09

task accuracy general

Comprehensive knowledge testing

Evidence

MMLU — 88.8%

MMMU Visual Reasoning — 76.5%

high · Verified: 2026-07-09

output consistency

Internal testing

Evidence

Anthropic Documentation — Highly consistent with extended thinking

high · Verified: 2026-07-09

latency p50

Median latency

Evidence

Anthropic API Documentation — ~2.5s for standard prompts

medium · Verified: 2026-07-09

latency p95

95th percentile

Evidence

Community benchmarking — p95 ~5.5s

medium · Verified: 2026-07-09
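The p50 and p95 figures above are percentile summaries of per-request latency. As a minimal sketch of how such figures could be derived from raw timings, the helper below uses Python's standard `statistics.quantiles`. The function name and the sample values are hypothetical and chosen for illustration; they are not measurements of this model, and the record does not say which percentile method its sources used.

```python
import statistics

def latency_percentiles(samples_s):
    """Return (p50, p95) in seconds from a list of per-request latencies."""
    if len(samples_s) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns the 99 cut points p1..p99 (input is sorted internally).
    cuts = statistics.quantiles(samples_s, n=100, method="inclusive")
    return cuts[49], cuts[94]  # index 49 is p50, index 94 is p95

# Hypothetical timings for illustration only, not measurements.
samples = [2.1, 2.3, 2.4, 2.5, 2.5, 2.6, 2.8, 3.4, 4.9, 5.6]
p50, p95 = latency_percentiles(samples)
print(f"p50={p50:.3f}s p95={p95:.3f}s")
```

With few samples the p95 value depends heavily on the interpolation method, which is one reason community-reported tail latencies can differ from each other.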

context window

Official specification

Evidence

Anthropic API Documentation — 200K context, 32K max output

high · Verified: 2026-07-09
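The two limits above constrain how much prompt text a single request could carry. The sketch below works out the remaining input budget for a requested output length. It assumes that input and output tokens share the 200K window, which this record does not state explicitly, and the function name is hypothetical.

```python
CONTEXT_WINDOW = 200_000  # tokens, from the specification above
MAX_OUTPUT = 32_000       # tokens, from the specification above

def max_input_tokens(requested_output: int) -> int:
    """Input tokens that still fit, ASSUMING input and output share the window."""
    if not 0 < requested_output <= MAX_OUTPUT:
        raise ValueError(f"requested_output must be in 1..{MAX_OUTPUT}")
    return CONTEXT_WINDOW - requested_output

# Budget left for the prompt when asking for the full 32K output.
print(max_input_tokens(32_000))
```

Under that assumption, requesting the full 32K output would leave 168,000 tokens for the prompt.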

uptime

Historical uptime data

Evidence

Anthropic Status Page — 99.95% uptime

high · Verified: 2026-07-09
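An uptime percentage is easier to judge when converted to an allowed-downtime figure. The sketch below does that arithmetic for the 99.95% value quoted above. The function name is hypothetical, and the measurement window behind the status-page figure is not given in this record, so the periods used here are illustrative.

```python
def downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Hours of downtime implied by an uptime percentage over a period."""
    return (1 - uptime_percent / 100) * period_hours

per_year = downtime_hours(99.95, 365 * 24)   # over a 365-day year
per_month = downtime_hours(99.95, 30 * 24)   # over a 30-day month
print(f"{per_year:.2f} h/year, {per_month * 60:.1f} min/month")
```

At 99.95% this comes to roughly 4.4 hours per year, or about 21.6 minutes per 30-day month.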

🛡️ Security

Flagship security with ASL-3 standard and Constitutional AI. Strongest safety guardrails.

prompt injection resistance

OWASP LLM01 testing

Evidence

Anthropic Safety Research — Superior resistance via Constitutional AI

high · Verified: 2026-07-09

jailbreak resistance

Adversarial prompt testing

Evidence

Anthropic Constitutional AI — Strongest jailbreak resistance

high · Verified: 2026-07-09

data leakage prevention

Evidence

Anthropic Privacy Statement — No training on user data

high · Verified: 2026-07-09

output safety

Comprehensive safety testing

Evidence

Anthropic Safety Evaluations — ASL-3 safety standard

high · Verified: 2026-07-09

api security

Security features review

Evidence

Anthropic API Documentation — Enterprise-grade API security

high · Verified: 2026-07-09

🔒 Privacy & Compliance

Exceptional privacy. Ephemeral data handling, HIPAA eligible, strongest compliance for regulated industries.

data residency

Enterprise documentation review

Evidence

Anthropic Enterprise — Data residency options

high · Verified: 2026-07-09

training data optout

Policy analysis

Evidence

Anthropic Privacy Policy — No API data training by default

high · Verified: 2026-07-09

data retention

Terms review

Evidence

Anthropic Terms of Service — Ephemeral processing

high · Verified: 2026-07-09

pii handling

Data protection review

Evidence

Anthropic Privacy Docs — Customer responsible for PII redaction

high · Verified: 2026-07-09

compliance certifications

Certification verification

Evidence

Anthropic Trust Center — SOC 2 Type II, GDPR, HIPAA eligible

high · Verified: 2026-07-09

zero data retention

Data handling review

Evidence

Anthropic API Docs — Ephemeral data processing

high · Verified: 2026-07-09

👁️ Trust & Transparency

Excellent transparency with extended thinking and comprehensive system card. Best-in-class guardrails.

explainability

Reasoning transparency evaluation

Evidence

Extended Thinking — Extended thinking exposes reasoning

high · Verified: 2026-07-09

hallucination rate

Factual QA testing

Evidence

SimpleQA Benchmark — Strong factual accuracy

medium · Verified: 2026-07-09

bias fairness

Bias benchmark evaluation

Evidence

Anthropic Responsible Scaling — Regular bias testing

medium · Verified: 2026-07-09

uncertainty quantification

Qualitative assessment

Evidence

Model Behavior — Excellent uncertainty expression

high · Verified: 2026-07-09

model card quality

Documentation review

Evidence

Claude 4 System Card — Comprehensive system card

high · Verified: 2026-07-09

training data transparency

Public disclosure review

Evidence

Anthropic Public Statements — General description, cutoff March 2025

medium · Verified: 2026-07-09

guardrails

Safety mechanism analysis

Evidence

Constitutional AI — Strongest Constitutional AI guardrails

high · Verified: 2026-07-09

⚙️ Operational Excellence

Model retired 2026-06-15 on Anthropic-operated platforms; API requests fail. Migration target is Claude Opus 4.8. Versioning, ecosystem, and overall scores reduced to reflect retirement.

api design quality

API design review

Evidence

Anthropic API — Enterprise-grade RESTful API

high · Verified: 2026-07-09

sdk quality

SDK quality review

Evidence

Anthropic SDKs — High-quality Python, TypeScript SDKs

high · Verified: 2026-07-09

versioning policy

Versioning policy review

Evidence

Anthropic API Versioning — 6-month deprecation notice

Anthropic Model Deprecations — claude-opus-4-20250514 deprecated 2026-04-14 and retired 2026-06-15; requests fail; recommended replacement claude-opus-4-8

high · Verified: 2026-07-09
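Because requests to the retired model ID now fail, callers that still have the old ID in configuration need to substitute the replacement. The sketch below shows one possible client-side guard, using only the dates and IDs from the deprecation entry above. The table and function names are hypothetical, and the guard is not part of any Anthropic SDK.

```python
from datetime import date

# Dates and model IDs taken from the deprecation entry above.
RETIREMENTS = {
    "claude-opus-4-20250514": {
        "deprecated": date(2026, 4, 14),
        "retired": date(2026, 6, 15),
        "replacement": "claude-opus-4-8",
    },
}

def resolve_model(model_id: str, today: date) -> str:
    """Return the replacement ID once a model's retirement date is reached."""
    entry = RETIREMENTS.get(model_id)
    if entry is None or today < entry["retired"]:
        return model_id  # unknown to the table, or still served
    return entry["replacement"]

print(resolve_model("claude-opus-4-20250514", date(2026, 7, 9)))
```

Keeping the mapping in one table means a retirement only needs a configuration change, though output quality, pricing, and limits of the replacement should still be re-checked before switching.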

monitoring observability

Monitoring tools review

Evidence

Anthropic Console — Comprehensive usage dashboard

high · Verified: 2026-07-09

support quality

Support assessment

Evidence

Anthropic Support — Priority support for Opus users

high · Verified: 2026-07-09

ecosystem maturity

Ecosystem analysis

Evidence

Integration Ecosystem — Mature ecosystem, Bedrock, Vertex AI

high · Verified: 2026-07-09

license terms

License review

Evidence

Anthropic Commercial Terms — Clear enterprise terms

high · Verified: 2026-07-09

Strengths

+ Highest performance: 79.4% SWE-bench in high-compute (best overall)
+ 90% AIME math in high-compute mode (exceptional reasoning)
+ Extended thinking for complex multi-step reasoning
+ Strongest privacy: ephemeral data, HIPAA eligible, ASL-3 security
+ 200K context window for large documents
+ Best-in-class Constitutional AI safety guardrails

Limitations

! RETIRED 2026-06-15: no longer available on the Claude API; requests fail (migrate to Claude Opus 4.8)
! Premium pricing ($15/$75 per 1M tokens)
! Higher latency (~2.5s p50, 5.5s p95)
! Training cutoff March 2025
! Overkill for simple tasks (cost and latency)
! 32K max output vs 64K for Sonnet 4

Metadata

pricing

input: $15.00 per 1M tokens

output: $75.00 per 1M tokens

notes: Historical flagship pricing. Model retired 2026-06-15 — no longer purchasable on Anthropic-operated platforms.

last verified: 2026-07-09
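The per-million-token rates above translate into a per-request cost as follows. This is a minimal sketch for historical cost estimates only. The function name is hypothetical, and it ignores any discounts or surcharges that are not listed in this record.

```python
INPUT_USD_PER_MTOK = 15.00   # from the pricing entry above
OUTPUT_USD_PER_MTOK = 75.00  # from the pricing entry above

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD at the listed per-1M-token rates."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000

# Example: 10,000 input tokens and 2,000 output tokens.
print(f"${request_cost_usd(10_000, 2_000):.2f}")
```

In this example the input and output sides each contribute $0.15, for $0.30 in total, which shows how quickly the 5x output rate dominates for longer responses.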

context window: 200000

max output tokens: 32000

languages: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese, Arabic, Hindi

modalities: text, image (input), document

api endpoint: https://api.anthropic.com/v1/messages

open source: false

architecture: Transformer-based with Constitutional AI and extended thinking

parameters: Not disclosed

training cutoff: March 2025

safety level: ASL-3

Use Case Ratings

code generation

Historically best-in-class coding (79.4% SWE-bench high-compute). Retired — use Opus 4.8.

customer support

Excellent but potentially over-powered and expensive for standard customer support.

content creation

Exceptional creative writing with nuanced understanding and natural style.

data analysis

Superior analytical capabilities with extended thinking for complex analysis.

research assistant

Outstanding for research. Extended thinking enables deep analysis. 200K context for long documents.

legal compliance

Best for legal work in its era. HIPAA eligible, ephemeral data, ASL-3 security. Careful reasoning.

healthcare

Flagship for healthcare in its era. HIPAA eligible, strongest privacy, careful medical reasoning.

financial analysis

Exceptional for complex financial modeling and analysis. 90% AIME math in high-compute.

education

Excellent for education with patient, detailed explanations and strong knowledge base.

creative writing

Outstanding creative capabilities with nuanced character development and storytelling.

Similar Models

Claude Opus 4.8

Anthropic

Claude Opus 4.1

Anthropic

Claude Sonnet 4.6

Anthropic

Claude Haiku 4.5

Anthropic