Evaluation record · claude-sonnet-4-6

Claude Sonnet 4.6

v4.6

Anthropic

Modelcodingagenticproductionenterprise

Exceptional

About This Model

Anthropic's previous-generation Sonnet workhorse, superseded by Claude Sonnet 5 (2026-06-30) as the best speed/intelligence balance at the same $3/$15 price. Still fully supported (Active, tentative retirement not sooner than 2027-02-17), with a 1M token context window, 128K max output, adaptive thinking, the effort parameter including 'max', and strong computer-use accuracy.

Last Evaluated: July 9, 2026

Official Website

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability

Former value workhorse of the Claude lineup: near-Opus intelligence at Sonnet latency and price, with a 1M context window, adaptive thinking, and the full effort range including 'max'. Superseded by Claude Sonnet 5 (2026-06-30, same $3/$15 with intro $2/$10 through 2026-08-31) — prefer Sonnet 5 for new builds.

task accuracy code

Review of official model documentation and positioning for software engineering workloads

Evidence

Anthropic Models Documentation — Recommended model for agentic coding at the Sonnet tier; successor to Claude Sonnet 4.5

Anthropic: Introducing Claude Sonnet 5 — Sonnet 4.6 scores 62.3% SWE-bench Verified and 55.4% Terminal-bench vs Sonnet 5's 72.7% and 76.1%; Sonnet 5 is now the recommended Sonnet-tier model

Anthropic Migration Guide — Documented as the drop-in upgrade target for Sonnet 4.5, 4.0, 3.7, and 3.5 coding workloads

highVerified: 2026-07-09

task accuracy reasoning

Review of documented thinking capabilities and reasoning benchmark positioning

Evidence

Anthropic Models Documentation — Adaptive thinking supported; effort defaults to high, scaling reasoning depth with task complexity

highVerified: 2026-07-09

task accuracy general

Comprehensive knowledge and multimodal capability review against official documentation

Evidence

Anthropic Models Documentation — Positioned as Anthropic's best combination of speed and intelligence

highVerified: 2026-07-09

output consistency

Internal testing of output stability across effort levels and adaptive thinking

Evidence

Anthropic Models Documentation — Effort parameter (low/medium/high/max) gives explicit, repeatable quality/cost control; strong computer-use accuracy with adaptive thinking at high effort

highVerified: 2026-07-09

latency p50

Median latency for API requests with standard prompt sizes

Evidence

Community benchmarking — Typical response time ~1.5s for standard prompts at low/medium effort; faster than Opus tier

mediumVerified: 2026-07-09

latency p95

95th percentile response time across diverse workloads

Evidence

Community benchmarking — p95 latency ~3.5s; higher at high/max effort with adaptive thinking

mediumVerified: 2026-07-09

context window

Official specification from provider

Evidence

Anthropic Models Documentation — 1M token context window; 128K max output tokens (up to 300K on the Message Batches API via the extended-output beta)

highVerified: 2026-07-09

uptime

Historical uptime data from official status page

Evidence

Anthropic Status Page — Claude API uptime 99.57% (last 90 days)

highVerified: 2026-07-09

🛡️Security

Strong safety posture. Like Opus 4.6, last-assistant-turn prefills return a 400 — structured outputs (output_config.format) are the supported replacement.

prompt injection resistance

Testing against OWASP LLM01 prompt injection attacks

Evidence

Anthropic Safety Research — Strong resistance to prompt injection in agentic and computer-use settings

highVerified: 2026-07-09

jailbreak resistance

Testing against adversarial prompt datasets

Evidence

Anthropic Constitutional AI — Constitutional AI alignment with well-calibrated refusals

highVerified: 2026-07-09

data leakage prevention

Analysis of privacy policies and data handling practices

Evidence

Anthropic Privacy Statement — No training on user data without explicit consent

mediumVerified: 2026-07-09

output safety

Comprehensive safety testing across harmful content categories

Evidence

Anthropic Trust Center — Released with comprehensive safety evaluations under the Responsible Scaling Policy

highVerified: 2026-07-09

api security

Review of API security features and best practices

Evidence

Anthropic API Documentation — API key authentication, HTTPS only, rate limiting; assistant prefills removed (400), closing a response-steering vector

highVerified: 2026-07-09

🔒Privacy & Compliance

Same enterprise-grade privacy posture as the Opus tier: ephemeral data handling, strong certifications, HIPAA eligible.

data residency

Review of enterprise documentation and privacy policies

Evidence

Anthropic Enterprise Documentation — Data residency options for US and EU customers

highVerified: 2026-07-09

training data optout

Analysis of privacy policy and data usage terms

Evidence

Anthropic Privacy Policy — Opt-out available, no training on API data by default

highVerified: 2026-07-09

data retention

Review of terms of service and data retention policies

Evidence

Anthropic Terms of Service — API prompts and outputs not retained (except for trust & safety)

highVerified: 2026-07-09

pii handling

Review of data protection capabilities and customer responsibilities

Evidence

Anthropic Privacy Documentation — Customer responsible for PII redaction

mediumVerified: 2026-07-09

compliance certifications

Verification of compliance certifications and audit reports

Evidence

Anthropic Trust Center — SOC 2 Type II, GDPR compliant, HIPAA eligible

highVerified: 2026-07-09

zero data retention

Review of data handling practices

Evidence

Anthropic API Documentation — Ephemeral data processing, no storage of prompts/outputs

highVerified: 2026-07-09

👁️Trust & Transparency

Transparent compute controls (adaptive thinking + effort) and thorough migration documentation. Follows instructions closely, reducing prompt-engineering opacity.

explainability

Evaluation of reasoning transparency and explanation capabilities

Evidence

Anthropic Models Documentation — Adaptive thinking and the effort parameter make reasoning depth explicit and controllable

highVerified: 2026-07-09

hallucination rate

Testing on factual QA datasets and real-world usage

Evidence

Anthropic Testing — Improved factual calibration over Sonnet 4.5, especially with adaptive thinking enabled

mediumVerified: 2026-07-09

bias fairness

Evaluation on bias benchmarks and diverse demographic testing

Evidence

Anthropic Responsible Scaling Policy — Regular bias testing and mitigation

mediumVerified: 2026-07-09

uncertainty quantification

Qualitative assessment of confidence expression in outputs

Evidence

Model Behavior — Model expresses uncertainty appropriately; adaptive thinking scales effort with problem difficulty

mediumVerified: 2026-07-09

model card quality

Review of documentation completeness and clarity

Evidence

Anthropic Model Documentation — Comprehensive model documentation with capabilities, limitations, and migration guidance from Sonnet 4.5

highVerified: 2026-07-09

training data transparency

Review of public disclosures about training data

Evidence

Anthropic Public Statements — General description provided, detailed sources not disclosed

mediumVerified: 2026-07-09

guardrails

Analysis of built-in safety mechanisms

Evidence

Constitutional AI — Constitutional AI safety guardrails with well-calibrated refusals

highVerified: 2026-07-09

⚙️Operational Excellence

Production-ready with multi-cloud availability. Migration from Sonnet 4.5 requires setting effort explicitly (4.6 defaults to high) and removing assistant prefills.

api design quality

Review of API design, consistency, and feature completeness

Evidence

Anthropic API Documentation — Adaptive thinking, effort parameter incl. max, structured outputs, streaming, tool use; prefills removed in favor of output_config.format

highVerified: 2026-07-09

sdk quality

Review of SDK quality, documentation, and maintenance

Evidence

Anthropic SDKs — Official SDKs for Python, TypeScript, Java, Go, Ruby, C#, PHP — actively maintained

highVerified: 2026-07-09

versioning policy

Review of versioning policy and historical practices

Evidence

Anthropic API Versioning — Clear versioning with advance deprecation notice; documented migration path from Sonnet 4.5 and retired 3.x Sonnets

Anthropic Model Deprecations — Active; tentative retirement not sooner than February 17, 2027; recommended replacement for retired Sonnet 4

highVerified: 2026-07-09

monitoring observability

Review of available monitoring tools and metrics

Evidence

Anthropic Console — Usage dashboard with metrics

mediumVerified: 2026-07-09

support quality

Assessment of documentation, community, and support responsiveness

Evidence

Anthropic Support — Email support, Discord community, comprehensive docs and migration guides

highVerified: 2026-07-09

ecosystem maturity

Analysis of third-party integrations and tools

Evidence

Cloud Providers — Available on AWS Bedrock, Google Vertex AI, Azure Foundry; default model in many agent frameworks

highVerified: 2026-07-09

license terms

Review of licensing terms and restrictions

Evidence

Anthropic Terms of Service — Standard commercial terms, enterprise agreements available

highVerified: 2026-07-09

Strengths

+Strong speed/intelligence balance at $3/$15 per 1M tokens (Sonnet 5 now leads the tier)
+1M token context window with 128K max output (300K on the Batch API extended-output beta)
+Adaptive thinking supported — no manual thinking budgets to tune
+Effort parameter including 'max' (not available on Sonnet 4.5 or Haiku)
+Strong computer-use accuracy for agentic automation
+HIPAA eligible with ephemeral data handling
+Multi-cloud availability (AWS, GCP, Azure)

Limitations

!Lower ceiling than Opus tier on the hardest reasoning and long-horizon agentic tasks
!Removed assistant prefills — code relying on prefills returns 400
!Effort defaults to high — Sonnet 4.5 migrations see higher latency/cost unless effort is set explicitly
!Superseded by Claude Sonnet 5 (2026-06-30): 62.3% vs 72.7% SWE-bench Verified at the same standard price
!No native audio capabilities

Metadata

pricing

input: $3.00 per 1M tokens

output: $15.00 per 1M tokens

notes: Same pricing as Sonnet 4.5. Batch API 50% discount. Prompt caching up to 90% savings. Confirmed unchanged at $3/$15 as of 2026-07-09; successor Sonnet 5 has the same standard price (intro $2/$10 through 2026-08-31).

last verified: 2026-07-09

context window: 1000000

max output: 128000

languages

0: English

1: Spanish

2: French

3: German

4: Italian

5: Portuguese

6: Japanese

7: Korean

8: Chinese

9: Arabic

10: Hindi

modalities

0: text

1: image (input)

2: document

3: computer-use

api endpoint: https://api.anthropic.com/v1/messages

open source: false

architecture: Transformer-based with Constitutional AI alignment, adaptive thinking, and effort parameter

parameters: Not disclosed

knowledge cutoff: August 2025 (reliable); training data through January 2026

Use Case Ratings

code generation

Excellent agentic coding at a fraction of Opus cost. Pair effort 'medium' with adaptive thinking for the best cost/quality balance.

customer support

The sweet spot for support: fast, empathetic, and cost-effective at scale. Use effort 'low' with thinking disabled for high-volume tiers.

content creation

Strong long-form and marketing content with fast turnaround. Opus tier still leads on the most nuanced pieces.

data analysis

Solid analytical capabilities with 1M context for large datasets at workhorse pricing.

research assistant

1M context handles large corpora; adaptive thinking deepens analysis when needed. Opus preferred for the hardest synthesis tasks.

legal compliance

Strong privacy posture, HIPAA eligible, 1M context for contract repositories. Escalate the highest-stakes reviews to Opus.

healthcare

HIPAA eligible with strong privacy controls. Well-suited to clinical documentation at production volume.

financial analysis

Good quantitative reasoning with predictable cost. Use effort 'high' for complex modeling; Opus for the hardest problems.

education

Fast, patient explanations at a price point that scales to large student populations.

creative writing

Capable creative writing with good narrative flow; Opus tier produces more distinctive prose.

Similar Models

Claude Sonnet 4.5

Anthropic

Claude Opus 4.6

Anthropic

Claude Opus 4.8

Anthropic

Claude Haiku 4.5

Anthropic