GPT-5.5

Version: gpt-5-5-2026-04-24

OpenAI

Model | reasoning | flagship | million-token-context | agentic
Overall score: 91 (Exceptional)
About This Model

OpenAI's current flagship (codename 'Spud') and its first fully retrained base model since GPT-4.5. Key figures: ~1.05M-token context, 85.0% ARC-AGI-2, 93.6% GPQA Diamond, 58.6% SWE-Bench Pro. It is the designated migration target for most of the GPT-5.x line.

Last Evaluated: June 10, 2026

Trust Vector Analysis

Dimension Breakdown

🚀 Performance & Reliability

First fully retrained base since GPT-4.5. State-of-the-art across reasoning (85.0% ARC-AGI-2, 93.6% GPQA Diamond) and agentic coding (82.7% Terminal-Bench 2.0). Uses ~40% fewer output tokens than GPT-5.4 for equivalent quality.

task accuracy code

Industry-standard coding and terminal benchmarks measuring real-world software engineering tasks

Evidence
SWE-Bench Pro: 58.6% (state-of-the-art at release)
Terminal-Bench 2.0: 82.7% on command-line agentic tasks
Confidence: high | Verified: 2026-06-10
task accuracy reasoning

PhD-level science, frontier mathematics, and abstract reasoning benchmarks

Evidence
GPQA Diamond: 93.6% (PhD-level science questions)
ARC-AGI-2: 85.0% (up from GPT-5.2's 52.9%; industry-leading abstract reasoning)
FrontierMath Tier 1-3: 51.7% (up from GPT-5.2's 40.3%)
Confidence: high | Verified: 2026-06-10
task accuracy general

Expert-comparison knowledge work and computer-use benchmarks

Evidence
GDPval: 84.9% on economically valuable knowledge-work tasks
OSWorld-Verified: 78.7% on computer-use tasks (up from GPT-5.4's 75%)
Confidence: high | Verified: 2026-06-10
output consistency

Internal consistency testing reported by provider across reasoning effort levels

Evidence
OpenAI Announcement: First fully retrained base since GPT-4.5; ~40% fewer output tokens than GPT-5.4 for equivalent quality
Confidence: high | Verified: 2026-06-10
latency p50

Median latency for API requests with standard prompt sizes

Evidence
Community benchmarking: Token efficiency gains (~40% fewer output tokens) reduce end-to-end response times vs GPT-5.4
Confidence: medium | Verified: 2026-06-10
latency p95

95th percentile response time across diverse workloads

Evidence
Community benchmarking: p95 latency varies significantly with reasoning effort setting
Confidence: medium | Verified: 2026-06-10
context window

Official specification from provider

Evidence
OpenAI Documentation: ~1.05M token input context, 128K max output tokens
Confidence: high | Verified: 2026-06-10
uptime

Historical uptime data from official status page

Evidence
OpenAI Status: 99.9% uptime (last 90 days)
Confidence: high | Verified: 2026-06-10
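
For a sense of scale, a 99.9% figure over a 90-day window corresponds to roughly two hours of permitted downtime. The arithmetic can be sketched as follows; the function name is illustrative and not part of any OpenAI tooling.

```python
def allowed_downtime_hours(uptime_pct: float, window_days: int) -> float:
    """Return the downtime budget, in hours, implied by an uptime percentage."""
    total_hours = window_days * 24
    return total_hours * (1 - uptime_pct / 100)


# 99.9% uptime over the 90-day window quoted in the evidence
budget = allowed_downtime_hours(99.9, 90)
print(f"{budget:.2f} hours")
```
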
🛡️ Security

Mature multi-layer safety stack. Retrained base required full safety recalibration, which OpenAI reports as complete; long-tail agentic behaviors still being characterized by third parties.

prompt injection resistance

Testing against OWASP LLM01 prompt injection attacks

Evidence
OpenAI Safety Research: Strengthened injection defenses for agentic and computer-use workflows
Confidence: medium | Verified: 2026-06-10
jailbreak resistance

Adversarial prompt testing against jailbreak datasets

Evidence
GPT-5.5 Announcement: Retrained base with updated safety training; improved refusal robustness over GPT-5.4
Confidence: medium | Verified: 2026-06-10
data leakage prevention

Analysis of privacy policies and data handling practices

Evidence
OpenAI Privacy Policy: No training on API data by default
Confidence: medium | Verified: 2026-06-10
output safety

Safety testing across harmful content categories

Evidence
OpenAI Safety: Multi-layer safety stack carried forward and recalibrated for the retrained base
Confidence: high | Verified: 2026-06-10
api security

Review of API security features and best practices

Evidence
OpenAI Platform Docs: API key + OAuth2 authentication, HTTPS only, rate limiting
Confidence: high | Verified: 2026-06-10
🔒 Privacy & Compliance

Standard OpenAI enterprise posture: SOC 2, no API-data training by default, 30-day default retention with zero-data-retention options.

data residency

Review of enterprise documentation

Evidence
OpenAI Enterprise: Data residency options for enterprise customers
Confidence: high | Verified: 2026-06-10
training data optout

Policy review of data usage terms

Evidence
OpenAI Data Controls: API data not used for training by default
Confidence: high | Verified: 2026-06-10
data retention

Terms of service and enterprise documentation review

Evidence
OpenAI Terms: 30-day default API log retention; zero-data-retention options for qualifying customers
Confidence: high | Verified: 2026-06-10
pii handling

Review of data protection capabilities

Evidence
OpenAI Safety Tools: Customer responsible for PII redaction; moderation API available
Confidence: medium | Verified: 2026-06-10
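
Because redaction is left to the customer, a pre-processing step before any API call is one way to handle it. The following is a minimal sketch, assuming that regex matching of email addresses and US-style phone numbers is enough for illustration; real deployments typically need a dedicated PII detection tool. The patterns and the `redact_pii` helper are hypothetical and not an OpenAI feature.

```python
import re

# Illustrative patterns only; they will miss many real-world PII formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_pii(text: str) -> str:
    """Replace emails and US-style phone numbers with placeholder tags."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


print(redact_pii("Reach Ana at ana@example.com or 555-123-4567."))
```
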
compliance certifications

Verification of compliance certifications

Evidence
OpenAI Trust Center: SOC 2 Type II, ISO 27001, GDPR compliant
Confidence: high | Verified: 2026-06-10
zero data retention

Enterprise feature review

Evidence
OpenAI Enterprise: Zero-data-retention options available for enterprise and qualifying API customers
Confidence: high | Verified: 2026-06-10
👁️ Trust & Transparency

Strong transparency with reasoning summaries and detailed release documentation. Training data disclosure remains at industry-standard (limited) level.

explainability

Evaluation of reasoning transparency and explanation capabilities

Evidence
GPT-5.5 Announcement: Adjustable reasoning effort with visible reasoning summaries
Confidence: high | Verified: 2026-06-10
hallucination rate

Factual accuracy testing on QA datasets

Evidence
OpenAI Announcement: Retrained base continues factuality gains; builds on GPT-5.4's ~33% factual-error reduction
Confidence: medium | Verified: 2026-06-10
bias fairness

Bias benchmarks and demographic testing

Evidence
OpenAI Safety: Regular bias testing and red-teaming program
Confidence: medium | Verified: 2026-06-10
uncertainty quantification

Qualitative assessment of confidence expression in outputs

Evidence
OpenAI Documentation: Improved calibrated uncertainty expression with reduced confident errors
Confidence: medium | Verified: 2026-06-10
model card quality

Documentation completeness and clarity review

Evidence
GPT-5.5 Announcement and System Card: Detailed release documentation covering capabilities, benchmarks, and migration guidance
Confidence: high | Verified: 2026-06-10
training data transparency

Review of public disclosures about training data

Evidence
OpenAI Blog: General description of training approach; specific sources not disclosed
Confidence: medium | Verified: 2026-06-10
guardrails

Analysis of built-in safety mechanisms

Evidence
OpenAI Safety Systems: Multi-layer safety guardrails with agentic-workflow protections
Confidence: high | Verified: 2026-06-10
⚙️ Operational Excellence

Industry-leading operational maturity. As the designated GPT-5.x migration target, GPT-5.5 offers the longest expected support horizon in the OpenAI lineup.

api design quality

Review of API design, consistency, and feature completeness

Evidence
OpenAI API Reference: Responses API with streaming, function calling, vision, computer use, reasoning effort control
Confidence: high | Verified: 2026-06-10
sdk quality

SDK quality, documentation, and maintenance review

Evidence
OpenAI SDKs: Official SDKs for Python, Node.js, Go, .NET, actively maintained
Confidence: high | Verified: 2026-06-10
versioning policy

Review of versioning policy and historical deprecation practices

Evidence
OpenAI Deprecations: Published deprecation schedule; GPT-5.5 is the designated migration target for most of the GPT-5.x line
Confidence: high | Verified: 2026-06-10
monitoring observability

Review of available monitoring tools and metrics

Evidence
OpenAI Dashboard: Detailed usage dashboard with costs, tokens, rate limits
Confidence: high | Verified: 2026-06-10
support quality

Support and documentation assessment

Evidence
OpenAI Support: 24/7 support, comprehensive docs, active developer community
Confidence: high | Verified: 2026-06-10
ecosystem maturity

Ecosystem breadth and depth analysis

Evidence
OpenAI Platform: Largest AI ecosystem; available in ChatGPT (2026-04-23) and API (2026-04-24) with Batch and Flex tiers
Confidence: high | Verified: 2026-06-10
license terms

Review of licensing terms and restrictions

Evidence
OpenAI Terms: Standard commercial terms with usage policies
Confidence: high | Verified: 2026-06-10
Strengths
  • Industry-leading abstract reasoning: 85.0% ARC-AGI-2, 93.6% GPQA Diamond
  • State-of-the-art agentic coding: 58.6% SWE-Bench Pro, 82.7% Terminal-Bench 2.0
  • ~1.05M token input context with 128K output
  • First fully retrained base since GPT-4.5 with ~40% fewer output tokens than GPT-5.4
  • 84.9% GDPval on economically valuable knowledge work
  • Designated long-term migration target for the GPT-5.x line
  • Batch/Flex tiers at 50% discount
Limitations
  • Premium pricing: $5 input / $30 output per 1M tokens (2x GPT-5.4's base rate)
  • Not HIPAA eligible
  • 30-day default API data retention (zero retention requires an enterprise or qualifying-customer arrangement)
  • GPT-5.5 Pro is very expensive ($30 input / $180 output per 1M tokens)
  • Recently retrained base: long-tail behaviors less battle-tested than GPT-5.x predecessors
  • Training data transparency limited (industry standard)
Metadata
pricing
input: $5.00 per 1M tokens
output: $30.00 per 1M tokens
notes: Batch and Flex processing at 50% discount. GPT-5.5 Pro priced at $30/$180 per 1M tokens.
last verified: 2026-06-10
context window: 1,050,000 tokens
max output: 128,000 tokens
languages: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese, Russian, Arabic, Hindi, and more (50+ languages)
modalities: text, vision, computer-use
api endpoint: https://api.openai.com/v1/responses
open source: false
architecture: Transformer-based; first fully retrained base since GPT-4.5 ('Spud')
parameters: Not disclosed
knowledge cutoff: December 2025
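
To make the endpoint and reasoning-effort control concrete, the sketch below builds (but does not send) a request to the listed Responses endpoint using only the Python standard library. The model identifier `gpt-5.5`, the `build_request` helper, and the `OPENAI_API_KEY` environment variable name are assumptions for illustration; the exact model name and accepted effort values should be checked against the provider's API reference.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/responses"  # endpoint listed in the metadata
MODEL_ID = "gpt-5.5"  # assumed identifier; check the provider's model list


def build_request(prompt: str, effort: str = "medium") -> urllib.request.Request:
    """Build (but do not send) a Responses API request."""
    payload = {
        "model": MODEL_ID,
        "input": prompt,
        "reasoning": {"effort": effort},  # reasoning effort control
        "max_output_tokens": 1024,  # well under the 128K output cap
    }
    headers = {
        "Content-Type": "application/json",
        # API key is read from the environment rather than hard-coded
        "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_request("Summarize the attached changelog.", effort="low")
print(req.get_method(), req.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any tokens are billed.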

Use Case Ratings

code generation

58.6% SWE-Bench Pro and 82.7% Terminal-Bench 2.0. ~1.05M context fits very large codebases. Codex variants remain preferable for dedicated agentic coding pipelines.
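
Whether a given codebase fits in the context window can be estimated before sending anything. This is a rough sketch, assuming about four characters per token (a common heuristic, not the provider's tokenizer) and a hypothetical reserve for instructions and tool output; the helper names are illustrative.

```python
CONTEXT_WINDOW = 1_050_000  # input tokens, per the listed context window
CHARS_PER_TOKEN = 4  # rough heuristic, an assumption; real tokenizers vary


def estimate_tokens(num_chars: int) -> int:
    """Estimate token count from character count using the heuristic."""
    return -(-num_chars // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(num_chars: int, reserve_tokens: int = 50_000) -> bool:
    """Check whether text fits, leaving room for instructions and tool output."""
    return estimate_tokens(num_chars) + reserve_tokens <= CONTEXT_WINDOW


# A hypothetical repository with 3.2 million characters of source text
print(estimate_tokens(3_200_000), fits_in_context(3_200_000))
```

For a real decision, the character heuristic should be replaced with an actual token count, since source code often tokenizes less efficiently than prose.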

customer support

Token efficiency (~40% fewer output tokens) lowers cost per conversation. $5/$30 pricing is premium for high-volume support.
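
The trade-off between token efficiency and premium pricing can be worked through numerically. This is a sketch under stated assumptions: GPT-5.4 is priced at half of GPT-5.5's rates ($2.50/$15, inferred from the "2x GPT-5.4's base rate" note), GPT-5.5 emits 40% fewer output tokens for the same conversation, and the token counts are hypothetical.

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_price: float, output_price: float,
             discount: float = 0.0) -> float:
    """Cost of one request; prices are USD per 1M tokens."""
    raw = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return raw * (1 - discount)


# Hypothetical support conversation on GPT-5.4: 3,000 input, 1,000 output tokens
gpt54 = cost_usd(3_000, 1_000, 2.50, 15.00)
# Same conversation on GPT-5.5 with ~40% fewer output tokens
gpt55 = cost_usd(3_000, 600, 5.00, 30.00)
# GPT-5.5 via Batch or Flex at the 50% discount
gpt55_batch = cost_usd(3_000, 600, 5.00, 30.00, discount=0.5)

print(f"{gpt54:.4f} {gpt55:.4f} {gpt55_batch:.4f}")
```

Under these assumptions the efficiency gain offsets part, but not all, of the price increase at list rates, while the discounted tiers bring the per-conversation cost below the GPT-5.4 figure.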

content creation

Retrained base produces concise, higher-quality drafts. Strong long-form coherence over very long contexts.

data analysis

93.6% GPQA and 51.7% FrontierMath T1-3 support rigorous quantitative work. ~1.05M context enables whole-dataset reasoning.

research assistant

84.9% GDPval on expert knowledge work with ~1.05M context for literature-scale inputs.

legal compliance

Strong document reasoning; SOC 2 and zero-data-retention options available, but not HIPAA eligible and 30-day default retention.

healthcare

Strong scientific reasoning (93.6% GPQA Diamond, a science benchmark rather than a clinical one) but not HIPAA eligible; privacy controls rated less strict than Anthropic's.

financial analysis

Frontier math performance (51.7% FrontierMath T1-3) and GDPval results support complex financial modeling.

education

Top-tier STEM reasoning with adjustable effort for tutoring at different depths. Reduced hallucinations vs prior generations.

creative writing

Strong narrative quality; the conciseness bias from token-efficiency training may require explicit prompting to get expansive prose.