Evaluation record · grok-4-3

Grok 4.3

v4.3

xAI

Modelreasoninglong-contextfunction-callingstructured-outputs

Strong

About This Model

xAI's workhorse model (released 2026-04-30): 1M context, reasoning, function calling, and structured outputs at $1.25/$2.50 per 1M tokens. Superseded as flagship by Grok 4.5 (2026-07-08, $2/$6, 500K context) but remains served and is the redirect target for retired Grok slugs. Strong frontier performance, but thinner enterprise compliance than Anthropic/OpenAI/Google, and the provider (now SpaceXAI post-SpaceX merger) faces active regulatory investigations over Grok content safety.

Last Evaluated: July 9, 2026

Official Website

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability

Frontier-class performance with a 1M context window and reasoning, function calling, and structured outputs. Launch date now consistently reported as 2026-04-30 (some aggregators previously listed 2026-05-06). Grok 4.5 (2026-07-08) now leads xAI/SpaceXAI's lineup, but 4.3 remains served and price-advantaged.

task accuracy code

Review of provider documentation and third-party benchmark aggregators

Evidence

xAI Model Documentation — Frontier coding performance positioned as flagship successor to Grok 4.1

llm-stats — Competitive with frontier peers on agentic coding evaluations

mediumVerified: 2026-07-09

task accuracy reasoning

Review of reasoning benchmark results from provider and aggregators

Evidence

xAI Model Documentation — Native reasoning mode with strong math and science performance

mediumVerified: 2026-07-09

task accuracy general

Crowdsourced arena comparisons and aggregator quality metrics

Evidence

LMArena Leaderboard — Top-tier placement among frontier models in crowdsourced comparisons

OpenRouter Model Listing — High usage and quality ratings since launch

mediumVerified: 2026-07-09

output consistency

Review of structured output features and community reports of repeated-prompt behavior

Evidence

xAI Model Documentation — Structured outputs and function calling support deterministic integration patterns

mediumVerified: 2026-07-09

latency p50

Median latency from third-party API benchmarking

Evidence

Community benchmarking — Typical time-to-full-response around 2s for standard prompts (non-reasoning mode)

mediumVerified: 2026-07-09

latency p95

95th percentile response time from third-party benchmarking; reasoning mode adds variance

Evidence

Community benchmarking — Tail latency higher when extended reasoning is engaged

lowVerified: 2026-07-09

context window

Official specification from provider documentation

Evidence

xAI Model Documentation — 1M token context window; higher per-token rate applies above 200K tokens

highVerified: 2026-07-09

uptime

Historical uptime data from official status page

Evidence

xAI Status Page — Generally stable availability since launch with occasional incidents

mediumVerified: 2026-07-09

🛡️Security

Reasonable baseline security, but xAI publishes substantially less safety and red-team documentation than Anthropic, OpenAI, or Google. Overall score reduced one point (83 to 82) on 2026-07-09 after the Grok content-safety crisis (Ofcom/European Commission investigations, Brazil ultimatum) exposed weak provider-level output-safety governance.

prompt injection resistance

Testing against OWASP LLM01 prompt injection patterns and review of published safety material

Evidence

xAI Documentation — Hardened system prompt handling; limited published red-team data

mediumVerified: 2026-07-09

jailbreak resistance

Review of adversarial prompt testing results and community jailbreak reports

Evidence

xAI News — Safety improvements cited at launch; less third-party adversarial testing than peers

mediumVerified: 2026-07-09

data leakage prevention

Analysis of privacy policies and data handling commitments

Evidence

xAI Privacy Policy — API data handling documented; fewer contractual controls than major enterprise providers

mediumVerified: 2026-07-09

output safety

Safety testing across harmful content categories and review of published evaluations; score reduced 2026-07-09 to reflect the ongoing Grok content-safety crisis and regulatory findings against the provider's safety systems

Evidence

xAI Documentation — Content moderation in place; xAI publishes less safety evaluation detail than Anthropic/OpenAI/Google

TechPolicy.Press — Regulators Are Going After Grok and X — Ofcom and the European Commission opened formal investigations, and Brazil issued a 30-day ultimatum, over Grok's mass generation of sexualized imagery including apparent minors (CCDH: 3M+ sexualized images in under two weeks)

mediumVerified: 2026-07-09

api security

Review of API security features and authentication mechanisms

Evidence

xAI API Documentation — API key authentication, HTTPS only, rate limiting, team management in console

mediumVerified: 2026-07-09

🔒Privacy & Compliance

xAI's enterprise compliance posture remains thinner than Anthropic, OpenAI, or Google: SOC 2 in place but no HIPAA eligibility program and fewer regulated-industry attestations.

data residency

Review of provider documentation and enterprise materials

Evidence

xAI Documentation — US-based infrastructure; no published regional residency options

mediumVerified: 2026-07-09

training data optout

Analysis of privacy policy and data usage terms

Evidence

xAI Privacy Policy — API customer data not used for training by default per policy

mediumVerified: 2026-07-09

data retention

Review of terms of service and data retention policies

Evidence

xAI Privacy Policy — Limited retention for abuse monitoring; zero-retention requires enterprise agreement

mediumVerified: 2026-07-09

pii handling

Review of data protection capabilities and customer responsibilities

Evidence

xAI Documentation — Customer responsible for PII redaction; no built-in PII tooling

mediumVerified: 2026-07-09

compliance certifications

Verification of compliance certifications against major enterprise provider baselines

Evidence

xAI Trust Center — SOC 2 Type II; thinner certification portfolio (no HIPAA BAA program, limited GDPR tooling) vs Anthropic/OpenAI/Google

mediumVerified: 2026-07-09

zero data retention

Review of data handling practices and enterprise contract options

Evidence

xAI Trust Center — Zero-data-retention available only via negotiated enterprise terms

mediumVerified: 2026-07-09

👁️Trust & Transparency

Good developer-facing documentation and inspectable reasoning, but less published safety/bias evaluation than major competitors. Overall score reduced one point (82 to 81) on 2026-07-09 to reflect the guardrails downgrade following the Grok content-safety crisis and resulting Ofcom/EC/Brazil regulatory actions.

explainability

Evaluation of reasoning transparency and explanation capabilities

Evidence

xAI Model Documentation — Reasoning traces available via API for inspection

mediumVerified: 2026-07-09

hallucination rate

Review of provider claims and factual QA testing

Evidence

xAI News — Continued hallucination reductions claimed at launch, building on Grok 4.1 improvements

mediumVerified: 2026-07-09

bias fairness

Review of bias benchmark disclosures and independent reporting

Evidence

xAI Public Statements — Limited published bias evaluation; past Grok versions drew scrutiny over politically tuned behavior

lowVerified: 2026-07-09

uncertainty quantification

Qualitative assessment of confidence expression in outputs

Evidence

Model Behavior — Reasoning mode expresses uncertainty reasonably well

mediumVerified: 2026-07-09

model card quality

Review of documentation completeness and clarity

Evidence

xAI Model Documentation — Detailed model page with capabilities, pricing, limits, and feature support

highVerified: 2026-07-09

training data transparency

Review of public disclosures about training data

Evidence

xAI Public Statements — General description including X platform data; detailed sources not disclosed

mediumVerified: 2026-07-09

guardrails

Analysis of built-in safety mechanisms; score reduced 2026-07-09 given demonstrated large-scale guardrail failures in the provider's deployed Grok products

Evidence

xAI Documentation — Built-in moderation with developer controls; lighter-touch defaults than peers

The Conversation — Grok sexualized images AI reckoning — 2026 Grok controversy showed xAI's guardrails failed at scale on sexualized imagery, including of minors, prompting UK, EU, and Brazilian regulatory action

mediumVerified: 2026-07-09

⚙️Operational Excellence

Strong API and pricing, but the May 2026 retirement wave (with silent slug redirects to grok-4.3) highlights an aggressive deprecation culture enterprises should plan around. In mid-2026 xAI completed its merger into SpaceX and rebranded as SpaceXAI (new identity announced 2026-07-06); API endpoints and docs remain on x.ai domains as of 2026-07-09.

api design quality

Review of API design, consistency, and feature completeness

Evidence

xAI API Documentation — OpenAI-compatible API with reasoning, function calling, structured outputs, and prompt caching

highVerified: 2026-07-09

sdk quality

Review of SDK quality, documentation, and maintenance

Evidence

xAI SDKs — Official SDKs plus broad compatibility with OpenAI client libraries

mediumVerified: 2026-07-09

versioning policy

Review of deprecation/migration practices; silent redirects of retired slugs reduce predictability for pinned workloads

Evidence

xAI Migration Guide (May 15 Retirement) — Retired Grok model slugs silently redirect to grok-4.3 rather than returning errors

highVerified: 2026-07-09

monitoring observability

Review of available monitoring tools and metrics

Evidence

xAI Console — Usage dashboard with spend and rate limit visibility

mediumVerified: 2026-07-09

support quality

Assessment of documentation, community, and support responsiveness

Evidence

xAI Documentation — Improving documentation; support channels lighter than major cloud providers

mediumVerified: 2026-07-09

ecosystem maturity

Analysis of third-party integrations and tools

Evidence

OpenRouter Model Listing — Available via OpenRouter and major LLM frameworks; growing third-party adoption

mediumVerified: 2026-07-09

license terms

Review of licensing terms and restrictions

Evidence

xAI Terms of Service — Clear commercial API terms; enterprise agreements available

highVerified: 2026-07-09

Strengths

+Aggressive pricing: $1.25/$2.50 per 1M tokens with $0.20 cached input
+1M token context window
+Full agentic feature set: reasoning, function calling, structured outputs
+Text and image input support
+Frontier-class performance across coding, reasoning, and general tasks
+OpenAI-compatible API simplifies migration

Limitations

!Thinner enterprise compliance posture than Anthropic, OpenAI, or Google (no HIPAA program)
!Retired Grok model slugs silently redirect to grok-4.3, risking unannounced behavior changes
!Higher per-token rate applies above 200K context
!Limited published safety, bias, and red-team evaluation detail
!Zero-data-retention only via negotiated enterprise terms
!Provider turbulence: xAI merged into SpaceX and rebranded SpaceXAI in mid-2026, while under active regulatory investigation (Ofcom, European Commission; Brazil ultimatum) over Grok content-safety failures in 2026
!No longer the flagship: Grok 4.5 (2026-07-08) sits above it in the lineup

Metadata

pricing

input: $1.25 per 1M tokens

output: $2.50 per 1M tokens

notes: Cached input $0.20 per 1M tokens. Higher per-token rate applies for requests above 200K context. Tool calls billed separately (web/X search and code execution $5 per 1K calls). Re-confirmed against xAI docs July 2026.

last verified: 2026-07-09

context window: 1000000

languages

0: English

1: Spanish

2: French

3: German

4: Italian

5: Portuguese

6: Japanese

7: Korean

8: Chinese

9: Arabic

10: Hindi

modalities

0: text

1: image (input)

api endpoint: https://api.x.ai/v1/chat/completions

open source: false

architecture: Transformer-based with native reasoning, function calling, and structured outputs

parameters: Not disclosed

release date: 2026-04-30

lifecycle status: Served and supported; superseded as flagship by Grok 4.5 (2026-07-08). Retired legacy Grok slugs redirect here.

Use Case Ratings

code generation

Frontier-class coding with function calling and structured outputs at very competitive pricing.

customer support

Fast, capable, and cheap for support workloads; compliance posture may limit regulated deployments.

content creation

Strong long-form generation with current-events awareness from the X ecosystem.

data analysis

Strong reasoning over large inputs; 1M context handles big datasets, with higher per-token rates above 200K.

research assistant

1M context plus reasoning makes it well suited to literature-scale synthesis at low cost.

legal compliance

Capable analytically, but thinner compliance certifications than Anthropic/OpenAI/Google providers.

healthcare

No HIPAA eligibility program; not recommended for PHI workloads.

financial analysis

Strong quantitative reasoning and real-time information; verify compliance requirements first.

education

Strong explanations at low cost; content controls are lighter-touch than peers.

creative writing

Distinctive voice and strong creative range; fewer content restrictions than competitors.

Similar Models

Grok 4.1

xAI

Grok 3 [Beta]

xAI

GPT-5.5

OpenAI

Claude Opus 4.8

Anthropic

Gemini 3.1 Pro

Google