Evaluation record · gemini-3-1-pro

Gemini 3.1 Pro

Version: gemini-3.1-pro-preview

Google

Model · flagship · preview · reasoning · long-context

Exceptional

About This Model

Google's current flagship reasoning model: 77.1% on ARC-AGI-2 (about 2.5x Gemini 3 Pro's 31.1%), 94.3% on GPQA Diamond, 2887 Elo on LiveCodeBench Pro, and a 1M-token context window. It supersedes the retired Gemini 3 Pro Preview (shut down 2026-03-09; the gemini-pro-latest alias now points here). Note: it is still served under the preview model ID gemini-3.1-pro-preview, and official docs do not list it as GA, contrary to earlier reports. Gemini 3.5 Pro (announced at I/O in May 2026) has not shipped as of 2026-07-09.

Last Evaluated: July 9, 2026

Trust Vector Analysis

Dimension Breakdown

🚀 Performance & Reliability

Massive reasoning jump: 77.1% ARC-AGI-2 vs 31.1% for Gemini 3 Pro. Correction 2026-07-09: official docs list the model as gemini-3.1-pro-preview (preview, not GA), contrary to the earlier GA characterization; it is nonetheless Google's designated migration target for retired/deprecated Pro models. Gemini 3.5 Pro (announced I/O May 2026, June GA target slipped) has still not shipped as of 2026-07-09.

task accuracy code

Competitive programming and agentic tool-use benchmarks from official launch materials

Evidence

LiveCodeBench Pro — 2887 Elo on competitive coding (frontier-leading at launch)

MCP Atlas — 78.2% on multi-tool agentic orchestration

high · Verified: 2026-07-09

task accuracy reasoning

Abstract reasoning and PhD-level science benchmarks reported at launch

Evidence

ARC-AGI-2: 77.1% (vs 31.1% for standard Gemini 3 Pro, a ~2.5x generational gain)

GPQA Diamond — 94.3% on PhD-level science questions

high · Verified: 2026-07-09
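
The "~2.5x" figure is the ratio of the two ARC-AGI-2 scores quoted above. A minimal sketch of that arithmetic, using only the numbers in this record (the function name is illustrative, not part of any official tooling):

```python
def generational_gain(new_score: float, old_score: float) -> tuple[float, float]:
    """Return (ratio, percentage-point difference) between two benchmark scores."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return new_score / old_score, new_score - old_score


# ARC-AGI-2 scores as reported in this record.
ratio, points = generational_gain(77.1, 31.1)
print(f"{ratio:.2f}x, +{points:.1f} points")  # prints "2.48x, +46.0 points"
```

The ratio rounds to 2.5x, which matches the headline claim; the absolute gain is 46 percentage points.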

task accuracy general

Cross-benchmark comparison against predecessor Gemini 3 Pro Preview

Evidence

Google DeepMind Models Page — Positioned as flagship Pro tier, superseding Gemini 3 Pro Preview across general benchmarks

medium · Verified: 2026-07-09

output consistency

Consistency assessment based on serving track record and documented model behavior

Evidence

Google AI Documentation — Stable flagship serving since 2026-02-19, though the model ID remains gemini-3.1-pro-preview

medium · Verified: 2026-07-09

latency p50

Median latency from third-party aggregator measurements

Evidence

Community benchmarking — Typical response time under 2s for standard prompts; deep reasoning modes slower

low · Verified: 2026-07-09

context window

Official specification from provider documentation

Evidence

Gemini API Changelog — 1M token context window at launch

high · Verified: 2026-07-09

uptime

Historical uptime data from official status page

Evidence

Google Cloud Status — 99.9% uptime (last 90 days, Vertex AI)

high · Verified: 2026-07-09
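
To put the 99.9% figure in concrete terms, the sketch below converts an uptime percentage over a window into the downtime it allows. It is plain arithmetic on the numbers quoted above; the function name is illustrative, and nothing here reflects how Google actually measures or reports availability.

```python
def allowed_downtime_minutes(uptime_percent: float, window_days: int) -> float:
    """Minutes of downtime implied by an uptime percentage over a window of days."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = window_days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


# 99.9% over the 90-day window cited above: roughly 130 minutes in total.
print(round(allowed_downtime_minutes(99.9, 90), 1))
```

In other words, 99.9% over 90 days still permits a little over two hours of cumulative unavailability.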

🛡️ Security

Inherits Google Cloud security posture. Configurable safety filters and Vertex AI IAM controls for enterprise deployment.

prompt injection resistance

OWASP LLM01 prompt injection testing and vendor safety documentation review

Evidence

Google AI Safety — Hardened prompt injection defenses carried forward from Gemini 3 line

medium · Verified: 2026-07-09

jailbreak resistance

Adversarial prompt dataset testing

Evidence

Google Safety Settings — Improved adversarial robustness reported at launch

medium · Verified: 2026-07-09

data leakage prevention

Evidence

Gemini API Terms — Paid-tier API data not used for training

medium · Verified: 2026-07-09

output safety

Safety filter testing across harmful content categories

Evidence

Google Safety Filters — Configurable multi-category safety filters

high · Verified: 2026-07-09

api security

Review of API security features and infrastructure

Evidence

Google Cloud Security — Google Cloud security standards, IAM integration on Vertex AI

high · Verified: 2026-07-09

🔒 Privacy & Compliance

Strong enterprise posture via Vertex AI data governance, SOC/ISO certifications, and EU data residency options.

data residency

Cloud infrastructure and data residency documentation review

Evidence

Google Cloud Locations — EU data residency options available via Vertex AI regions

high · Verified: 2026-07-09

training data optout

Evidence

Gemini API Terms — Paid API data not used for training; Vertex AI data governance applies

high · Verified: 2026-07-09

data retention

Data retention policy review

Evidence

Google Cloud Service Terms — Enterprise zero-retention configurations available

medium · Verified: 2026-07-09

pii handling

Data protection capability review

Evidence

Google AI Documentation — Customer responsible for PII redaction; Cloud DLP integration available

medium · Verified: 2026-07-09

compliance certifications

Certification verification through Google Cloud compliance center

Evidence

Google Cloud Compliance — SOC 1/2/3, ISO 27001/27017/27018, GDPR, HIPAA (via Google Cloud)

high · Verified: 2026-07-09

zero data retention

Enterprise feature review

Evidence

Vertex AI Data Governance — Zero-retention configuration available for enterprise Vertex AI customers

medium · Verified: 2026-07-09

👁️ Trust & Transparency

Strong transparency via exposed thinking traces and comprehensive documentation. Training data details remain limited (industry standard).

explainability

Reasoning transparency evaluation

Evidence

Gemini API Documentation — Thinking traces and configurable reasoning depth exposed via API

high · Verified: 2026-07-09

hallucination rate

Factual QA testing and vendor claims review

Evidence

Google Launch Materials — Improved factual grounding over Gemini 3 Pro Preview

medium · Verified: 2026-07-09

bias fairness

Bias benchmark evaluation and policy review

Evidence

Google AI Principles — Regular bias testing and mitigation per AI Principles

medium · Verified: 2026-07-09

uncertainty quantification

Qualitative assessment of confidence expression

Evidence

Model Behavior — Expresses uncertainty appropriately in extended reasoning mode

medium · Verified: 2026-07-09

model card quality

Documentation completeness review

Evidence

Gemini Documentation — Comprehensive launch documentation with benchmarks and limitations

high · Verified: 2026-07-09

training data transparency

Public disclosure review

Evidence

Google AI Blog — General training description provided; detailed sources not disclosed

medium · Verified: 2026-07-09

guardrails

Safety mechanism analysis

Evidence

Safety Settings — Configurable multi-category safety guardrails

high · Verified: 2026-07-09

⚙️ Operational Excellence

Mature operational posture across all Google AI surfaces since the 2026-02-19 launch. Pricing confirmed on the official pricing page (2026-07-09). The model ID remains gemini-3.1-pro-preview despite the flagship positioning.

api design quality

API design and feature completeness review

Evidence

Gemini API — RESTful API with streaming, function calling, multimodal, thinking control

high · Verified: 2026-07-09

sdk quality

SDK quality and maintenance assessment

Evidence

Google Gen AI SDKs — Unified Gen AI SDKs for Python, Node.js, Go, Java; actively maintained

high · Verified: 2026-07-09

versioning policy

Versioning policy and changelog review

Evidence

Gemini API Changelog — Released 2026-02-19 with documented deprecation timeline for 3 Pro Preview

Gemini API Deprecations — gemini-3.1-pro-preview active with no shutdown date; designated replacement for gemini-3-pro-preview (shut down 2026-03-09) and gemini-2.5-pro (shutdown 2026-10-16); gemini-pro-latest alias points here since 2026-03-06

high · Verified: 2026-07-09

monitoring observability

Observability tooling review

Evidence

Google Cloud Console — Comprehensive Cloud Console and Vertex AI monitoring

high · Verified: 2026-07-09

support quality

Support channel assessment

Evidence

Google Cloud Support — Enterprise support tiers with SLAs

high · Verified: 2026-07-09

ecosystem maturity

Ecosystem and integration analysis

Evidence

Google AI Ecosystem — Day-one availability across Gemini app, AI Studio, Vertex AI

high · Verified: 2026-07-09

license terms

License terms review

Evidence

Google Cloud Terms — Standard commercial terms; enterprise agreements available

high · Verified: 2026-07-09

pricing transparency

Verified against the official Gemini API pricing page

Evidence

Gemini API Pricing — Official: $2.00 input / $12.00 output per 1M tokens at <=200K context; $4.00 input / $18.00 output above 200K; context caching $0.20-$0.40 per 1M tokens plus $4.50/hour storage; paid tier only (no free tier)

high · Verified: 2026-07-09

Strengths

+ Exceptional abstract reasoning: 77.1% ARC-AGI-2 (~2.5x Gemini 3 Pro's 31.1%)
+ 94.3% GPQA Diamond, near-saturation PhD-level science
+ Frontier coding: 2887 Elo LiveCodeBench Pro, 78.2% MCP Atlas
+ 1M token context window; Google's designated migration target for the retired 3 Pro Preview and deprecated 2.5 Pro
+ Enterprise posture: Vertex AI data governance, SOC/ISO certs, EU residency
+ Day-one availability across AI Studio, Vertex AI, and Gemini app

Limitations

! Still served under a preview model ID (gemini-3.1-pro-preview); not listed as GA in official docs
! Paid tier only: no free tier access (unique among current Gemini API models)
! Extended reasoning modes add significant latency
! Training data transparency limited (industry standard)
! Gemini 3.5 Pro (announced at I/O May 2026, GA target slipped past June) may supersede it soon
! Long-context (>200K) pricing doubles the input rate ($2.00 to $4.00 per 1M tokens) and raises the output rate by 50% ($12.00 to $18.00)

Metadata

pricing

input: $2.00 per 1M tokens (<=200K), $4.00 per 1M tokens (>200K)

output: $12.00 per 1M tokens (<=200K), $18.00 per 1M tokens (>200K)

notes: Confirmed on the official Gemini API pricing page 2026-07-09. Context caching $0.20-$0.40 per 1M tokens plus $4.50/hour storage. Paid tier only (no free tier). Same pricing as the retired Gemini 3 Pro Preview.

last verified: 2026-07-09
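
The two-tier pricing above can be turned into a rough per-request estimate. The sketch below uses only the rates listed in this record. It assumes the tier is selected by the input (prompt) token count and that the higher rate then applies to both input and output; it also ignores context caching and storage charges. The names `Rates` and `estimate_cost_usd` are illustrative, not part of any Google SDK.

```python
from dataclasses import dataclass

# Assumption: the tier is chosen by prompt size, with the boundary at 200K tokens.
TIER_THRESHOLD_TOKENS = 200_000


@dataclass(frozen=True)
class Rates:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


STANDARD = Rates(input_per_million=2.00, output_per_million=12.00)
LONG_CONTEXT = Rates(input_per_million=4.00, output_per_million=18.00)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, excluding caching and storage charges."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    rates = STANDARD if input_tokens <= TIER_THRESHOLD_TOKENS else LONG_CONTEXT
    cost = input_tokens * rates.input_per_million
    cost += output_tokens * rates.output_per_million
    return cost / 1_000_000


# 100K-token prompt, 10K-token response: standard tier.
print(f"${estimate_cost_usd(100_000, 10_000):.2f}")  # prints "$0.32"
# 500K-token prompt, 10K-token response: long-context tier.
print(f"${estimate_cost_usd(500_000, 10_000):.2f}")  # prints "$2.18"
```

Under these assumptions the input rate doubles past 200K tokens while the output rate rises by half, so long prompts with short responses see close to a 2x increase.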

context window: 1,000,000 tokens

max output: 64,000 tokens
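
A quick pre-flight check against these two limits might look like the sketch below. It assumes the prompt and the requested output share the same context window, which this record does not state; the function and constant names are illustrative.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # from the metadata above
MAX_OUTPUT_TOKENS = 64_000         # from the metadata above


def fits_limits(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within the listed limits.

    Assumption: prompt and output tokens count against one shared window.
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + requested_output_tokens <= CONTEXT_WINDOW_TOKENS


print(fits_limits(900_000, 64_000))  # prints "True"
print(fits_limits(950_000, 64_000))  # prints "False"
```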

languages: English, 100+ languages

modalities: text, vision, audio, video

api endpoint: https://generativelanguage.googleapis.com/v1beta/models
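
As a rough illustration of how this endpoint is addressed, the sketch below builds (but does not send) a text-generation request using only the Python standard library. The base URL and model ID come from this record. The `:generateContent` suffix, the `contents`/`parts` body shape, and the `x-goog-api-key` header reflect the public Gemini REST API as commonly documented, not anything stated here, so check them against the current API reference before relying on them. The helper name is hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"
MODEL_ID = "gemini-3.1-pro-preview"  # preview model ID listed in this record


def build_generate_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build a POST request for a single-turn text prompt without sending it."""
    url = f"{BASE_URL}/{MODEL_ID}:generateContent"
    # Assumed request body shape: one content entry holding one text part.
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,  # assumed auth header
        },
        method="POST",
    )


request = build_generate_request("Summarize this record.", api_key="YOUR_API_KEY")
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; a real key is required, and since the model is paid tier only, each call is billable.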

open source: false

architecture: Multimodal transformer with configurable extended reasoning

parameters: Not disclosed

knowledge cutoff: Late 2025 (not officially confirmed)

release date: 2026-02-19

Use Case Ratings

code generation

2887 Elo LiveCodeBench Pro and 78.2% MCP Atlas. Strong agentic coding; 1M context covers full codebases.

data analysis

1M context plus top-tier reasoning makes it excellent for massive dataset analysis.

research assistant

94.3% GPQA Diamond and 1M context. Best-in-class for deep multi-document research.

legal compliance

1M context for full contract corpora. EU data residency and Vertex AI governance support regulated workloads.

financial analysis

Frontier quantitative reasoning with long context for large filing sets.

education

Exceptional reasoning depth for tutoring; thinking traces aid pedagogical explanations.

content creation

Strong long-form generation; reasoning depth helps structured technical content.

healthcare

HIPAA coverage is available via Google Cloud. Strong reasoning for clinical literature, but regulated workloads should run under Vertex AI governance controls.

Similar Models

Gemini 3 Pro

Google

Gemini 3.5 Flash

Google

Claude Opus 4.8

Anthropic

GPT-5.5

OpenAI