Evaluation record · gemini-3-flash

Gemini 3 Flash

Version: gemini-3-flash-preview

Google

Model · superseded · cost-effective · fast · 1m-tokens

Strong

About This Model

Google's efficiency model, offering Pro-level performance at low cost: 78% on SWE-bench Verified (ahead of Gemini 3 Pro), a 1M-token context window, and 3x the speed of Gemini 2.5 Pro. A thinking level parameter controls how much compute is spent on reasoning. Still served as gemini-3-flash-preview with no shutdown date announced, but superseded as Google's lead Flash tier by Gemini 3.5 Flash (2026-05-19), which Google lists as its designated replacement.

Last Evaluated: July 9, 2026


Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability

Exceptional value: 78% on SWE-bench Verified beats Gemini 3 Pro at 1/4 the price. 3x faster than Gemini 2.5 Pro, with a 1M-token context window. Also beats Gemini 3 Pro on tool use and MMMU Pro.

task accuracy code

Industry-standard coding benchmarks

Evidence

SWE-bench Verified — 78% (actually beats Gemini 3 Pro's 76.2%)

high · Verified: 2026-07-09

task accuracy reasoning

PhD-level reasoning benchmarks

Evidence

GPQA Diamond — 90.4% (near Pro-level 93.8%)

Toolathlon & MCP Atlas — Beats Gemini 3 Pro on tool use and multi-step planning

high · Verified: 2026-07-09

task accuracy general

Multimodal understanding testing

Evidence

MMMU Pro — 81.2% (actually beats Gemini 3 Pro's 81%)

high · Verified: 2026-07-09

output consistency

Consistency testing across thinking levels

Evidence

Google Documentation — Thinking level parameter enables consistent quality control

high · Verified: 2026-07-09

latency p50

Median latency measurements

Evidence

Google Performance Data — 3x faster than Gemini 2.5 Pro

high · Verified: 2026-07-09

latency p95

95th percentile measurements

Evidence

Community benchmarking — p95 latency optimized for speed

medium · Verified: 2026-07-09
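
The p50 and p95 criteria above are percentiles over a set of request latencies. A minimal sketch of how such figures could be derived from raw timings, using the nearest-rank method. The sample values are invented for illustration and are not measurements of this model, and the record does not say which percentile method its sources used.

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds (illustrative only).
latencies_ms = [120, 95, 110, 300, 105, 98, 102, 250, 101, 99]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(f"p50={p50} ms, p95={p95} ms")
```

With only ten samples, p95 lands on the slowest request, which is why tail latency needs a much larger sample than the median to be meaningful.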

context window

Official specification

Evidence

Google Documentation — 1M token context window

high · Verified: 2026-07-09

uptime

Historical uptime data

Evidence

Google Cloud Status — 99.9% uptime

high · Verified: 2026-07-09

🛡️Security

Strong security inherited from Gemini 3 family. Google Cloud infrastructure provides enterprise-grade protection.

prompt injection resistance

OWASP LLM security testing

Evidence

Google AI Safety — Inherited safety from Gemini 3 family

medium · Verified: 2026-07-09

jailbreak resistance

Adversarial prompt testing

Evidence

Google Safety Testing — Strong jailbreak resistance

medium · Verified: 2026-07-09

data leakage prevention

Evidence

Google Privacy — API data not used for training

medium · Verified: 2026-07-09

output safety

Safety testing

Evidence

Safety Filters — Configurable safety filters

high · Verified: 2026-07-09

api security

API security review

Evidence

Google Cloud Security — Google Cloud security

high · Verified: 2026-07-09

🔒Privacy & Compliance

Good privacy with Google Cloud. Free tier available. Enterprise options for enhanced compliance.

data residency

Cloud infrastructure review

Evidence

Google Cloud — Multiple region options

high · Verified: 2026-07-09

training data optout

Terms review

Evidence

Gemini API Terms — API data not used for training

high · Verified: 2026-07-09

data retention

Retention policy review

Evidence

Google Cloud Terms — Enterprise zero retention available

medium · Verified: 2026-07-09

pii handling

Data protection review

Evidence

Google AI Safety — Customer responsible for PII

medium · Verified: 2026-07-09

compliance certifications

Certification verification

Evidence

Google Cloud Compliance — SOC 2, ISO 27001, GDPR, HIPAA (via Google Cloud)

high · Verified: 2026-07-09

zero data retention

Enterprise feature review

Evidence

Enterprise Options — Available for enterprise

medium · Verified: 2026-07-09

👁️Trust & Transparency

Strong transparency with thinking level parameter. Configurable reasoning depth for different use cases.

explainability

Reasoning transparency evaluation

Evidence

Thinking Level Parameter — Thinking level (minimal, low, medium, high) for reasoning control

high · Verified: 2026-07-09
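
To make the thinking level parameter concrete, here is a minimal sketch of a request body that selects one of the four levels listed above. The JSON field names (`generationConfig`, `thinkingConfig`, `thinkingLevel`) are an assumption about the Gemini API request shape and should be checked against Google's current documentation; `build_request_body` is a hypothetical helper, and nothing is sent over the network.

```python
import json

# Levels as listed in this record; the JSON field names below are an assumption.
THINKING_LEVELS = ("minimal", "low", "medium", "high")

def build_request_body(prompt, thinking_level="low"):
    """Build a generateContent-style payload with an explicit thinking level."""
    if thinking_level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {thinking_level!r}")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }

body = build_request_body("Summarize this changelog.", thinking_level="minimal")
print(json.dumps(body, indent=2))
```

Validating the level on the client keeps a typo from turning into a rejected request or a silently applied default.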

hallucination rate

Factual accuracy testing

Evidence

Google Testing — Improved accuracy over 2.5 Flash

medium · Verified: 2026-07-09

bias fairness

Bias evaluation

Evidence

Google AI Principles — Regular bias testing

medium · Verified: 2026-07-09

uncertainty quantification

Qualitative assessment

Evidence

Model Behavior — Appropriate uncertainty expression

medium · Verified: 2026-07-09

model card quality

Documentation review

Evidence

Gemini 3 Flash Documentation — Comprehensive documentation

high · Verified: 2026-07-09

training data transparency

Public disclosure review

Evidence

Google AI Blog — General description

medium · Verified: 2026-07-09

guardrails

Safety mechanism review

Evidence

Safety Settings — Configurable safety filters

high · Verified: 2026-07-09

⚙️Operational Excellence

Excellent operational maturity. Free tier available in API. Still in preview ~7 months after launch; Google's deprecations page names gemini-3.5-flash as its designated replacement, so plan migrations accordingly.

api design quality

API design review

Evidence

Gemini API — RESTful API with streaming, function calling, multimodal

high · Verified: 2026-07-09

sdk quality

SDK quality assessment

Evidence

Google AI SDKs — SDKs for Python, Node.js, Go, Swift, Kotlin, Dart

high · Verified: 2026-07-09

versioning policy

Versioning policy review

Evidence

Google Cloud Versioning — Clear versioning

Gemini API Deprecations — gemini-3-flash-preview listed with no announced shutdown date; designated replacement is gemini-3.5-flash

high · Verified: 2026-07-09
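
Because a designated replacement is named but no shutdown date is announced, one way to prepare is to route model ids through a single lookup so the switch becomes a one-line change. A minimal sketch: the mapping is copied from this record as a snapshot rather than fetched from Google, and `resolve_model` is a hypothetical helper.

```python
# Replacement mapping taken from this record; treat it as a snapshot, not a live lookup.
REPLACEMENTS = {
    "gemini-3-flash-preview": "gemini-3.5-flash",
}

def resolve_model(model_id, prefer_replacement=False):
    """Return the model id to call, optionally switching to the designated replacement."""
    if prefer_replacement:
        return REPLACEMENTS.get(model_id, model_id)
    return model_id

current = resolve_model("gemini-3-flash-preview")
migrated = resolve_model("gemini-3-flash-preview", prefer_replacement=True)
print(current, "->", migrated)
```

Keeping the flag off by default lets a team test the replacement behind a configuration switch before committing to it.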

monitoring observability

Observability review

Evidence

Google Cloud Console — Comprehensive monitoring

high · Verified: 2026-07-09

support quality

Support assessment

Evidence

Google Cloud Support — Enterprise support with SLAs

high · Verified: 2026-07-09

ecosystem maturity

Ecosystem analysis

Evidence

Google AI Ecosystem — Launched as default model in consumer Gemini app (December 2025); Google's Flash positioning has since moved to Gemini 3.5 Flash

high · Verified: 2026-07-09

license terms

License review

Evidence

Google Cloud Terms — Standard commercial terms

high · Verified: 2026-07-09

Strengths

+ Pro-level performance at 1/4 the price ($0.50/$3 per 1M tokens)
+ 78% on SWE-bench Verified, ahead of Gemini 3 Pro (76.2%)
+ 3x faster than Gemini 2.5 Pro, with a 1M-token context window
+ Thinking level parameter (minimal, low, medium, high)
+ Beats Gemini 3 Pro on MMMU Pro (81.2% vs 81%) and tool use
+ Free tier available in Gemini API
+ Launched as the default model in the consumer Gemini app

Limitations

! Preview status (never promoted to GA; still gemini-3-flash-preview)
! Superseded: Gemini 3.5 Flash (2026-05-19) is now Google's lead Flash tier and the designated replacement, though no shutdown date is announced
! Slightly behind Gemini 3 Pro on GPQA Diamond (90.4% vs 93.8%)
! Less deep reasoning than Pro's Deep Think mode
! Input pricing slightly higher than 2.5 Flash ($0.50 vs $0.30 per 1M tokens)

Metadata

pricing

input: $0.50 per 1M tokens (text/image/video), $1.00 (audio)

output: $3.00 per 1M tokens

notes: Confirmed on the official Gemini API pricing page 2026-07-09. 1/4 the price of Gemini 3.1 Pro and 1/3 of Gemini 3.5 Flash. Free tier available.

last verified: 2026-07-09
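
The per-token rates above translate into a request cost as follows. This is a minimal sketch using only the prices listed in this record; it ignores the free tier and any caching or batch discounts, and `estimate_cost_usd` is a hypothetical helper.

```python
# USD per 1M tokens, as listed in this record (verified 2026-07-09).
PRICE_PER_MILLION_USD = {
    "input": 0.50,        # text, image, and video input
    "input_audio": 1.00,  # audio input
    "output": 3.00,
}

def estimate_cost_usd(input_tokens, output_tokens, audio_input_tokens=0):
    """Estimate the cost of one request from its token counts."""
    million = 1_000_000
    return (
        input_tokens * PRICE_PER_MILLION_USD["input"] / million
        + audio_input_tokens * PRICE_PER_MILLION_USD["input_audio"] / million
        + output_tokens * PRICE_PER_MILLION_USD["output"] / million
    )

# Example: a 200k-token prompt with a 10k-token response.
cost = estimate_cost_usd(input_tokens=200_000, output_tokens=10_000)
print(f"${cost:.4f}")
```

In this example the 10k output tokens account for $0.03 of the total against $0.10 for 200k input tokens, so output-heavy workloads cost proportionally more.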

context window: 1000000

max output: 64000
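
A quick budget check against these two limits can be sketched as below. It assumes, conservatively, that the space reserved for output counts against the same 1,000,000-token window as the prompt; the record does not state how the two limits interact, and `fits_in_context` is a hypothetical helper.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # context window from this record
MAX_OUTPUT_TOKENS = 64_000         # max output from this record

def fits_in_context(prompt_tokens, reserved_output_tokens=MAX_OUTPUT_TOKENS):
    """Return True if the prompt plus reserved output fits in the context window.

    Assumption: output tokens share the context window with the prompt.
    """
    if not 0 <= reserved_output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved output exceeds the model's max output")
    return prompt_tokens + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS

print(fits_in_context(900_000))  # 964,000 tokens in total
print(fits_in_context(950_000))  # 1,014,000 tokens in total
```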

languages: English, 100+ languages

modalities: text, vision, audio, video

api endpoint: https://generativelanguage.googleapis.com/v1beta/models
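
The endpoint above is a collection URL; a call targets a specific model under it. A minimal sketch of building that URL, assuming the `:generateContent` method suffix used by the Gemini REST API. Authentication and the request itself are omitted, and `generate_content_url` is a hypothetical helper.

```python
from urllib.parse import quote

API_ENDPOINT = "https://generativelanguage.googleapis.com/v1beta/models"

def generate_content_url(model_id):
    """Build the per-model URL; the ':generateContent' suffix is an assumption."""
    return f"{API_ENDPOINT}/{quote(model_id, safe='')}:generateContent"

url = generate_content_url("gemini-3-flash-preview")
print(url)
```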

open source: false

architecture: Multimodal transformer with thinking level parameter

parameters: Not disclosed

knowledge cutoff: January 2025

Use Case Ratings

code generation

78% SWE-bench beats Gemini 3 Pro. Excellent value for coding at 1/4 the price.

customer support

Low latency (3x faster than 2.5 Pro). Native multimodal for image/video support.

content creation

Good creative capabilities with cost efficiency. 1M context for long-form.

data analysis

1M context enables massive dataset analysis at low cost.

research assistant

1M context for document processing. Cost-effective for high-volume research.

legal compliance

1M context for contract analysis. Good value for document review.

healthcare

HIPAA via Google Cloud. Cost-effective for medical record processing.

financial analysis

Strong quantitative reasoning at low cost. 1M context for large documents.

education

90.4% GPQA Diamond. Cost-effective for educational platforms.

creative writing

Good creative capabilities. Best value for creative at scale.

Similar Models

Gemini 3.5 Flash

Google

Gemini 3.1 Pro

Google

Gemini 3 Pro

Google

Gemini 2.0 Flash

Google

Claude Sonnet 4.5

Anthropic