Evaluation record · gemma-4

Gemma 4

v4.0

Google

Model · open-source · apache-2-0 · open-weights · edge

Strong

About This Model

Google's open-weight family released April 2026 under Apache 2.0 (a shift from the custom Gemma license). Spans E2B/E4B edge models with 128K context and native audio up to a 31B dense model with 256K context. The 31B scores ~1452 on LMArena, No. 3 among open models.

Last Evaluated: July 9, 2026

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability

Strongest open-weight showing from Google to date: 31B at ~1452 LMArena (No. 3 open). MoE 26B-A4B offers near-dense quality at 4B active params. Performance below proprietary frontier but excellent per-parameter efficiency.

task accuracy code

Vendor-reported coding benchmarks compared against open-weight peer class

Evidence

Google Gemma 4 Announcement — Substantial coding gains over Gemma 3 across family; strong for open-weight class, below frontier proprietary models

medium · Verified: 2026-07-09

task accuracy reasoning

Reasoning benchmark review from launch materials and open-model leaderboards

Evidence

Hugging Face Gemma 4 Blog — Reasoning improvements over Gemma 3; 26B-A4B MoE delivers near-dense quality with 4B active parameters

medium · Verified: 2026-07-09

task accuracy general

Crowdsourced human preference rankings on LMArena

Evidence

LMArena Leaderboard — Gemma 4 31B ~1452 Elo (No. 3 among open models); 26B-A4B MoE ~1441 with only 4B active params

high · Verified: 2026-07-09
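
The gap between the two LMArena scores above can be put in head-to-head terms. A minimal sketch, assuming the scores behave like classic Elo ratings on a 400-point logistic scale (LMArena's actual fitting procedure may differ); the function name `expected_win_rate` is illustrative and not from the source:

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Scores quoted in this record: 31B dense (~1452) vs 26B-A4B MoE (~1441).
p = expected_win_rate(1452, 1441)
print(f"31B expected win rate vs 26B-A4B: {p:.3f}")
```

Under that assumption, an 11-point gap corresponds to an expected preference rate of about 51.6%, so the two variants are close in head-to-head terms.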

output consistency

Community reports across deployment stacks; high variance by quantization level

Evidence

Community testing — Consistency depends on quantization and inference stack chosen by deployer

low · Verified: 2026-07-09
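
One way a deployer might quantify the variance noted above is to compare outputs from two serving configurations on a fixed prompt set. A minimal sketch, assuming deterministic (greedy) decoding and exact-match comparison after whitespace normalization; `agreement_rate` and the sample outputs are hypothetical:

```python
def agreement_rate(outputs_a: list[str], outputs_b: list[str]) -> float:
    """Fraction of prompts for which two configurations give the same answer."""
    if len(outputs_a) != len(outputs_b):
        raise ValueError("both runs must cover the same prompts")
    if not outputs_a:
        raise ValueError("need at least one prompt")
    # Collapse whitespace so trailing spaces or newlines do not count as drift.
    matches = sum(
        " ".join(a.split()) == " ".join(b.split())
        for a, b in zip(outputs_a, outputs_b)
    )
    return matches / len(outputs_a)


# Hypothetical outputs from a full-precision run and a 4-bit quantized run.
full_precision = ["Paris", "4", "blue"]
quantized = ["Paris", "5", "blue "]
print(f"agreement: {agreement_rate(full_precision, quantized):.2f}")
```

Exact match is a strict criterion; for free-form generations a semantic similarity measure would likely be a better fit.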

latency p50

Self-hosted model; latency is a function of deployer infrastructure

Evidence

Hugging Face Gemma 4 Blog — E2B (2.3B effective) and E4B (4.5B) run on-device; latency depends entirely on hardware and serving stack

low · Verified: 2026-07-09
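
Because latency depends on the deployer's stack, p50 has to be measured locally. A minimal sketch using only the Python standard library; `fake_inference` is a stand-in for whatever client call the deployer's serving stack actually exposes:

```python
import statistics
import time
from typing import Callable


def measure_p50_seconds(call: Callable[[], object], runs: int = 20) -> float:
    """Time `call` repeatedly and return the median (p50) wall-clock latency."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def fake_inference() -> str:
    # Placeholder for a real request to a self-hosted endpoint.
    time.sleep(0.01)
    return "ok"


print(f"p50 latency: {measure_p50_seconds(fake_inference) * 1000:.1f} ms")
```

A real measurement would also fix prompt length, output length, and concurrency, since all three shift the median.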

context window

Official specification from launch announcement

Evidence

Google Gemma 4 Announcement — 256K context on 12B/26B-A4B/31B; 128K on E2B/E4B edge variants

high · Verified: 2026-07-09
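
The context limits above translate into a simple token budget check. A minimal sketch, assuming "128K" and "256K" mean 131072 and 262144 tokens and that generated tokens count against the same window as the prompt (typical, but not stated in the source); the table and function names are illustrative:

```python
# Context windows per variant, in tokens, as listed in this record.
CONTEXT_WINDOW_TOKENS = {
    "E2B": 131_072,
    "E4B": 131_072,
    "12B": 262_144,
    "26B-A4B": 262_144,
    "31B": 262_144,
}


def fits_context(variant: str, prompt_tokens: int, max_new_tokens: int) -> bool:
    """True if the prompt plus the requested generation fits the variant's window."""
    return prompt_tokens + max_new_tokens <= CONTEXT_WINDOW_TOKENS[variant]


print(fits_context("31B", 200_000, 32_768))  # long document on the 31B model
print(fits_context("E4B", 120_000, 32_768))  # same idea on an edge variant
```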

uptime

No single provider SLA; assessed as deployment-dependent

Evidence

Hugging Face Model Hub — Open weights; availability depends on chosen hosting (self-hosted, Vertex AI, or third-party providers)

high · Verified: 2026-07-09

🛡️Security

Security profile is deployment-dependent: excellent data isolation when self-hosted, but guardrails are removable and there is no managed abuse filtering unless the deployer adds it (e.g., ShieldGemma, Vertex AI).

prompt injection resistance

OWASP LLM01 assessment relative to model class; deployer must add input filtering

Evidence

Gemma Responsible AI Toolkit — Safety tuning applied, but smaller open models are generally more susceptible than frontier hosted models; no managed input filtering by default

low · Verified: 2026-07-09

jailbreak resistance

Adversarial testing of instruction-tuned checkpoints; open weights inherently allow guardrail removal

Evidence

Gemma Safety Documentation — Instruction-tuned variants include safety alignment, but open weights permit fine-tuning that removes guardrails

low · Verified: 2026-07-09

data leakage prevention

Architectural assessment: no third-party data flow when self-hosted

Evidence

Self-hosted deployment model — Self-hosting means no prompts or outputs leave deployer infrastructure

high · Verified: 2026-07-09

output safety

Safety testing of released checkpoints and available companion classifiers

Evidence

Gemma Responsible AI Toolkit — Safety-tuned checkpoints plus companion safety classifiers (ShieldGemma line) available

medium · Verified: 2026-07-09
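
Since filtering is left to the deployer, a companion classifier is usually wired in around the generation call. A minimal sketch of that wrapper pattern; `generate` and `is_unsafe` are hypothetical callables supplied by the deployer (for example, a model client and a safety classifier), and the stubs below exist only to make the example runnable:

```python
from typing import Callable

REFUSAL = "Request declined by deployer safety policy."


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    is_unsafe: Callable[[str], bool],
) -> str:
    """Screen the prompt, generate, then screen the response."""
    if is_unsafe(prompt):
        return REFUSAL
    response = generate(prompt)
    if is_unsafe(response):
        return REFUSAL
    return response


# Stand-ins: a real deployment would call the model and a trained classifier.
def stub_generate(prompt: str) -> str:
    return f"echo: {prompt}"


def stub_is_unsafe(text: str) -> bool:
    return "forbidden" in text.lower()


print(guarded_generate("hello", stub_generate, stub_is_unsafe))
```

Checking both the input and the output costs two classifier calls per request, which is a design choice a latency-sensitive deployment may want to revisit.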

api security

Assessment of typical self-hosted serving stacks vs managed alternatives

Evidence

Deployment options — No first-party managed API security; depends on serving stack (Vertex AI managed endpoints inherit GCP controls)

low · Verified: 2026-07-09

🔒Privacy & Compliance

Best-in-class data sovereignty: nothing leaves deployer infrastructure. The trade-off is that compliance certifications are not inherited from the model and must be built or bought by the deployer.

data residency

Architectural assessment of self-hosted deployment

Evidence

Open weights distribution — Weights run anywhere: on-premises, air-gapped, or any cloud region

high · Verified: 2026-07-09

training data optout

Architectural assessment: inference data never leaves deployer

Evidence

Self-hosted deployment model — No user data ever transmitted to Google during inference; opt-out concern does not apply

high · Verified: 2026-07-09

data retention

Architectural assessment

Evidence

Self-hosted deployment model — Retention policy is entirely the deployer's choice

high · Verified: 2026-07-09

pii handling

Data flow analysis for self-hosted inference

Evidence

Self-hosted deployment model — PII never leaves deployer infrastructure, but redaction tooling must be self-implemented

medium · Verified: 2026-07-09
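
Self-implemented redaction can start as a pre-processing pass before prompts reach the model. A minimal sketch, assuming regex matching for email addresses and US-style phone numbers only; the patterns and placeholder tokens are illustrative and would miss many real PII formats:

```python
import re

# (placeholder, pattern) pairs applied in order; illustrative, not exhaustive.
_PATTERNS = [
    ("[EMAIL]", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("[PHONE]", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
]


def redact(text: str) -> str:
    """Replace matched PII spans with placeholder tokens."""
    for placeholder, pattern in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


print(redact("Contact jane@example.com or 555-123-4567."))
```

Production redaction typically layers named-entity recognition on top of patterns like these, since names and addresses do not follow a fixed format.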

compliance certifications

Review of certification inheritance paths for open-weight deployments

Evidence

Deployment-dependent compliance — Model itself carries no certifications; compliance (SOC 2, HIPAA, GDPR) must be achieved by the deployer's stack or inherited from a managed host like Vertex AI

medium · Verified: 2026-07-09

zero data retention

Architectural assessment

Evidence

Self-hosted deployment model: no retention by Google or any third party when self-hosted; whatever the deployer's own stack logs or stores is the deployer's choice

high · Verified: 2026-07-09

👁️Trust & Transparency

High transparency by open-model standards: published technical report, architecture disclosure (including MoE active-parameter counts), and fully auditable weights. Apache 2.0 relicensing further reduces legal opacity.

explainability

Assessment of inspection capabilities afforded by open weights

Evidence

Open weights access — Full weight access enables interpretability research, logit inspection, and custom probing

medium · Verified: 2026-07-09

hallucination rate

Factual QA testing relative to model size class

Evidence

Community evaluation — Smaller open models hallucinate more than frontier hosted models, especially E2B/E4B variants

low · Verified: 2026-07-09

bias fairness

Model card review and independent audit availability

Evidence

Gemma Model Card — Bias evaluations published in model card; open weights allow independent auditing

medium · Verified: 2026-07-09

uncertainty quantification

Calibration assessment; logprob access partially offsets weaker verbal uncertainty

Evidence

Open weights access — Raw logprobs fully accessible for custom calibration, but model self-expression of uncertainty is weaker than frontier tier

low · Verified: 2026-07-09
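
Raw logprob access makes it possible to check calibration directly. A minimal sketch, assuming the deployer has already collected per-answer token logprobs and correctness labels from their own evaluation set; it computes a sequence-level confidence and the expected calibration error (ECE) with equal-width bins. Function names and sample values are illustrative:

```python
import math


def answer_confidence(token_logprobs: list[float]) -> float:
    """Probability of the whole answer: exp of the summed token logprobs."""
    return math.exp(sum(token_logprobs))


def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """Weighted mean gap between confidence and accuracy across bins."""
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / total * abs(avg_conf - accuracy)
    return ece


# Hypothetical data: four answers at 90% confidence, three of them correct.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

An ECE near zero means stated confidence tracks accuracy; a large value suggests a recalibration step such as temperature scaling may be worth fitting.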

model card quality

Documentation completeness review

Evidence

Gemma 4 Technical Report and Model Cards — Detailed technical report, per-size model cards, architecture details (MoE config, effective params), and evaluation suite published

high · Verified: 2026-07-09

training data transparency

Public disclosure review against open-model norms

Evidence

Gemma 4 Technical Report — Training data composition described at category level (better than most proprietary models); exact corpus not released

medium · Verified: 2026-07-09

guardrails

Analysis of built-in and companion safety mechanisms

Evidence

Responsible AI Toolkit — Safety-tuned checkpoints plus optional classifier models; guardrails are removable by design in open weights

medium · Verified: 2026-07-09

⚙️Operational Excellence

Apache 2.0 relicensing is the headline trust improvement: prior Gemma generations carried custom-license use restrictions. Operational burden (monitoring, scaling, support) falls on the deployer, as with any open-weight model.

api design quality

Review of available serving interfaces and their consistency

Evidence

Deployment options — No single first-party API; served via Vertex AI, Hugging Face TGI, vLLM, Ollama, llama.cpp with varying interfaces

medium · Verified: 2026-07-09

sdk quality

Ecosystem tooling support assessment

Evidence

Hugging Face Transformers — Day-one support in Transformers, vLLM, llama.cpp, Ollama, MLX, and Keras

high · Verified: 2026-07-09

versioning policy

Release cadence and immutability review

Evidence

Gemma release history — Clear generational releases (supersedes Gemma 3); pinned weights never change once published

medium · Verified: 2026-07-09
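
A deployer who relies on weights not changing can verify that locally rather than trust it. A minimal sketch, assuming the deployer recorded a SHA-256 digest when the weight file was first downloaded; the file path and digest in the usage comment are placeholders:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte weight files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path: Path, expected_sha256: str) -> bool:
    """True if the file on disk still matches the digest recorded earlier."""
    return sha256_of_file(path) == expected_sha256.lower()


# Usage (placeholder path and digest):
# verify_weights(Path("weights.safetensors"), "<digest recorded at download time>")
```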

monitoring observability

Assessment of out-of-box observability versus managed APIs

Evidence

Self-hosted deployment model — No built-in monitoring; deployer must assemble observability from serving-stack tooling

medium · Verified: 2026-07-09

support quality

Support channel assessment for open-weight distribution

Evidence

Community channels — Community support (Hugging Face, GitHub, Discord); no SLA unless deployed via managed platforms

medium · Verified: 2026-07-09

ecosystem maturity

Third-party integration and adoption analysis

Evidence

Hugging Face Gemma 4 Launch — Day-one integration across the open-model ecosystem; Gemma family has hundreds of millions of cumulative downloads

high · Verified: 2026-07-09

license terms

License analysis; Apache 2.0 is OSI-approved with no field-of-use restrictions (attribution and notice conditions still apply)

Evidence

Google Gemma 4 Announcement — Apache 2.0 — a shift from the custom Gemma license, removing use-restriction ambiguity for commercial deployment

high · Verified: 2026-07-09

Strengths

+ Apache 2.0 license: removes custom-license restrictions of prior Gemma generations
+ Top-3 open model: 31B at ~1452 LMArena Elo
+ Efficient MoE: 26B-A4B reaches ~1441 Elo with only 4B active parameters
+ Full data sovereignty: with self-hosted inference, no data leaves deployer infrastructure
+ Edge-capable E2B/E4B variants with 128K context and native audio
+ 256K context on 12B/26B-A4B/31B variants, large for open weights
+ Multimodal input: text, image, and video

Limitations

! No inherited compliance certifications; deployer builds or buys SOC 2/HIPAA posture
! Safety guardrails removable via fine-tuning (inherent to open weights)
! No first-party SLA or managed support outside Vertex AI hosting
! Hallucination and reasoning depth below frontier hosted models, especially E2B/E4B
! Operational burden (serving, scaling, monitoring) falls on deployer
! Performance varies significantly with quantization choices

Metadata

pricing

input: Free (open weights; compute costs only)

output: Free (open weights; compute costs only)

notes: Apache 2.0. Self-hosting compute is the only cost; managed hosting available via Vertex AI and third-party providers.
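
"Compute costs only" can be turned into a per-token figure once throughput is known. A minimal sketch of the arithmetic; the GPU price and throughput below are hypothetical placeholders, not measurements of any Gemma 4 variant:

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """Serving cost in USD per one million tokens at sustained throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


# Hypothetical: a $2.00/hour GPU sustaining 1,000 tokens/second in aggregate.
print(f"${cost_per_million_tokens(2.00, 1000):.2f} per million tokens")
```

The result is only as good as the throughput estimate, which depends on batch size, quantization, and utilization; idle capacity raises the effective cost.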

last verified: 2026-07-09

context window: 262144 tokens (256K; 128K on E2B/E4B)

max output: 32768 tokens

languages: English, 140+ languages

modalities: text, image (input), video (input), audio (input, E2B/E4B)

api endpoint: https://huggingface.co/google

open source: true

architecture: Family: E2B (2.3B effective) and E4B (4.5B) edge models; 12B dense; 26B-A4B MoE (4B active); 31B dense

parameters: 2.3B effective (E2B) to 31B dense; 26B MoE with 4B active

knowledge cutoff: Late 2025 (not officially confirmed)

release date: 2026-04-02
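
The parameter counts above give a rough floor on weight memory at different precisions. A minimal sketch, assuming memory is parameters times bits per parameter, in decimal gigabytes, ignoring KV cache, activations, and quantization metadata. It also assumes, as is typical for MoE serving, that all 26B parameters stay resident even though only 4B are active per token:

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


for name, params in [("31B dense", 31), ("26B-A4B MoE", 26)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

On these assumptions the MoE variant saves compute per token rather than memory: its footprint tracks the 26B total, not the 4B active parameters.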

Use Case Ratings

code generation

Capable for an open model, especially 31B with 256K context, but well below frontier proprietary coding models.

customer support

26B-A4B MoE (4B active) gives strong quality at low serving cost for high-volume support; E4B enables on-device assistants.

content creation

Solid drafting quality at 31B (~1452 LMArena); fully private content pipelines possible.

education

E2B/E4B with native audio enable offline, on-device tutoring in low-connectivity settings.

healthcare

Self-hosting suits strict data sovereignty (PHI never leaves infrastructure), but deployer carries the full compliance and accuracy-validation burden.

research assistant

256K context on 31B handles long documents; auditable weights suit reproducible research. Reasoning depth below frontier.

Similar Models

Gemma 3 27B

Gemini 3.1 Pro

Llama 4 Maverick

Qwen3.5