Evaluation record · mistral-large-3

Mistral Large 3

vLarge 3 (Mistral 3 family)

Mistral AI

Modelopen-sourceapache-2.0mixture-of-expertsmultilingual

Strong

About This Model

Mistral AI's open-weight flagship released December 2025 under Apache 2.0: a sparse MoE (675B total / 41B active) multimodal model with ~256K context and 40+ languages. Debuted #2 among open-source non-reasoning models on LMArena, with a strong EU data-sovereignty story.

Last Evaluated: July 9, 2026

Official Website

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability

Best-in-class open-weight performance for its release window: sparse MoE (675B total / 41B active) delivers near-frontier quality with modest active compute. Non-reasoning class — frontier reasoning models outperform it on hard multi-step problems.

task accuracy code

Review of provider benchmarks and community evaluations of open weights

Evidence

Mistral 3 Announcement — Strong coding performance reported across standard benchmarks for an open-weight model

mediumVerified: 2026-07-09

task accuracy reasoning

Review of reasoning benchmarks; model is non-reasoning class (no extended thinking)

Evidence

Mistral 3 Announcement — Competitive math and reasoning results among non-reasoning (single-pass) models

mediumVerified: 2026-07-09

task accuracy general

Crowdsourced arena comparisons and provider benchmark suite

Evidence

LMArena Leaderboard — Debuted #2 among open-source non-reasoning models on LMArena

Mistral 3 Announcement — State-of-the-art open-weight performance across knowledge and multilingual tasks

highVerified: 2026-07-09

output consistency

Community repeated-prompt testing on open weights

Evidence

Community Evaluations — Stable instruction following reported across hosted and self-hosted deployments

mediumVerified: 2026-07-09

latency p50

Median latency from third-party benchmarking of hosted endpoints

Evidence

Community benchmarking — 41B active parameters keep inference fast for the model's scale

mediumVerified: 2026-07-09

latency p95

95th percentile estimates across hosting providers

Evidence

Community benchmarking — Tail latency varies by host (Mistral, Bedrock, Azure, self-hosted)

lowVerified: 2026-07-09

context window

Official specification from provider

Evidence

Mistral 3 Announcement — Approximately 256K token context window

highVerified: 2026-07-09

uptime

Historical uptime of hosted API; open weights enable customer-controlled availability

Evidence

Mistral Status Page — Stable hosted-API availability; self-hosting removes provider dependency entirely

mediumVerified: 2026-07-09

🛡️Security

Solid security with the open-weights caveat: deployers control (and can remove) guardrails, so deployment-level controls matter more than for closed models.

prompt injection resistance

Testing against OWASP LLM01 prompt injection patterns

Evidence

Mistral Documentation — Instruction-hierarchy training; limited published red-team results

mediumVerified: 2026-07-09

jailbreak resistance

Adversarial prompt testing on hosted and open-weight deployments

Evidence

Community Red-Teaming — Reasonable default refusals; open weights allow guardrail removal by deployers

mediumVerified: 2026-07-09

data leakage prevention

Analysis of privacy policies plus self-hosting option

Evidence

Mistral Privacy Policy — EU-based provider under GDPR; self-hosting keeps data entirely in customer infrastructure

mediumVerified: 2026-07-09

output safety

Safety testing across harmful content categories

Evidence

Mistral Moderation Tools — Optional moderation API and system-prompt guardrailing available

mediumVerified: 2026-07-09

api security

Review of API security features across hosting options

Evidence

Mistral API Documentation — API key authentication, HTTPS only, rate limiting; cloud-provider controls on Bedrock/Azure

mediumVerified: 2026-07-09

🔒Privacy & Compliance

Standout data-sovereignty story: EU provider under GDPR, plus Apache 2.0 weights allow fully on-premises/air-gapped deployment — the strongest possible residency guarantee.

data residency

Review of hosting documentation and deployment options

Evidence

Mistral AI — EU provider with European hosting; open weights allow full on-premises deployment

highVerified: 2026-07-09

training data optout

Analysis of terms of service and data usage policy

Evidence

Mistral Terms and Privacy — API data not used for training by default for paid tiers; self-hosting eliminates the question entirely

mediumVerified: 2026-07-09

data retention

Review of retention policies across deployment modes

Evidence

Mistral Privacy Policy — Short-term retention for abuse monitoring on La Plateforme; customer-controlled when self-hosted

mediumVerified: 2026-07-09

pii handling

Review of data protection capabilities

Evidence

Mistral Documentation — Customer responsible for PII redaction; self-hosting keeps PII in-house

mediumVerified: 2026-07-09

compliance certifications

Verification of certifications across Mistral and cloud hosting partners

Evidence

Mistral Trust & GDPR Posture — EU provider natively under GDPR; SOC 2 for hosted platform; Bedrock/Azure hosting inherits those clouds' certifications

mediumVerified: 2026-07-09

zero data retention

Review of self-hosting options enabling complete data control

Evidence

Open Weights Distribution — Apache 2.0 weights enable fully air-gapped, zero-external-retention deployments

highVerified: 2026-07-09

👁️Trust & Transparency

Open weights provide architectural transparency rare at this scale (675B MoE disclosed), though training data detail and built-in guardrails are lighter than closed frontier models.

explainability

Evaluation of reasoning transparency; open weights enable interpretability research

Evidence

Model Behavior — Clear step-by-step explanations; open weights permit deep inspection and research

mediumVerified: 2026-07-09

hallucination rate

Factual QA testing by community evaluators

Evidence

Community Evaluations — Typical hallucination rates for its class; no built-in grounding

mediumVerified: 2026-07-09

bias fairness

Review of bias disclosures and multilingual evaluation

Evidence

Mistral Documentation — Multilingual training (40+ languages) reduces anglocentric bias; limited published bias evaluation

mediumVerified: 2026-07-09

uncertainty quantification

Qualitative assessment plus open-weight logprob access

Evidence

Model Behavior — Adequate uncertainty expression; raw logprobs accessible via open weights

lowVerified: 2026-07-09

model card quality

Review of published model card and architecture disclosure

Evidence

Hugging Face Model Card — Public model card with architecture details (675B MoE / 41B active), license, and usage guidance

highVerified: 2026-07-09

training data transparency

Review of public disclosures about training data

Evidence

Mistral 3 Announcement — Architecture fully disclosed; training data composition described only at a high level

mediumVerified: 2026-07-09

guardrails

Analysis of built-in safety mechanisms

Evidence

Mistral Guardrailing Documentation — Optional moderation and system-prompt guardrails; defaults lighter than closed flagships and removable by deployers

mediumVerified: 2026-07-09

⚙️Operational Excellence

Apache 2.0 licensing at frontier scale is the headline: no usage restrictions, no vendor lock-in, and availability across HF, Bedrock, Azure, and La Plateforme.

api design quality

Review of API design and feature completeness

Evidence

Mistral API Documentation — Clean OpenAI-compatible API with streaming, function calling, JSON mode, and vision

highVerified: 2026-07-09

sdk quality

Review of SDK and inference-stack support

Evidence

Mistral SDKs — Official Python and TypeScript SDKs; first-class vLLM and transformers support for self-hosting

mediumVerified: 2026-07-09

versioning policy

Review of versioning policy; open weights eliminate forced-retirement risk

Evidence

Mistral Model Documentation — Dated model versions with deprecation notices; open weights never disappear once downloaded

mediumVerified: 2026-07-09

monitoring observability

Review of monitoring tools across deployment modes

Evidence

Mistral La Plateforme — Usage dashboard on hosted platform; self-hosted observability is customer-built

mediumVerified: 2026-07-09

support quality

Assessment of documentation and support channels

Evidence

Mistral Support — Good documentation, enterprise support contracts, active community

mediumVerified: 2026-07-09

ecosystem maturity

Analysis of distribution channels and third-party tooling

Evidence

Multi-platform Availability — Available on Hugging Face, Amazon Bedrock, Azure, and Mistral's La Plateforme; broad vLLM/community tooling

highVerified: 2026-07-09

license terms

Review of license terms

Evidence

Apache 2.0 License — Apache 2.0 open weights: unrestricted commercial use, modification, and redistribution

highVerified: 2026-07-09

Strengths

+Apache 2.0 open weights at frontier scale — full commercial freedom and no lock-in
+Sparse MoE efficiency: 675B total but only 41B active parameters per token
+Debuted #2 among open-source non-reasoning models on LMArena
+EU provider with strong GDPR/data-sovereignty posture; fully self-hostable
+Multimodal (text + image) with 40+ languages and ~256K context
+Broad availability: Hugging Face, Amazon Bedrock, Azure, La Plateforme

Limitations

!Non-reasoning class — trails frontier reasoning models on hard multi-step problems
!Self-hosting 675B weights requires substantial GPU infrastructure despite 41B active
!Hosted API pricing ($2/$6 per 1M on La Plateforme) is several times higher than early aggregator estimates suggested, though output remains cheaper than closed flagships
!Open weights let deployers strip guardrails, shifting safety burden downstream
!Training data composition only described at a high level

Metadata

pricing

input: $2.00 per 1M tokens

output: $6.00 per 1M tokens

notes: Official La Plateforme pricing per mistral.ai/pricing (verified 2026-07-09); corrects earlier aggregator-based estimate of ~$0.50/$1.50. Batch processing gets a 50% discount; cloud-host (Bedrock/Azure) rates vary. Self-hosting under Apache 2.0 incurs only infrastructure cost.

last verified: 2026-07-09

context window: 256000

languages

0: English

1: French

2: German

3: Spanish

4: Italian

5: Portuguese

6: Dutch

7: Polish

8: Japanese

9: Korean

10: Chinese

11: Arabic

12: Hindi

13: Russian

modalities

0: text

1: image (input)

api endpoint: https://api.mistral.ai/v1/chat/completions

open source: true

architecture: Sparse Mixture-of-Experts transformer: 675B total parameters, 41B active per token

parameters: 675B total / 41B active (disclosed)

release date: 2025-12-02 (Mistral 3 family)

Use Case Ratings

code generation

Strong open-weight coding; closed frontier flagships still lead on hard software engineering.

customer support

40+ languages, low active-parameter inference cost, and self-hosting make it excellent for global support.

content creation

Strong multilingual content generation; particularly good for European-language work.

data analysis

Capable analysis within 256K context; lacks extended-reasoning mode for hardest problems.

research assistant

Good synthesis over long documents; self-hosting suits sensitive research corpora.

legal compliance

EU provider under GDPR plus on-premises deployment is compelling for European legal workloads.

healthcare

Self-hosting keeps PHI fully in-house, sidestepping vendor BAA questions; validate clinical accuracy.

financial analysis

Solid quantitative work; data-sovereign deployment appeals to EU financial institutions.

education

Multilingual strength and low cost suit global education deployments.

creative writing

Good multilingual creative range; less distinctive than closed frontier flagships.

Similar Models

DeepSeek-V4

DeepSeek

GLM-5

Z.ai (Zhipu AI)

Claude Sonnet 4.6

Anthropic

Amazon Nova 2 Lite

Amazon (AWS)

Gemini 3.1 Pro

Google