Evaluation record · deepseek-v4

DeepSeek-V4

v20260424-preview

DeepSeek

Modelopen-sourcemit-licensepreviewlong-context

Strong

About This Model

DeepSeek's preview flagship family: V4-Pro (1.6T total / 49B active MoE, largest open-weight release ever) and V4-Flash (284B/13B). 1M context, up to 384K output, via manifold-constrained Hyper Connections and Constrained Sparse Attention. MIT license. Vendor benchmarks await broad independent verification. Per DeepSeek (2026-06-30), V4 graduates to official release mid-July 2026 — same model names, with peak-hour API pricing (2x baseline, Beijing 9:00-12:00 and 14:00-18:00).

Last Evaluated: July 9, 2026

Official Website

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability

Largest open-weight release ever (V4-Pro: 1.6T total / 49B active). Vendor benchmarks are impressive but this is a preview: score confidence is medium until independent verification matures. V4-Flash (284B/13B) offers a much cheaper deployment point.

task accuracy code

Vendor-reported coding benchmarks; medium confidence pending broad independent verification of preview-release claims

Evidence

DeepSeek API Pricing & Release Notes — Vendor reports state-of-the-art open-model coding results for V4-Pro, exceeding V3.2

Wikipedia: DeepSeek — V4 release (2026-04-24) documented as DeepSeek's strongest coding model; independent replication still in progress

mediumVerified: 2026-07-09

task accuracy reasoning

Vendor-reported reasoning benchmarks; medium confidence until third-party evaluations of the preview mature

Evidence

DeepSeek Release Notes — Vendor claims frontier reasoning performance, building on V3.2-Speciale's olympiad-level results

mediumVerified: 2026-07-09

task accuracy general

Community leaderboard positions and vendor benchmarks for a preview release

Evidence

Wikipedia: DeepSeek — Early community evaluations place V4-Pro at or near the top of open-model leaderboards

mediumVerified: 2026-07-09

output consistency

Repeated-prompt testing and community preview feedback

Evidence

DeepSeek API Documentation — Preview status: behavior may change before stable release; community reports occasional long-output instability near the 384K limit

mediumVerified: 2026-07-09

latency p50

Median latency for API requests with standard prompt sizes from independent benchmarking

Evidence

Artificial Analysis — V4-Pro median latency ~2.2s; V4-Flash sub-second for standard prompts

mediumVerified: 2026-07-09

latency p95

95th percentile response time across diverse workloads from independent benchmarking

Evidence

Artificial Analysis — p95 ~5.5s for V4-Pro; long-context and long-output requests substantially longer

mediumVerified: 2026-07-09

context window

Official specification from provider

Evidence

DeepSeek API Documentation — 1M-token context window with up to 384K output tokens, enabled by Constrained Sparse Attention

highVerified: 2026-07-09

uptime

Status-page history since the 2026-04-24 launch

Evidence

DeepSeek Status — Post-launch demand spikes caused intermittent degradation in late April/May 2026; stabilizing since

mediumVerified: 2026-07-09

🛡️Security

Security posture mirrors V3.2 but with thinner preview-stage red-team coverage. Open weights shift safety responsibility to deployers who fine-tune or self-host.

prompt injection resistance

Testing against OWASP LLM01 prompt injection patterns; limited preview-stage coverage

Evidence

Early community red-team reports — Comparable to V3.2 on common injection patterns; preview red-teaming coverage still thin

mediumVerified: 2026-07-09

jailbreak resistance

Adversarial prompt testing; assessment accounts for open-weight modifiability

Evidence

DeepSeek Release Documentation — Standard alignment guardrails; open weights mean alignment is removable downstream

mediumVerified: 2026-07-09

data leakage prevention

Analysis of privacy policy plus self-hosting option for full data isolation

Evidence

DeepSeek Privacy Policy — Standard first-party API data handling; self-hosting gives complete data control

mediumVerified: 2026-07-09

output safety

Safety testing across harmful content categories on default weights

Evidence

DeepSeek Release Documentation — Safety post-training applied; preview-stage safety evaluation less complete than for stable releases

mediumVerified: 2026-07-09

api security

Review of API security features and transport guarantees

Evidence

DeepSeek API Documentation — API key authentication, HTTPS-only, rate limiting on the first-party platform

highVerified: 2026-07-09

🔒Privacy & Compliance

Same split as all DeepSeek releases: the first-party API is China-hosted with China-jurisdiction residency and no Western certifications, while self-hosting or Western third-party hosting avoids those concerns. V4-Pro's 1.6T size makes self-hosting far harder than V4-Flash.

data residency

Review of privacy policy and hosting options; China-jurisdiction caveat applies only to the first-party API

Evidence

DeepSeek Privacy Policy — First-party API data processed and stored in China; MIT-licensed weights allow deployment in any jurisdiction

highVerified: 2026-07-09

training data optout

Analysis of privacy policy and data usage terms for the hosted API

Evidence

DeepSeek Privacy Policy — API data usage terms documented; self-hosting removes the concern entirely

mediumVerified: 2026-07-09

data retention

Review of terms of service; retention is deployment-dependent for open-weight models

Evidence

DeepSeek Terms of Service — First-party API retention follows Chinese regulatory requirements; self-hosted deployments retain nothing externally

mediumVerified: 2026-07-09

pii handling

Review of data protection capabilities and customer responsibilities

Evidence

DeepSeek Platform Documentation — No built-in PII redaction tooling; customer responsible on any deployment

mediumVerified: 2026-07-09

compliance certifications

Verification of certifications for the first-party platform; third-party hosts inherit their own certifications

Evidence

DeepSeek Platform — No SOC 2, HIPAA, or FedRAMP on the first-party API; compliant deployments achievable via certified Western hosts or self-hosting

mediumVerified: 2026-07-09

zero data retention

Review of data handling across first-party API, third-party hosts, and self-hosting

Evidence

Open-weight deployment options — No zero-retention option on the first-party API; self-hosting provides true zero external retention

mediumVerified: 2026-07-09

👁️Trust & Transparency

Architectural novelty (manifold-constrained Hyper Connections, Constrained Sparse Attention) is disclosed, but the preview lacks the full technical report and independent benchmark replication DeepSeek usually delivers. Treat vendor claims with medium confidence.

explainability

Evaluation of reasoning transparency and trace accessibility

Evidence

DeepSeek API Documentation — Visible reasoning traces; up to 384K output supports very long inspectable chains of thought

mediumVerified: 2026-07-09

hallucination rate

Limited factual QA testing during the preview period

Evidence

Early community testing — Preliminary results suggest parity with or improvement over V3.2; sample sizes still small for the preview

lowVerified: 2026-07-09

bias fairness

Preliminary bias probing; formal evaluations pending

Evidence

Early bias probes — Topic-avoidance on politically sensitive subjects persists from prior releases; formal bias audits of V4 not yet published

lowVerified: 2026-07-09

uncertainty quantification

Qualitative assessment of confidence expression in outputs

Evidence

Model behavior assessment — Expresses uncertainty in reasoning traces; final-answer calibration unverified for the preview

mediumVerified: 2026-07-09

model card quality

Review of preview documentation completeness against DeepSeek's historical technical-report standard

Evidence

DeepSeek Release Documentation — Release notes document architecture (Hyper Connections, Constrained Sparse Attention) and pricing; full technical report expected with the stable release

mediumVerified: 2026-07-09

training data transparency

Review of public disclosures about training data

Evidence

DeepSeek Public Disclosures — High-level methodology described; dataset composition not disclosed in detail

mediumVerified: 2026-07-09

guardrails

Analysis of built-in safety mechanisms in default weights

Evidence

DeepSeek Safety Documentation — Standard alignment guardrails in released weights; preview safety evaluation ongoing

mediumVerified: 2026-07-09

⚙️Operational Excellence

Aggressive pricing (V4-Pro $0.435/$0.87, V4-Flash $0.14/$0.28 per 1M) and MIT licensing, but the short legacy-endpoint deprecation window (2026-07-24) and preview status demand migration agility. V4-Pro self-hosting is feasible only for well-resourced organizations.

api design quality

Review of API design, consistency, and feature completeness

Evidence

DeepSeek API Documentation — OpenAI-compatible API; V4-Pro and V4-Flash endpoints with context caching ($0.0028/1M cache hits on Flash)

highVerified: 2026-07-09

sdk quality

Review of SDK compatibility and inference-framework support

Evidence

DeepSeek GitHub — OpenAI-SDK compatibility; vLLM and SGLang support landed for V4 architecture within weeks of release

highVerified: 2026-07-09

versioning policy

Review of deprecation timelines and migration windows

Evidence

DeepSeek API Pricing Page — Legacy deepseek-chat/deepseek-reasoner endpoints deprecate 2026-07-24, a three-month migration window

TechNode — Official V4 release planned for mid-July 2026 (graduation from preview, same model names) with peak-hour pricing at 2x baseline during Beijing 9:00-12:00 and 14:00-18:00; users get 24-hour email notice before billing changes

highVerified: 2026-07-09

monitoring observability

Review of monitoring tools across deployment options

Evidence

DeepSeek Platform — Usage dashboard with token metrics; full observability when self-hosting

mediumVerified: 2026-07-09

support quality

Assessment of documentation, community, and support responsiveness

Evidence

DeepSeek Support Channels — Community and email support only; no enterprise SLA; preview status adds change risk

mediumVerified: 2026-07-09

ecosystem maturity

Analysis of third-party hosting availability six weeks post-release

Evidence

Hosting ecosystem — Third-party hosts onboarding rapidly; V4-Pro's 1.6T footprint limits the number of providers able to serve it, while V4-Flash adoption is broad

mediumVerified: 2026-07-09

license terms

Review of licensing terms and restrictions

Evidence

DeepSeek-V4 License — MIT license: unrestricted commercial use, modification, and redistribution

highVerified: 2026-07-09

Strengths

+Largest open-weight release ever: V4-Pro at 1.6T total / 49B active parameters under MIT license
+1M-token context window with up to 384K output tokens
+Novel architecture: manifold-constrained Hyper Connections and Constrained Sparse Attention
+Aggressive pricing: V4-Pro $0.435/$0.87 and V4-Flash $0.14/$0.28 per 1M tokens ($0.0028 cache hits on Flash)
+V4-Flash (284B/13B) offers a practical self-hosting and high-volume deployment point
+Inherits DeepSeek's frontier reasoning lineage from V3.2/Speciale

Limitations

!Preview status until mid-July 2026 official release; vendor benchmark claims not yet broadly independently verified and full technical report not yet published
!Peak-hour API pricing (2x baseline, Beijing 9:00-12:00 and 14:00-18:00) takes effect at the official release, complicating cost planning for workloads in those windows
!First-party API is China-hosted: China-jurisdiction data residency and no SOC 2/HIPAA/FedRAMP (self-hosting or Western hosts avoid this)
!V4-Pro's 1.6T footprint makes self-hosting impractical for all but the largest organizations
!Short migration window: legacy deepseek-chat/deepseek-reasoner endpoints deprecate 2026-07-24
!Text-only: no native vision or audio
!No enterprise SLA or dedicated support on the first-party platform

Metadata

pricing

input: $0.435 per 1M tokens (V4-Pro); $0.14 per 1M (V4-Flash, $0.0028 cache hit)

output: $0.87 per 1M tokens (V4-Pro); $0.28 per 1M (V4-Flash)

notes: Preview pricing per official pricing page; legacy deepseek-chat/deepseek-reasoner endpoints deprecate 2026-07-24. At the mid-July 2026 official release, peak-hour pricing takes effect: 2x baseline during Beijing 9:00-12:00 and 14:00-18:00 (off-peak rates unchanged). Self-hosting is infrastructure-cost-only under MIT license.

last verified: 2026-07-09

context window: 1000000

max output: 384000

languages

0: English

1: Chinese

2: Japanese

3: Korean

4: Spanish

5: French

6: German

7: Portuguese

8: Russian

9: Arabic

modalities

0: text

api endpoint: https://api.deepseek.com/v1/chat/completions

open source: true

architecture: Mixture-of-Experts with manifold-constrained Hyper Connections and Constrained Sparse Attention; V4-Pro 1.6T total / 49B active, V4-Flash 284B total / 13B active

parameters: V4-Pro: 1.6T total / 49B active; V4-Flash: 284B total / 13B active

knowledge cutoff: Early 2026

Use Case Ratings

code generation

Vendor-reported state-of-the-art open-model coding; 1M context fits entire large repositories. Preview status warrants validation on your own tasks.

customer support

V4-Flash is a strong cheap option for high-volume support with aggressive cache-hit pricing.

content creation

Up to 384K output enables book-length single-pass drafts; prose quality solid but not best-in-class.

data analysis

1M context plus strong reasoning makes whole-dataset and multi-document analysis practical at open-model prices.

research assistant

1M-token context ingests entire literature corpora; frontier reasoning lineage from V3.2-Speciale.

legal compliance

China-hosted first-party API and preview status are both disqualifying for most regulated legal work; self-hosting V4-Flash is the viable path.

healthcare

No HIPAA path on the first-party API; preview status adds change risk. Only self-hosted compliant deployments are viable.

financial analysis

Excellent quantitative reasoning over very long filings at low cost; verify vendor claims and plan data residency for regulated use.

education

V4-Flash pricing is ideal for education-scale deployment with strong math reasoning.

creative writing

Massive output length helps novel-scale drafting; stylistic range remains behind dedicated creative leaders.

Similar Models

DeepSeek-V3.2

DeepSeek

DeepSeek-R1

DeepSeek

Qwen3.5

Alibaba

Claude Opus 4.5

Anthropic

Kimi K2.6

Moonshot AI