Evaluation record · kimi-k2-6

Kimi K2.6

v20260420

Moonshot AI

Modelcodingagenticopen-sourcemixture-of-experts

Strong

About This Model

Moonshot AI's open-weight 1T-parameter MoE (32B active) with vendor-reported 80.2% SWE-Bench Verified and 58.6 SWE-Bench Pro. Agent Swarm orchestration scales to 300 sub-agents and 4,000 coordinated steps for long-horizon coding. Remains Moonshot's general-purpose flagship as of July 2026; a coding-specialized sibling, Kimi K2.7-Code (built on K2.6, also open-weight Modified MIT), shipped 2026-06-12.

Last Evaluated: July 9, 2026

Official Website

Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability

Vendor-reported open-weight leadership on agentic coding (80.2% SWE-Bench Verified, 58.6 SWE-Bench Pro). Agent Swarm scales to 300 sub-agents / 4,000 coordinated steps. Most headline scores are vendor-reported and await independent replication.

task accuracy code

Vendor-reported industry-standard coding benchmarks; scores pending broad independent replication

Evidence

Kimi K2.6 Model Card (vendor-reported) — SWE-Bench Verified 80.2%, SWE-Bench Pro 58.6 (vs GPT-5.4's 57.7)

LiveCodeBench v6 (vendor-reported) — 89.6% on competitive programming tasks

MarkTechPost release coverage — Long-horizon coding focus; claims open-weight state of the art on agentic coding

mediumVerified: 2026-07-09

task accuracy reasoning

Vendor-reported tool-augmented reasoning benchmarks requiring multi-step problem solving

Evidence

Humanity's Last Exam with tools (vendor-reported) — 54.0 on HLE-with-tools, frontier-competitive

mediumVerified: 2026-07-09

task accuracy general

Review of vendor benchmark suite and community evaluations across knowledge domains

Evidence

Kimi K2.6 Model Card — Strong general performance across knowledge benchmarks; text and vision modalities

mediumVerified: 2026-07-09

output consistency

Community testing of repeated runs and long-horizon agent trajectories

Evidence

Community evaluation — Consistent agentic behavior over long trajectories; native INT4 quantization preserves quality

mediumVerified: 2026-07-09

latency p50

Median latency for API requests with standard prompt sizes; self-hosted latency depends on hardware

Evidence

Community benchmarking — Typical first-response time ~3s on first-party API; varies widely by host

lowVerified: 2026-07-09

latency p95

95th percentile response time across diverse workloads

Evidence

Community benchmarking — p95 ~7.5s; long agentic chains take substantially longer by design

lowVerified: 2026-07-09

context window

Official specification from model card

Evidence

Kimi K2.6 Model Card — 262,144 token context window

highVerified: 2026-07-09

uptime

Review of platform availability and self-hosting fallback options

Evidence

Moonshot AI Platform — First-party API generally stable; open weights allow self-hosted redundancy

mediumVerified: 2026-07-09

🛡️Security

Standard open-model security posture. No published third-party security audit; self-hosting shifts security responsibility to the deployer.

prompt injection resistance

Review of vendor safety documentation and community red-team reports against OWASP LLM01 patterns

Evidence

Kimi K2.6 Model Card — Safety tuning described; no published third-party prompt-injection audit

lowVerified: 2026-07-09

jailbreak resistance

Testing against adversarial prompt datasets; open-weight deployments inherit deployer responsibility

Evidence

Community red-teaming — Standard alignment tuning; open weights mean guardrails can be removed in fine-tuned derivatives

mediumVerified: 2026-07-09

data leakage prevention

Analysis of privacy policies and self-hosting data-control options

Evidence

Moonshot AI Privacy Policy — Standard data handling on first-party API; full control when self-hosted

mediumVerified: 2026-07-09

output safety

Safety testing across harmful content categories per vendor card and community reports

Evidence

Kimi K2.6 Model Card — Safety post-training applied; refusal behavior comparable to other open frontier models

mediumVerified: 2026-07-09

api security

Review of API security features and best practices

Evidence

Moonshot AI API Documentation — API key authentication, HTTPS only, rate limiting; OpenAI-compatible endpoints

mediumVerified: 2026-07-09

🔒Privacy & Compliance

First-party API operates under Chinese jurisdiction — a material caveat for Western regulated industries. Open weights fully mitigate this for organizations able to self-host or use Western inference providers.

data residency

Review of provider jurisdiction and third-party hosting options

Evidence

Moonshot AI Platform Documentation — Moonshot AI is a China-based provider; first-party API data processed under Chinese jurisdiction

OpenRouter availability — Available via OpenRouter and Western inference hosts, enabling non-China residency

mediumVerified: 2026-07-09

training data optout

Analysis of privacy policy and data usage terms

Evidence

Moonshot AI Privacy Policy — API data usage terms standard for the segment; self-hosting removes the question entirely

mediumVerified: 2026-07-09

data retention

Review of terms of service and deployment-dependent retention

Evidence

Moonshot AI Terms — First-party retention governed by Chinese data regulations; self-hosted deployments retain nothing externally

mediumVerified: 2026-07-09

pii handling

Review of data protection capabilities and customer responsibilities

Evidence

Moonshot AI Documentation — Customer responsible for PII redaction; no managed PII tooling

mediumVerified: 2026-07-09

compliance certifications

Verification of compliance certifications and audit reports

Evidence

Moonshot AI public materials — No published SOC 2 / HIPAA / GDPR attestations for the first-party API; Western hosts may carry their own certifications

mediumVerified: 2026-07-09

zero data retention

Review of self-hosting deployment options enabling zero retention

Evidence

Open weights on Hugging Face — Self-hosting (vLLM/SGLang, native INT4) gives complete data control and zero external retention

mediumVerified: 2026-07-09

👁️Trust & Transparency

Open weights and a detailed model card provide good architectural transparency; training data disclosure and independent benchmark verification remain limited.

explainability

Evaluation of reasoning and agent-trajectory transparency

Evidence

Agent Swarm architecture — Sub-agent trajectories and tool-call traces are inspectable, aiding auditability of long-horizon runs

mediumVerified: 2026-07-09

hallucination rate

Testing on factual QA datasets and tool-augmented workflows

Evidence

Community testing — Moderate hallucination rate; tool-use grounding improves factuality in agentic mode

mediumVerified: 2026-07-09

bias fairness

Review of published bias benchmarks and community evaluations

Evidence

Kimi K2.6 Model Card — Limited published bias evaluation

lowVerified: 2026-07-09

uncertainty quantification

Qualitative assessment of confidence expression in outputs

Evidence

Model behavior testing — Expresses uncertainty adequately; no calibrated confidence outputs

mediumVerified: 2026-07-09

model card quality

Review of documentation completeness and clarity

Evidence

Hugging Face model card — Detailed card: 1T total / 32B active MoE, 384 experts, MLA attention, native INT4, benchmarks, deployment guides

highVerified: 2026-07-09

training data transparency

Review of public disclosures about training data

Evidence

Moonshot AI publications — Architecture well documented; training data composition not disclosed in detail

mediumVerified: 2026-07-09

guardrails

Analysis of built-in safety mechanisms

Evidence

Kimi K2.6 Model Card — Built-in safety tuning; deployers of open weights must layer their own guardrails

mediumVerified: 2026-07-09

⚙️Operational Excellence

Strong open-model ecosystem presence. Modified MIT license is permissive for most users but the attribution clause above 100M MAU / $20M monthly revenue requires legal review at hyperscale.

api design quality

Review of API design, consistency, and feature completeness

Evidence

Moonshot AI API Documentation — OpenAI-compatible API with streaming, tool calling, vision; Agent Swarm orchestration endpoints

highVerified: 2026-07-09

sdk quality

Review of SDK quality, documentation, and maintenance

Evidence

Moonshot AI GitHub — OpenAI-compatible so mainstream SDKs work; first-party tooling thinner than Western providers

mediumVerified: 2026-07-09

versioning policy

Review of versioning practices and weight availability

Evidence

Kimi release history — K2.6 supersedes K2.5/K2; prior weights remain available, but cadence is fast

MarkTechPost - Kimi K2.7-Code release — Kimi K2.7-Code released 2026-06-12: coding-specialized open-weight model built on K2.6 (1T/32B active, 256K context, Modified MIT) with ~30% lower reasoning-token usage; vendor claims +21.8% on Kimi Code Bench v2 over K2.6 — all K2.7 benchmarks are Moonshot-proprietary with no independent public-suite results yet

mediumVerified: 2026-07-09

monitoring observability

Review of available monitoring tools and metrics

Evidence

Moonshot AI Platform — Basic usage dashboard; self-hosted observability is deployer-built

mediumVerified: 2026-07-09

support quality

Assessment of documentation, community, and support responsiveness

Evidence

Moonshot AI community channels — GitHub and community support; limited English-language enterprise support

mediumVerified: 2026-07-09

ecosystem maturity

Analysis of third-party hosting, integrations, and tooling

Evidence

OpenRouter and inference ecosystem — Available on OpenRouter and major open-model hosts; vLLM/SGLang support with native INT4

highVerified: 2026-07-09

license terms

Review of licensing terms and restrictions; attribution clause is trust-relevant for large-scale commercial use

Evidence

Modified MIT License — MIT with an attribution-UI requirement for deployments exceeding 100M MAU or $20M/month revenue

highVerified: 2026-07-09

Strengths

+Vendor-reported open-weight leadership in agentic coding (80.2% SWE-Bench Verified, 58.6 SWE-Bench Pro vs GPT-5.4's 57.7)
+Agent Swarm scales to 300 sub-agents and 4,000 coordinated steps for long-horizon tasks
+Open weights with near-MIT license enable full self-hosting and data control
+Efficient inference: 32B active of 1T total, MLA attention, native INT4 quantization
+262,144-token context with text and vision modalities
+Competitive API pricing (~$0.95/$4.00 per 1M tokens) and broad availability via OpenRouter

Limitations

!First-party Moonshot API processes data under Chinese jurisdiction with limited Western compliance certifications
!Headline benchmarks are vendor-reported and await independent replication
!Modified MIT license imposes attribution-UI requirement above 100M MAU or $20M/month revenue
!Self-hosting a 1T-parameter MoE requires substantial GPU infrastructure even at INT4
!Limited published bias, safety, and red-team evaluations
!English-language enterprise support is thin compared to Western providers

Metadata

pricing

input: $0.95 per 1M tokens ($0.16 cache hit)

output: $4.00 per 1M tokens

notes: First-party Moonshot API pricing confirmed July 2026; cached input drops to $0.16 per 1M (~83% off). Third-party hosts on OpenRouter vary (some cheaper, e.g. $0.55/$2.00). Self-hosting cost is infrastructure-dependent.

last verified: 2026-07-09

context window: 262144

languages

0: English

1: Chinese

2: Japanese

3: Korean

4: Spanish

5: French

6: German

modalities

0: text

1: image (input)

api endpoint: https://api.moonshot.ai/v1/chat/completions

open source: true

license: Modified MIT (attribution-UI requirement above 100M MAU or $20M/month revenue)

architecture: Mixture-of-Experts: 1T total / 32B active parameters, 384 experts, Multi-head Latent Attention (MLA), native INT4

parameters: 1T total / 32B active

release date: 2026-04-20

Use Case Ratings

code generation

Vendor-reported 80.2% SWE-Bench Verified and 58.6 SWE-Bench Pro; Agent Swarm excels at long-horizon multi-file engineering. For pure coding workloads, Moonshot's coding-specialized K2.7-Code (June 2026, built on K2.6) claims further gains with ~30% lower token usage.

customer support

Capable but not specialized; agentic latency unnecessary for simple support flows.

content creation

Solid long-form generation with large context; not its differentiator.

data analysis

Strong tool-augmented analysis; Agent Swarm parallelizes multi-source investigation well.

research assistant

54.0 HLE-with-tools and 262K context make it strong for deep, tool-driven research.

legal compliance

China-jurisdiction first-party API and absent Western certifications are blockers unless self-hosted.

healthcare

Not recommended via first-party API; self-hosted deployment in a compliant environment is the only viable path.

financial analysis

Strong quantitative and agentic capability; data residency requires self-hosting for regulated firms.

education

Strong STEM and coding tutoring at competitive pricing.

creative writing

Competent creative output; optimized for agentic engineering rather than prose.

Similar Models

GLM-5

Z.ai (Zhipu AI)

DeepSeek-V4

DeepSeek

MiniMax-M2

MiniMax

Claude Opus 4.8

Anthropic

GPT-5.5

OpenAI