Evaluation record · claude-code

Claude Code

v2.1.x (2.1.204 as of 2026-07-08)

Anthropic

Agent · coding-agent · cli · anthropic · sandboxed

Strong

About This Agent

Anthropic's agentic coding tool available as a terminal CLI, IDE extensions, web, and desktop app. Plans and executes multi-step coding tasks with tiered permissions, OS-level sandboxing, MCP integration, hooks, subagents, and plugins/skills.

Last Evaluated: July 9, 2026


Trust Vector Analysis

Dimension Breakdown

🚀Performance & Reliability

task completion accuracy

Benchmark results review plus adoption and revenue signals as a proxy for sustained task success in production use

Evidence

Anthropic Claude Code GA announcement — GA v1.0 launched alongside Claude 4 models with state-of-the-art SWE-bench coding performance

Reported revenue traction — ~$1B annualized revenue within roughly six months of GA indicates strong real-world task success

high · Verified: 2026-07-09

tool use reliability

Hands-on testing of built-in tools and MCP integrations across coding workflows

Evidence

Claude Code documentation — Mature built-in tool suite (file edit, bash, search) plus MCP servers, hooks, and plugins with permission gating

high · Verified: 2026-07-09
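Hooks add deterministic gating on top of permission prompts. The following is a minimal sketch of a `PreToolUse` command hook, assuming the hook receives the event as JSON on stdin with `tool_name` and `tool_input` fields and that exit code 2 blocks the tool call (with stderr fed back to Claude). The `decide` helper and the blocked patterns are illustrative only, not part of Claude Code.

```python
import json
import re
import sys

# Illustrative patterns: recursive deletes of absolute paths, curl piped into a shell.
BLOCKED_PATTERNS = [r"\brm\s+-rf\s+/", r"\bcurl\b.*\|\s*(ba)?sh\b"]


def decide(event: dict) -> tuple[int, str]:
    """Map a PreToolUse event to (exit_code, stderr_message)."""
    if event.get("tool_name") != "Bash":
        return 0, ""  # only shell commands are inspected in this sketch
    command = event.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command):
            # Exit code 2 is assumed to block the call; stderr explains why.
            return 2, f"Blocked by hook policy: matched {pattern!r}"
    return 0, ""


def main() -> None:
    """Hook entry point: read the event from stdin and exit with the decision."""
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)


# When this file is registered as the hook command, call main() at module level.
```

In settings, such a script would be registered under `hooks.PreToolUse` with a `Bash` matcher. Keeping the policy in a pure `decide` function makes it testable without invoking Claude Code.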

multi-step planning

Evaluation of plan mode and long-horizon task execution on multi-file repository changes

Evidence

Claude Code documentation — Plan mode, extended thinking, and task tracking support long-horizon multi-file refactors and feature builds

high · Verified: 2026-07-09

memory persistence

Review of memory file hierarchy, context compaction behavior, and cross-session resume

Evidence

Claude Code memory documentation — CLAUDE.md project/user memory files, auto-compaction of long sessions, and session resume support persistence

high · Verified: 2026-07-09
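The memory hierarchy can be approximated in a few lines. This sketch assumes user memory lives at `~/.claude/CLAUDE.md` and that project memory is found by walking from the working directory up toward, but not including, the filesystem root. `memory_files` is a hypothetical helper. The real loader also handles imports, `CLAUDE.local.md`, and enterprise policy files, all of which are omitted here.

```python
from pathlib import Path


def memory_files(cwd: Path, home: Path) -> list[Path]:
    """Collect CLAUDE.md files, broadest scope first (simplified model)."""
    found: list[Path] = []
    user_memory = home / ".claude" / "CLAUDE.md"
    if user_memory.is_file():
        found.append(user_memory)
    # Ancestors of cwd (including cwd itself), excluding the filesystem root.
    ancestors = [d for d in [cwd, *cwd.parents] if d != Path(d.anchor)]
    for directory in reversed(ancestors):  # outermost directory first
        candidate = directory / "CLAUDE.md"
        if candidate.is_file():
            found.append(candidate)
    return found
```

Ordering from broadest to most specific is an illustrative choice. It mirrors how a repository-level file and a package-level file both apply when working inside the package.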

error recovery

Observed recovery behavior from failing builds, tests, and tool errors during evaluation sessions

Evidence

Claude Code documentation — Agent loop self-corrects from failed commands and test failures; checkpoints allow rewinding changes

medium · Verified: 2026-07-09
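Checkpoint-and-rewind can be modeled as snapshotting file contents before an edit and restoring them on demand. The following is a minimal in-memory sketch. The `Checkpoints` class is hypothetical and does not cover changes made outside the tracked paths, such as files written by arbitrary shell commands.

```python
from pathlib import Path
from typing import Dict, List, Optional


class Checkpoints:
    """Stack of file snapshots so edits can be rewound (simplified model)."""

    def __init__(self) -> None:
        self._stack: List[Dict[Path, Optional[str]]] = []

    def snapshot(self, paths: List[Path]) -> None:
        """Record current contents; None marks a file that does not exist yet."""
        self._stack.append(
            {p: p.read_text() if p.exists() else None for p in paths}
        )

    def rewind(self) -> None:
        """Restore the most recent snapshot and discard it."""
        for path, content in self._stack.pop().items():
            if content is None:
                path.unlink(missing_ok=True)  # file was created after the snapshot
            else:
                path.write_text(content)
```

In a self-correcting loop, the agent would call `snapshot()` before each edit and `rewind()` when the follow-up build or test run shows the edit made things worse.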

agent collaboration

Testing of subagent delegation, parallel task fan-out, and plugin-defined agents

Evidence

Claude Code subagents documentation — Native subagents with isolated contexts, custom system prompts, and tool restrictions enable parallel delegation

high · Verified: 2026-07-09
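Delegation with isolated contexts can be sketched with `asyncio`. In this sketch, each subagent keeps its own history and tool allowlist, and only a short result string returns to the parent. `SubagentSpec`, `run_subagent`, and `fan_out` are hypothetical stand-ins for model calls, not Claude Code's API.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SubagentSpec:
    name: str
    system_prompt: str
    allowed_tools: frozenset[str]
    history: list[str] = field(default_factory=list)  # private to this subagent


async def run_subagent(spec: SubagentSpec, task: str, tool: str) -> str:
    if tool not in spec.allowed_tools:
        return f"{spec.name}: refused tool {tool}"  # tool restriction enforced
    # Only this subagent's history grows; the parent never sees it.
    spec.history.extend([spec.system_prompt, task])
    await asyncio.sleep(0)  # placeholder for a model call
    return f"{spec.name}: done ({tool})"


async def fan_out(jobs: list[tuple[SubagentSpec, str, str]]) -> list[str]:
    # Subagents run concurrently; results come back in job order.
    return await asyncio.gather(*(run_subagent(s, t, tool) for s, t, tool in jobs))
```

Returning summaries instead of full histories is the point of the design: the parent's context stays small no matter how much work each subagent does.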

🛡️Security

tool sandboxing

Review of sandbox architecture (filesystem and network isolation), managed cloud sandbox design, and 2026 CVE history; score reduced modestly from 88 to 84 to reflect two maximum-severity (CVSS 10.0) sandbox escapes, partly offset by fast patch turnaround

Evidence

Anthropic engineering: Claude Code sandboxing — OS-level sandboxing with filesystem and network isolation shipped Oct-Nov 2025, reducing permission prompts by ~84% in Anthropic's internal usage

Claude Code on the web — Web version executes tasks inside Anthropic-managed isolated sandboxes

SentinelOne vulnerability database - CVE-2026-39861 — Two CVSS 10.0 sandbox-escape CVEs disclosed and patched in 2026 (CVE-2026-39861 symlink escape; CVE-2026-25725 settings.json protection bypass); ~28 CVEs total in the product's first year, with rapid vendor patching

high · Verified: 2026-07-09
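The symlink-escape class of bug is easiest to see in code. A write check that compares the requested path as text can be bypassed by a link inside the project that points outside it. The following is a minimal sketch of a safer check, assuming a policy of "writes only under the project root". `is_write_allowed` is hypothetical and does not reproduce Claude Code's sandbox, which enforces isolation at the OS level.

```python
import os


def is_write_allowed(path: str, project_root: str) -> bool:
    """Allow a write only if the fully resolved path stays under project_root."""
    root = os.path.realpath(project_root)
    # realpath follows symlinks, so a link pointing outside the root is caught.
    target = os.path.realpath(os.path.join(root, path))
    return os.path.commonpath([root, target]) == root
```

Resolving both sides before comparing also handles `..` segments and platform aliases such as macOS's `/tmp` to `/private/tmp` link.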

access control

Assessment of permission model, allowlist granularity, and enterprise policy controls

Evidence

Claude Code permissions documentation — Tiered permission prompts with allow/deny rules, per-tool allowlists, and enterprise managed policy settings

high · Verified: 2026-07-09
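The allow/deny model can be illustrated with a small evaluator. This sketch assumes rules in the `Tool(specifier)` form used in Claude Code's `permissions.allow` and `permissions.deny` settings, such as `Bash(npm run test:*)`. It also assumes deny rules win over ask rules, ask rules win over allow rules, and the user is prompted when nothing matches. The matching is a simplification (exact match, or prefix match for a trailing `:*`), not the product's implementation.

```python
from typing import Dict, List, Literal

Decision = Literal["allow", "ask", "deny"]


def rule_matches(rule: str, tool: str, argument: str) -> bool:
    """Match rules like 'Read', 'Read(./.env)' or 'Bash(npm run test:*)'."""
    if "(" not in rule:
        return rule == tool  # a bare tool name covers every use of that tool
    name, spec = rule[:-1].split("(", 1)
    if name != tool:
        return False
    if spec.endswith(":*"):
        return argument.startswith(spec[:-2])  # prefix rule
    return argument == spec


def evaluate(tool: str, argument: str, rules: Dict[str, List[str]]) -> Decision:
    # Deny is checked first, then ask, then allow; no match means prompt.
    for decision in ("deny", "ask", "allow"):
        if any(rule_matches(r, tool, argument) for r in rules.get(decision, [])):
            return decision
    return "ask"
```

Checking deny first means a broad allow such as `Read` cannot accidentally expose a file that a narrower deny rule protects.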

prompt injection defense

Review of documented mitigations and behavior when processing untrusted repository and web content

Evidence

Claude Code security documentation — Permission prompts for write actions, sandbox network isolation, and injection-aware system design mitigate untrusted content risks

medium · Verified: 2026-07-09
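Network isolation limits the damage injected instructions can do, because exfiltration needs a reachable destination. The following is a minimal sketch of the kind of domain allowlist check a sandbox egress proxy might apply. The `host_allowed` helper and the `*.` wildcard convention are assumptions for illustration.

```python
def host_allowed(host: str, allowlist: list[str]) -> bool:
    """Return True if host matches an exact entry or a '*.' wildcard entry."""
    host = host.lower().rstrip(".")
    for entry in allowlist:
        entry = entry.lower()
        if entry.startswith("*."):
            # Keep the leading dot so '*.github.com' does not match 'evilgithub.com'.
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False
```

Under this convention a wildcard covers subdomains only, so the apex domain needs its own entry.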

data isolation

Architecture review of session isolation in cloud sandboxes and local filesystem scoping

Evidence

Claude Code on the web — Cloud sessions run in per-task isolated environments; local sandbox restricts filesystem scope to the project

medium · Verified: 2026-07-09

open source transparency

License and source availability review

Evidence

Claude Code GitHub repository — Proprietary product; public repo hosts releases, issues, and documentation but core source is not open

high · Verified: 2026-07-09

🔒Privacy & Compliance

data retention

Review of Anthropic data retention commitments across consumer and commercial tiers

Evidence

Anthropic privacy policy — Commercial/API usage not used for training by default; retention controls available for enterprise plans

medium · Verified: 2026-07-09

GDPR compliance

Compliance certification and DPA availability review

Evidence

Anthropic Trust Center — SOC 2 Type II certified with GDPR-aligned DPA available for commercial customers

medium · Verified: 2026-07-09

third-party data sharing

Data flow analysis of code, prompt, and telemetry handling

Evidence

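Anthropic privacy policy — Code and prompts sent to the Anthropic API are processed by Anthropic; no third-party model providers are in the loop (when routed via Bedrock or Vertex AI, the respective cloud provider also processes the data)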

medium · Verified: 2026-07-09

local deployment option

Deployment options assessment including Bedrock/Vertex routing and air-gap feasibility

Evidence

Claude Code documentation — Client runs locally and can route via Bedrock or Vertex AI, but requires Claude models in the cloud; no fully local model option

high · Verified: 2026-07-09
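Routing through a cloud provider is configured with environment variables. The following is a minimal sketch that builds the environment for a Claude Code subprocess. It assumes the `CLAUDE_CODE_USE_BEDROCK` and `CLAUDE_CODE_USE_VERTEX` switches, plus `AWS_REGION` for Bedrock and `CLOUD_ML_REGION` and `ANTHROPIC_VERTEX_PROJECT_ID` for Vertex AI. Confirm these names against current documentation. `provider_env` is a hypothetical helper. In either case, the model still runs in the provider's cloud.

```python
import os
from typing import Dict, Optional


def provider_env(provider: str, region: str,
                 project_id: Optional[str] = None) -> Dict[str, str]:
    """Copy os.environ and switch Claude Code to Bedrock or Vertex AI routing."""
    env = dict(os.environ)
    env.pop("CLAUDE_CODE_USE_BEDROCK", None)  # avoid conflicting switches
    env.pop("CLAUDE_CODE_USE_VERTEX", None)
    if provider == "bedrock":
        env["CLAUDE_CODE_USE_BEDROCK"] = "1"
        env["AWS_REGION"] = region
    elif provider == "vertex":
        if not project_id:
            raise ValueError("Vertex AI routing needs a Google Cloud project ID")
        env["CLAUDE_CODE_USE_VERTEX"] = "1"
        env["CLOUD_ML_REGION"] = region
        env["ANTHROPIC_VERTEX_PROJECT_ID"] = project_id
    else:
        raise ValueError(f"unknown provider: {provider!r}")
    return env
```

Usage would look like `subprocess.run(["claude"], env=provider_env("bedrock", "us-east-1"))`, with credentials assumed to come from the provider's usual credential chain.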

👁️Trust & Transparency

documentation quality

Documentation completeness and accuracy review

Evidence

Claude Code documentation — Extensive docs covering permissions, sandboxing, MCP, hooks, subagents, plugins, and enterprise deployment

high · Verified: 2026-07-09

execution traceability

Review of session transcripts, hooks-based auditing, and OTel telemetry support

Evidence

Claude Code documentation — Full transcript of every tool call and edit visible in session; OpenTelemetry metrics and logging supported

high · Verified: 2026-07-09
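Transcripts make after-the-fact audits scriptable. The following is a minimal sketch that lists tool calls from a JSONL session transcript. It assumes each line is a JSON record whose `message.content` may contain `{"type": "tool_use", "name": ..., "input": ...}` blocks, following the Anthropic Messages API shape. The on-disk transcript format is an internal detail that may change, so treat these field names as assumptions.

```python
import json
from typing import Iterable, Iterator, Tuple


def tool_calls(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (tool_name, tool_input) for every tool_use block in a transcript."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        message = record.get("message")
        if not isinstance(message, dict):
            continue  # e.g. summary records without a message
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-text messages carry no tool calls
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                yield block.get("name", "?"), block.get("input", {})
```

Because the function accepts any iterable of lines, an open file handle works directly: `for name, args in tool_calls(open(path)): ...`.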

decision explainability

Assessment of plan previews, inline reasoning, and diff-based change explanation

Evidence

Claude Code documentation — Visible reasoning, plan mode previews, and per-action permission prompts explain intended changes before execution

medium · Verified: 2026-07-09

open source code

Open source assessment of core product and ecosystem components

Evidence

Claude Code GitHub repository — Proprietary; public repository used for releases and issue tracking, not source code

high · Verified: 2026-07-09

community activity

Community engagement analysis via GitHub activity, release cadence, and ecosystem growth

Evidence

Claude Code changelog — v2.1.204 released 2026-07-08; sustained near-daily release cadence, highly active issue tracker, and a large plugin/skills ecosystem

high · Verified: 2026-07-09

⚙️Operational Excellence

ease of integration

Setup time and integration surface assessment across CLI, IDE, web, and desktop

Evidence

Claude Code documentation — Single npm install for CLI; native VS Code/JetBrains extensions, web, and desktop apps with shared auth

high · Verified: 2026-07-09

scalability

Assessment of parallel cloud sessions, headless/CI usage, and rate limit behavior

Evidence

Claude Code on the web — Cloud sandboxes allow many parallel tasks; headless mode and GitHub Actions support CI-scale automation

medium · Verified: 2026-07-09
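Headless mode is what makes CI-scale use practical. The following is a minimal sketch that builds one non-interactive `claude -p` invocation per task, with JSON output and a restricted tool list, and then runs the invocations in parallel. It assumes `--allowedTools` accepts tool rules as separate arguments. `build_command`, `run_one`, `run_all`, and the worker count are illustrative choices, and rate limits still apply.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(prompt: str, allowed_tools: list[str]) -> list[str]:
    """Build a non-interactive Claude Code invocation (hypothetical helper)."""
    return ["claude", "-p", prompt, "--output-format", "json",
            "--allowedTools", *allowed_tools]


def run_one(prompt: str) -> dict:
    cmd = build_command(prompt, ["Read", "Grep", "Bash(npm test:*)"])
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(done.stdout)  # result payload; exact fields vary by version


def run_all(prompts: list[str], workers: int = 4) -> list[dict]:
    # Keep workers low: every task draws on the same rate limits and budget.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, prompts))
```

Threads fit here because each worker mostly waits on a subprocess. Restricting tools per invocation keeps unattended runs inside a narrow permission envelope.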

cost predictability

Pricing model analysis comparing subscription caps versus variable API token costs

Evidence

Claude pricing — Claude Pro/Max subscriptions cap monthly spend; API pay-as-you-go usage varies significantly with task size

high · Verified: 2026-07-09
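The predictability gap comes down to simple arithmetic: API cost scales with tokens, while a subscription is flat. The following is a minimal sketch using placeholder per-million-token rates, not a statement of current prices, applied to two hypothetical task sizes. Prompt caching, which lowers effective input cost, is ignored.

```python
def api_cost_usd(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in USD, given rates expressed per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Placeholder rates in USD per million tokens; substitute current list prices.
INPUT_RATE, OUTPUT_RATE = 3.0, 15.0

small_fix = api_cost_usd(200_000, 20_000, INPUT_RATE, OUTPUT_RATE)
large_refactor = api_cost_usd(20_000_000, 1_000_000, INPUT_RATE, OUTPUT_RATE)
```

Under these assumptions, the small fix costs 0.90 USD and the large autonomous refactor costs 75.00 USD, a spread of roughly 83x for the same workflow. A subscription charges the same fee either way, subject to its usage limits.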

monitoring capabilities

Review of OTel export, cost/usage tracking, and admin analytics features

Evidence

Claude Code monitoring documentation — Built-in OpenTelemetry metrics, usage tracking, and enterprise analytics dashboard

high · Verified: 2026-07-09
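Enabling telemetry export is an environment-variable change. The following is a minimal sketch that assumes Claude Code's `CLAUDE_CODE_ENABLE_TELEMETRY` switch together with the standard OpenTelemetry exporter variables (`OTEL_METRICS_EXPORTER`, `OTEL_LOGS_EXPORTER`, `OTEL_EXPORTER_OTLP_PROTOCOL`, `OTEL_EXPORTER_OTLP_ENDPOINT`). The `telemetry_env` helper is hypothetical, and the collector URL in the usage note is a placeholder.

```python
import os
from typing import Dict


def telemetry_env(endpoint: str, protocol: str = "grpc") -> Dict[str, str]:
    """Return a copy of os.environ with OTLP metrics and logs export enabled."""
    env = dict(os.environ)
    env.update({
        "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
        "OTEL_METRICS_EXPORTER": "otlp",
        "OTEL_LOGS_EXPORTER": "otlp",
        "OTEL_EXPORTER_OTLP_PROTOCOL": protocol,
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
    })
    return env
```

Usage would look like `subprocess.run(["claude"], env=telemetry_env("http://collector.example:4317"))`. In managed deployments, the same variables would more typically be set centrally than per invocation.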

production readiness

Maturity assessment from GA timeline, release stability, and enterprise adoption

Evidence

Claude Code GA and adoption — GA since 2025-05-22 with ~$1B ARR within ~6 months; widely deployed in enterprises

high · Verified: 2026-07-09

Strengths

+State-of-the-art coding capability backed by Claude models
+OS-level sandboxing with filesystem and network isolation, which cut permission prompts by ~84% in Anthropic's internal usage
+Tiered permission system with allowlists, hooks, and enterprise policies
+Rich extensibility: MCP servers, hooks, subagents, plugins, and skills
+Available across terminal, IDE extensions, web, and desktop with shared workflows
+Strong observability via full transcripts and OpenTelemetry

Limitations

!Proprietary, closed-source core despite public releases repository
!Locked to Claude models; no local or third-party model support
!API pay-as-you-go costs can spike on large autonomous tasks
!Subscription rate limits can interrupt heavy daily usage
!Autonomous edits still require human review for correctness and security
!Notable 2026 CVE history (~28 CVEs in its first year, including two patched CVSS 10.0 sandbox escapes: CVE-2026-39861 and CVE-2026-25725) and an accidental full source-map leak in npm v2.1.88 (2026-03-31); patches shipped quickly, but the attack surface is large

Metadata

license: Proprietary (public releases repo at github.com/anthropics/claude-code)

supported models: Claude Opus, Claude Sonnet, Claude Haiku

programming languages: Language-agnostic (any language in the repository)

deployment type: Local CLI / IDE extensions / managed web sandboxes / desktop app

tool support: Built-in file, bash, and search tools; MCP servers; Hooks; Subagents; Plugins and skills

first release: 2025-02-24 (research preview); GA v1.0 2025-05-22

pricing: Claude Pro/Max subscription or API pay-as-you-go; ~$1B ARR within ~6 months of GA

interfaces: Terminal CLI; VS Code and JetBrains extensions; Web; Desktop

Use Case Ratings

code generation

Flagship use case; excels at multi-file feature work, refactors, debugging, and test-driven workflows

data analysis

Strong for scripted analysis, notebooks, and data pipeline work via bash and file tools

research assistant

Capable codebase and web research via agentic search, though optimized for engineering contexts

content creation

Good for technical writing and docs generation; not designed for general marketing content