Tech News

Claude Opus 4.7 Is Here: Best Coding Model, 3× Vision, Same Price

Anthropic's Claude Opus 4.7 is here, delivering massive gains in coding and vision at no extra cost. Discover the new features and benchmark results.

Claude Opus 4.7 Is Here: Best Coding Model, 3× Vision, Same Price

Struggling with AI models that stumble on complex coding tasks or lack the vision to handle your most demanding visual data? Anthropic has just released Claude Opus 4.7 to solve exactly that. This upgrade delivers their most powerful coding performance yet and triples your vision capabilities—all while keeping the price completely unchanged. It is available now on claude.ai, the API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.

This is not an incremental update. SWE-bench Pro is up 10.9 percentage points. CursorBench is up 12 points. Vision resolution has tripled. And — a detail companies are happy to hear — the pricing has not changed.

SWE-bench Pro
64.3%
era 53,4% no Opus 4.6 (+10,9pp)
CursorBench
70%
era 58% (+12pp) — melhor coding do mercado
Visão
3.75MP
era 1,15MP — 3× mais resolução

What has actually changed

Opus 4.7 was built around three real problems that Opus 4.6 users were reporting: the model sometimes abandoned long tasks midway, sometimes delivered code that looked correct but failed review, and sometimes interpreted instructions more loosely than expected.

The three central bets of Opus 4.7 are directly aimed at these problems: persistence in long tasks, self-verification before reporting, and literal instruction following.

Benchmarks: where Opus 4.7 won and where it gave ground

Opus 4.6 vs Opus 4.7 — benchmarks principais ■ Opus 4.6 ■ Opus 4.7 SWE-bench Verified 87,6% (+6,8pp) SWE-bench Pro 64,3% (era 53,4% — +10,9pp) CursorBench 70% (era 58% — +12pp) GPQA Diamond 94,2% (+2,9pp) Finance Agent v1.1 64,4% (era 60,7% — melhor do mercado) BrowseComp 79,3% (era 83,7% — regressão) Barras laranjas = Opus 4.7 · Barras com borda vermelha = regressão vs 4.6
Opus 4.6 vs Opus 4.7 comparison on key benchmarks

Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro

Claude Opus 4.7 leads

SWE-bench Pro: 64.3% vs 57.7% (GPT) and 54.2% (Gemini) SWE-bench Verified: 87.6% CursorBench: 70% — best coding in IDE on the market MCP-Atlas (tool use): 77.3% vs 68.1% (GPT) Finance Agent: 64.4% vs 59.7% (Gemini) GDPVal-AA knowledge work: Elo 1,753 vs 1,674 (GPT)

Where it loses or ties

BrowseComp: 79.3% vs 89.3% (GPT) and 85.9% (Gemini) GPQA Diamond: 94.2% — practically tied (GPT: 94.4%, Gemini: 94.3%) Terminal-Bench 2.0: 69.4% vs 75.1% (GPT) Humanity's Last Exam: 54.7% vs 58.7% (GPT) CyberGym: intentional — cyber capabilities were reduced during training

3× better vision — what this changes in practice

Opus 4.6 processed images at up to 1,568px on the long side (1.15 megapixels). Opus 4.7 goes up to 2,576px (3.75 megapixels) — more than 3× more pixels.

In practice: dense technical diagrams, IDE screenshots, high-resolution PDF documents, design mockups, and complex financial charts arrive with true fidelity — not interpolated. The CharXiv visual reasoning with tools benchmark jumped from 84.7% to 91.0%.

The new xhigh level — fine control between quality and cost

Opus 4.6 had four effort levels: low, medium, high, and max. Opus 4.7 introduces a new level between high and max:

low econômico medium balanceado high padrão Claude Code xhigh ✦ novo novo padrão Claude Code max máximo — caro
Effort level scale in Opus 4.7

xhigh is now the default for Claude Code across all plans. The logic is simple: if a task requires three attempts at high to get it right, one attempt at xhigh is usually cheaper in total — fewer retries, fewer tokens spent.

Task budgets, /ultrareview and cross-session memory

Three new features arriving with the model:

Task budgets (public beta): set a token ceiling for autonomous agents. The model sees the counter decreasing and prioritizes the work, finishing cleanly instead of cutting off abruptly. Activate via header task-budgets-2026-03-13 + parameter output_config.task_budget.

/ultrareview in Claude Code: new command that runs a dedicated review session, reads the entire diff, and flags what a careful human reviewer would detect. 3 free uses on Pro and Max plans at launch.

Cross-session memory: Opus 4.7 is better at using file-system-based memory. It keeps important notes between long work sessions, reducing the context you need to paste at the start of each new session.

Attention on the 4.6 migration

Anthropic called it a "direct upgrade" but there are changes that affect token usage and behavior:

The elephant in the room: the Mythos Preview

Anthropic was transparent: Opus 4.7 does not match the Claude Mythos Preview, their most powerful model — which is not publicly available due to safety concerns.

The Mythos Preview was released last week to a select group of technology and cybersecurity companies as part of Project Glasswing. Opus 4.7 is the first model where Anthropic tested safeguards against use in cyberattacks — what they learn here will guide how they eventually release Mythos-level models at scale.

Pricing, availability, and model ID

Price identical to Opus 4.6: $5 per million input tokens and $25 per million output tokens. Prompt caching reduces by up to 90%. Batch processing reduces by 50%.

Model ID in the API: claude-opus-4-7. Available on: claude.ai (Pro, Max, Team, Enterprise), Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.

For most developers using Claude Code day-to-day, Opus 4.7 is a direct upgrade with no decision to make. Same price, better model.

For teams with agents in production, the migration requires attention: measure the impact of the new tokenizer, review prompts that relied on loose interpretation, and configure task budgets before turning on auto mode.