Anthropic Claude Opus 4.6

Closed · Multimodal

Anthropic

Anthropic's most capable model with the #1 Arena ELO, best-in-class agentic coding, a 1M context window, and industry-leading alignment and safety.

Overview

Claude Opus 4.6 is Anthropic's most capable model, released on February 4, 2026, and it currently holds the #1 position on the Chatbot Arena leaderboard with an ELO of 1503. Rather than a simple parameter scaling exercise, Opus 4.6 introduces adaptive thinking — a four-level system (low, medium, high, max) that lets the model dynamically allocate compute to match task complexity. This means straightforward questions get fast answers while hard reasoning problems trigger deeper chain-of-thought processing, all without separate model variants.

The model's 1M context window (currently in beta) is backed by a context compaction mechanism designed specifically for long-running agentic workflows. When an agent conversation grows beyond a threshold, the model compresses earlier context into a semantic summary, preserving critical information while freeing tokens for continued operation. This is why Opus 4.6 has become the go-to backbone for autonomous coding agents — it scored 81.42% on SWE-bench Verified, meaning it can resolve real GitHub issues across multi-file codebases with minimal human guidance.
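The compaction step described above can be sketched as follows. This is a conceptual illustration, not Anthropic's implementation: the function names, the rough 4-characters-per-token estimate, and the placeholder summarizer (which in practice would be a model-generated semantic summary) are all assumptions.

```python
def rough_token_count(messages):
    """Crude token estimate: roughly 4 characters per token."""
    return sum(len(m["content"]) for m in messages) // 4

def summarize(messages):
    """Placeholder for a model-generated semantic summary of earlier turns."""
    topics = "; ".join(m["content"][:20] for m in messages)
    return {"role": "user", "content": f"[Summary of earlier turns: {topics}]"}

def compact(messages, budget=200_000, keep_recent=4):
    """If the conversation exceeds the token budget, replace the oldest
    turns with a single summary message, keeping recent turns intact."""
    if rough_token_count(messages) <= budget:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(old)] + recent
```

The key design point is that recent turns survive verbatim while only older context is lossy-compressed, which is what lets an agent keep operating past the raw window limit.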

Beyond coding, Opus 4.6 shows unusual strength in domains that demand both precision and nuance. It achieved 90.2% on BigLaw Bench (legal reasoning), 86.8% on BrowseComp with multi-agent setups (web research), and 76% on the notoriously difficult MRCR v2 8-needle 1M-context retrieval task. Its 128K output token limit is the largest among frontier models, enabling generation of complete documents, full codebases, or exhaustive analyses in a single response.

The pricing structure reflects its positioning as a premium workhorse: $5/$25 per million tokens for standard context, stepping up to $10/$37.50 for prompts exceeding 200K tokens. While not the cheapest option, the combination of top-tier reasoning, massive context, and reliable agentic behavior makes it the preferred choice for professional software engineering and complex analytical workloads.
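The tiered pricing above can be turned into a quick cost estimator. One simplifying assumption here: the long-context rate is applied to the entire request once the prompt exceeds 200K tokens; verify the exact tier boundaries against Anthropic's current price list.

```python
def opus_cost_usd(input_tokens, output_tokens):
    """Estimate request cost in USD: $5/$25 per million tokens at standard
    context, $10/$37.50 when the prompt exceeds 200K tokens."""
    if input_tokens > 200_000:
        in_rate, out_rate = 10.0, 37.50
    else:
        in_rate, out_rate = 5.0, 25.0
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
```

For example, a 100K-token prompt with a 10K-token response lands at $0.75, while the same response on a 300K-token prompt costs $3.375.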

Release Date

2026-02-04

Parameters

Unknown

Context Window

1.0M tokens

Input Price

$5 / 1M tokens

Output Price

$25 / 1M tokens

Speed

55 tokens/sec

Benchmarks

Benchmark            Score    Max
SWE-bench Verified   81.4%    100%
Arena ELO            1503     2000
AA Index             53%      100%
GPQA Diamond         91.3%    100%
HLE                  53%      100%
AIME 2026            93.3%    100%

Capabilities

Agentic Coding

SWE-bench Verified 81.42%, resolves real GitHub issues across multi-file codebases autonomously

Adaptive Thinking

Four configurable levels (low/medium/high/max) that dynamically allocate reasoning compute per task

Long-Context Processing

1M token context window with context compaction for sustained agent sessions

Legal & Professional Reasoning

BigLaw Bench 90.2%, strong performance on complex domain-specific analysis

Web Research

BrowseComp 86.8% with multi-agent orchestration for deep information retrieval

Extended Generation

128K output token limit enables complete document and codebase generation in one pass

Getting Started

1

Get API Access

Sign up at console.anthropic.com and generate an API key — no waitlist for standard tier

2

Install the SDK

Run `pip install anthropic` (Python) or `npm install @anthropic-ai/sdk` (Node.js)

3

Make Your First Call

Use `client.messages.create(model="claude-opus-4-6-20260204")` with adaptive thinking enabled via the `thinking` parameter

4

Try It Live

Use claude.ai or the Claude Code CLI for immediate interactive access without writing code
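The steps above can be condensed into a minimal request sketch. The model ID and the low/medium/high/max effort levels come from this page, but the exact shape of the `thinking` parameter is an assumption; check Anthropic's API reference for the current format.

```python
# Build the request payload for a first call (assumed "thinking" shape).
request = {
    "model": "claude-opus-4-6-20260204",
    "max_tokens": 1024,
    "thinking": {"type": "enabled", "effort": "high"},  # assumed parameter shape
    "messages": [
        {"role": "user", "content": "Summarize this repo's build steps."}
    ],
}

# With the SDK installed and ANTHROPIC_API_KEY set, you would send it via:
#   import anthropic
#   client = anthropic.Anthropic()
#   response = client.messages.create(**request)
#   print(response.content[0].text)
```

Keeping the payload as a plain dict makes it easy to log, diff, and reuse across the Python and Node.js SDKs.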

Pros & Cons

Strengths

  • #1 Arena ELO (1503)
  • Best agentic coding (SWE-bench 81.4%)
  • 1M context window (beta)
  • Excellent code review and debugging
  • BigLaw Bench 90.2% (legal reasoning)
  • Lowest misaligned behavior rate

Weaknesses

  • Premium pricing ($5/$25 per MTok)
  • Moderate speed (~55 tok/s)
  • Closed source
  • High token consumption on reasoning tasks (157M tokens for evaluation)

Best For

  • Agentic coding and software engineering
  • Long-context analysis (legal, financial)
  • Code review and debugging
  • Creative writing
