Claude Opus 4.6 Introduces Adaptive Reasoning and Context Compaction for Long-Running Agents

📅 2026-03-12 18:01 · Steef-Jan Wiggers · Artificial Intelligence · 6 min read
Claude Opus 4.6 · Anthropic · Agentic workflows · Context compaction · LLM benchmarks
📌 TL;DR: Anthropic has released Claude Opus 4.6, featuring adaptive reasoning controls and context compaction, aimed at optimizing the performance and cost of long-running AI agents.

📝 Summary: The article covers the release of Claude Opus 4.6, highlighting its shift from static inference to dynamic orchestration. Core architectural updates include granular "effort" controls (from low to max) that let developers balance latency against reasoning depth, and "context compaction," which mitigates "context rot" by automatically summarizing conversation history. The model offers a 1M-token context window and shows marked gains on agentic-coding and multi-needle retrieval benchmarks. It is available on the major cloud providers and introduces autonomous Agent Teams and a brand-aware PowerPoint integration.

Recently, Anthropic released Claude Opus 4.6, marking a shift from static inference to dynamic orchestration in its flagship model. The update introduces adaptive thinking effort controls and context compaction, architectural features designed to address context degradation and overthinking issues in long-running agentic workflows.

Claude Opus 4.6 is now available across all major cloud platforms, including Microsoft Foundry, AWS Bedrock, and Google Cloud's Vertex AI.

Opus 4.6 replaces binary reasoning toggles with four granular effort controls: low, medium, high (default), and max. This allows developers to programmatically calibrate the model's internal chain-of-thought depth based on task complexity.
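
As a sketch, a per-request payload might expose this as a single field. Note that the parameter name `effort` and its top-level placement here are assumptions for illustration, not confirmed API documentation:

```python
# Sketch of calibrating thinking effort per request. The field name "effort"
# and its placement in the payload are assumptions based on the article.

def build_request(prompt: str, effort: str = "high") -> dict:
    """Build a Messages API payload with a per-request effort level."""
    allowed = {"low", "medium", "high", "max"}
    if effort not in allowed:
        raise ValueError(f"effort must be one of {sorted(allowed)}")
    return {
        "model": "claude-opus-4-6",
        "max_tokens": 4096,
        "effort": effort,  # assumed parameter name; "high" is the default
        "messages": [{"role": "user", "content": prompt}],
    }

# Simple tasks: dial effort down to cut latency and thinking-token spend.
payload = build_request("Summarize this changelog.", effort="medium")
```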

Anthropic notes in its announcement that:

> Opus 4.6 often thinks more deeply and more carefully revisits its reasoning before settling on an answer. This produces better results on harder problems, but can add cost and latency on simpler ones.

Moreover, the company recommends dialing effort down to medium for straightforward tasks to reduce latency and cost.

Thinking tokens are billed as output tokens at $25 per million. For agentic systems making dozens of API calls, managing these effort levels becomes a primary cost control mechanism.
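
A back-of-envelope calculation makes the incentive concrete. The $25-per-million rate is from the article; the per-call token counts below are illustrative:

```python
# Estimate the dollar cost of thinking tokens across an agentic run, using
# the article's rate of $25 per million output tokens.

PRICE_PER_MTOK_OUTPUT = 25.00

def thinking_cost(calls: int, thinking_tokens_per_call: int) -> float:
    """Dollar cost of thinking tokens for a run of `calls` API calls."""
    total_tokens = calls * thinking_tokens_per_call
    return total_tokens / 1_000_000 * PRICE_PER_MTOK_OUTPUT

# A 50-call agent loop at high effort (~8K thinking tokens per call) versus
# medium (~2K per call) — hypothetical token counts:
high_cost = thinking_cost(50, 8_000)    # $10.00
medium_cost = thinking_cost(50, 2_000)  # $2.50
```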

While Opus 4.6 introduces a 1M token context window in beta, which is enough to process approximately 750,000 words, the more significant architectural update is context compaction. This feature addresses performance degradation as context windows fill, a phenomenon Anthropic calls "context rot."

When a conversation approaches the limit, the API automatically summarizes earlier portions and replaces them with a compressed state. On the MRCR v2 (Multi-needle Retrieval) benchmark at 1M tokens, Opus 4.6 achieved 76% accuracy, which is a fourfold improvement over Sonnet 4.5's 18.5%. Anthropic describes this as:

> A qualitative shift in how much context a model can actually use while maintaining peak performance.
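
The actual compaction runs server-side in Anthropic's API; as a rough client-side illustration of the same summarize-and-replace idea (with a placeholder standing in for the real summarizer):

```python
# Illustrative client-side compaction: once the estimated token count of the
# history exceeds the budget, drop the oldest messages and replace them with
# a single summary stub. The real feature is handled by the API itself.

def compact(history, budget, estimate=lambda m: len(m) // 4):
    """Summarize-and-replace the oldest messages until the history fits."""
    history = list(history)
    dropped = []
    while history and sum(estimate(m) for m in history) > budget:
        dropped.append(history.pop(0))
    if dropped:
        # Placeholder for a model-generated summary of the dropped turns.
        history.insert(0, f"[compacted summary of {len(dropped)} earlier messages]")
    return history
```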

The model also delivers a maximum output of 128K tokens, doubling the previous 64K limit.

Microsoft positions its service, Foundry, as an interoperable platform where intelligence and trust converge to enable autonomous work. In its blog post, Microsoft states that Opus 4.6 can leverage Foundry IQ to access data from Microsoft 365 Work IQ, Fabric IQ, and the web.

Furthermore, Microsoft describes the model as:

> Best applied to complex tasks across coding, knowledge work, and agent-driven workflows, supporting deeper reasoning while offering superior instruction following for reliability.

The company emphasizes Foundry's "managed infrastructure and operational controls" that allow teams to "compress development timelines from days into hours."

Opus 4.6 is also available through Microsoft Copilot Studio, Google Cloud's Vertex AI Agent Builder, and Amazon Bedrock Agents, enabling organizations to build and deploy AI agents without custom code.

The release includes Agent Teams in Claude Code as a research preview, allowing developers to spin up multiple agents that work in parallel and coordinate autonomously. Anthropic describes this as:

> Best for tasks that split into independent, read-heavy work like codebase reviews.
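
The "independent, read-heavy" pattern is essentially fan-out/fan-in. A minimal sketch of the coordination shape, with `review` standing in for a real agent invocation (the function and paths below are hypothetical):

```python
# Fan-out/fan-in over independent, read-heavy work: reviewer agents scan
# disjoint parts of a codebase in parallel, and results are merged at the end.

from concurrent.futures import ThreadPoolExecutor

def review(path: str) -> str:
    # Placeholder for one agent's read-only review pass over `path`.
    return f"{path}: no issues found"

def review_in_parallel(paths):
    """Run one reviewer per path concurrently; preserve input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(review, paths))

findings = review_in_parallel(["src/api", "src/auth", "tests"])
```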

Furthermore, Claude's integration into PowerPoint, also in research preview, allows the model to read layouts, fonts, and slide masters to generate presentations that stay on brand. The feature is available for Max, Team, and Enterprise plans.

Anthropic also claims state-of-the-art results on multiple evaluations:

* Terminal-Bench 2.0 (agentic coding): 65.4% (highest score)
* Humanity's Last Exam: leads all frontier models
* GDPval-AA (knowledge work): outperforms OpenAI's GPT-5.2 by ~144 Elo points
* BrowseComp: best performance for locating hard-to-find information

_(Image: bar chart comparing AI models' Elo scores across tasks such as agentic search, coding, and reasoning. Source: Anthropic blog post)_

The model found over 500 previously unknown high-severity security vulnerabilities in open-source libraries, including Ghostscript, OpenSC, and CGIF. However, independent testing by Quesma revealed limitations: Claude Opus 4.6 detected backdoors in compiled binaries only 49% of the time when using open-source tools like Ghidra, with notable false positives. Hacker News discussion highlighted concerns about regression from Opus 4.5, with users reporting that the new model performs worse on certain tasks.

Base pricing remains $5 per million input tokens and $25 per million output tokens. However, a "long-context premium" of $10/$37.50 per million tokens applies to the entire request once input exceeds 200K tokens. The 1M context window is currently available in beta only through Claude's native API. US-only inference carries a 1.1x pricing multiplier.
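
Because the long-context premium reprices the entire request, crossing the 200K-input threshold is a step change rather than a marginal one. A sketch using the article's figures:

```python
# Request-cost estimate from the article's pricing: $5/$25 per million
# input/output tokens, rising to $10/$37.50 for the WHOLE request once input
# exceeds 200K tokens. The optional 1.1x factor models US-only inference.

def request_cost(input_tokens: int, output_tokens: int, us_only: bool = False) -> float:
    """Estimated dollar cost of one request under the tiered pricing."""
    long_context = input_tokens > 200_000
    in_rate, out_rate = (10.00, 37.50) if long_context else (5.00, 25.00)
    cost = input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
    return round(cost * (1.1 if us_only else 1.0), 6)

# One extra input token past the threshold reprices every token:
base = request_cost(200_000, 10_000)      # $1.25
premium = request_cost(200_001, 10_000)   # ~$2.375
```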

Lastly, the model is accessible through claude.ai, the Claude API (model string: claude-opus-4-6), Microsoft Foundry, AWS Bedrock, Google Cloud Vertex AI, and GitHub Copilot for Pro, Business, and Enterprise users.
