
Cursor's New In-House Model Overtakes Opus 4.6 at One-Tenth the Price! User Test: It Is the Only One Whose Generated App Ran on the First Try

📅 2026-03-20 15:48 · AI前线 · Artificial Intelligence · 15 min read · 18,194 characters · Score: 78
Cursor · Composer 2.0 · AI Programming · Large Language Models · LLM · Benchmarking

AI前线 @AI前线

One Sentence Summary

Cursor has released its in-house programming model, Composer 2.0, which outperforms Claude Opus 4.6 while costing only one-tenth as much, aiming to counter direct competition from model providers through in-house development and an Agent-centric transformation.

Summary

This article reports on the release of Composer 2.0, the second generation of Cursor's in-house model. The model surpasses Claude Opus 4.6 on Terminal-Bench 2.0, and in one developer's real-world application generation test it was the only model whose app ran on the first try, finishing faster and at lower cost than Opus 4.6 and GPT-5.4. Its standard list price is one-tenth of Opus 4.6's. The article analyzes the strategic context behind Cursor's move: facing threats from CLI agents like Claude Code, Cursor is attempting to solidify its moat in AI programming by developing in-house models (further trained from open-source models), shifting fully to an Agent-based collaboration model, and deepening its focus on the enterprise market, all to prevent being relegated from a "super entry point" to a mere "middle layer."

Main Points

* 1. Cursor releases Composer 2.0, with performance surpassing top-tier models in specific programming benchmarks.

Surpassing Claude Opus 4.6 in tests like Terminal-Bench 2.0 marks Cursor's transition from a mere IDE shell to a platform with core model capabilities.

* 2. Extreme cost-effectiveness, significantly lowering the barrier to entry for AI-assisted programming.

Priced at only one-tenth of Opus 4.6, it achieves a leap in commercial competitiveness by optimizing input/output costs while maintaining performance.

* 3. Cursor faces an 'IDE obsolescence' survival crisis, forcing a strategic transformation.

As model providers directly launch CLI tools and Agents, the status of code editors as the primary entry point is being challenged, forcing Cursor to save itself through in-house model development and Agent-based transformation.

* 4. Leveraging the open-source model ecosystem to build differentiated competitiveness.

Cursor cleverly utilizes open-source models like DeepSeek and Qwen for secondary training and reinforcement learning, achieving a low-cost, high-performance specialized programming model.

Metadata

AI Score

78

Website mp.weixin.qq.com

Published At Today

Length 2471 words (about 10 min)


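AI前线 · 2026-03-20 15:48 · Beijing

Cursor's desperate race against "IDE obsolescence"

Author | 木子

Standing at the edge of a cliff, Cursor has just released its second-generation in-house coding model, Composer 2.0, which is already available in the IDE.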

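On one key coding benchmark, Terminal-Bench 2.0, Composer 2 actually overtook Claude's flagship model, Opus 4.6.

Bear in mind that before Cursor had its own coding model, Composer, it relied for a long time on Claude and Codex as bolted-on models. That won it a large following, but it also drew persistent doubts about whether it had any core capability of its own.

This time, it not only pulls ahead on performance; it also comes at one-tenth the price.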

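Cursor's pricing is as follows. The Fast version costs $1.50 per million input tokens and $7.50 per million output tokens, about 57% cheaper than the previous generation.

The standard version goes as low as $0.50 per million input tokens and $2.50 per million output tokens. By comparison, Claude Opus 4.6 is priced at $5 per million input tokens and $25 per million output tokens, exactly a 10x difference. It should be noted, however, that Anthropic also points out that with optimizations such as caching and batch processing, costs can in principle be cut to as little as one-tenth of the original.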
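The price gap can be checked with a few lines of arithmetic. The sketch below is only an illustration: the prices are the per-million-token list prices quoted in this article, while the `Price` class, the `job_cost` and `price_ratio` helpers, and the example token counts are hypothetical names and numbers introduced here. They are not part of any Cursor or Anthropic API, and the calculation ignores caching and batch discounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per one million tokens, as quoted in the article."""
    input_usd: float
    output_usd: float


# List prices quoted in the article (hard-coded, not fetched from any API).
PRICES = {
    "Composer 2 (standard)": Price(0.5, 2.5),
    "Composer 2 (Fast)": Price(1.5, 7.5),
    "Claude Opus 4.6": Price(5.0, 25.0),
}


def job_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: list-price cost in USD for one job, with no discounts applied."""
    return (input_tokens * price.input_usd + output_tokens * price.output_usd) / 1_000_000


def price_ratio(a: Price, b: Price) -> tuple[float, float]:
    """Return (input ratio, output ratio) of price a over price b."""
    return a.input_usd / b.input_usd, a.output_usd / b.output_usd


if __name__ == "__main__":
    opus = PRICES["Claude Opus 4.6"]
    for name, price in PRICES.items():
        in_ratio, out_ratio = price_ratio(opus, price)
        # Made-up example workload: 2M input tokens and 0.5M output tokens.
        cost = job_cost(price, 2_000_000, 500_000)
        print(f"{name}: Opus list price is {in_ratio:.1f}x (input) / {out_ratio:.1f}x (output); "
              f"example job costs ${cost:.2f}")
```

On list prices, the standard tier comes out at one-tenth of Opus 4.6 on both input and output, while the Fast tier is roughly 3.3x cheaper. Any caching or batch discount on either side would change the comparison.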

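AI competition has now reached the stage of "who can emit more tokens for less money," and on both speed and cost Composer 2.0 beats its two established rivals, Opus 4.6 and GPT-5.4, at the same time. Cursor did not hold back either: it published a chart that puts the three models' numbers side by side.

User test: only Composer 2's app ran on the first try

Cursor claims that Composer 2 improved substantially on every benchmark it uses.

Beyond Terminal-Bench 2.0, mentioned above, Composer 2 also posted a strong score of 73.7% on SWE-bench Multilingual, which measures a model's debugging ability. Claude Opus 4.6 scores 77.83% on the same benchmark (figure from Anthropic), so the two are now close.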

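Competing only on general leaderboards may no longer be enough for Cursor. It recently built its own benchmark, called Cursor Bench, specifically to evaluate how well agents execute real tasks.

Notably, Cursor Bench once cut Claude Sonnet 4.5, which had shone on SWE-Bench, down to size: its score fell from 77.2 on SWE-Bench to 37.9 on Cursor Bench. As for Composer 2, it has most likely already been tested repeatedly against this in-house "devil's benchmark."

That said, the numbers are certainly impressive, but how does Composer 2 perform on real work?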

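One developer ran a hands-on test of Composer 2, Opus 4.6, and GPT-5.4 on the same task:

He had each model generate a clone of X using a specified tech stack, and allowed all three to drive a browser to test their own work.

In the planning phase the three differed little, each taking about 5 minutes. The gap opened up during actual execution: the app generated by Composer 2 ran as-is, while Opus and GPT did eventually finish, but both got stuck on CORS issues that required extra debugging.

More interestingly, the code structure and quality were very close across the three. The difference was mainly in efficiency and cost: Composer 2 took 5 minutes and cost $6.04, while Opus and GPT took 19 and 22 minutes and cost $10.43 and $14.15 respectively.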
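To put the reported numbers on one scale, the short sketch below divides each model's time and cost by Composer 2's. The figures are the ones reported for this single test. The `RunResult` type and the `relative_to` function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class RunResult(NamedTuple):
    minutes: float   # time reported in the test
    cost_usd: float  # cost reported in the test


# Figures from the single developer test described in the article.
RESULTS = {
    "Composer 2": RunResult(5, 6.04),
    "Opus 4.6": RunResult(19, 10.43),
    "GPT-5.4": RunResult(22, 14.15),
}


def relative_to(baseline: str, results: dict[str, RunResult]) -> dict[str, tuple[float, float]]:
    """Return (time multiple, cost multiple) of every model versus the baseline, rounded to 2 places."""
    base = results[baseline]
    return {
        name: (round(r.minutes / base.minutes, 2), round(r.cost_usd / base.cost_usd, 2))
        for name, r in results.items()
    }


if __name__ == "__main__":
    for name, (time_x, cost_x) in relative_to("Composer 2", RESULTS).items():
        print(f"{name}: {time_x}x the time, {cost_x}x the cost of Composer 2")
```

In this one run the cost gap (about 1.7x to 2.3x) is much narrower than the 10x gap in list prices, since total cost depends on how many tokens each model consumes as well as on the unit price. A single test is an anecdote, not a benchmark.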

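Why is Cursor said to be standing at the edge of a cliff?

It is certainly not because it cannot make money.

On the contrary, over the past year Cursor's revenue, valuation, and user growth have all been strong, and enterprise customers keep paying.

According to a Bloomberg report in early March, Cursor's 2025 sales soared to $2 billion (about RMB 13.8 billion), up from $150 million a year earlier. And it has only a little over 300 employees.

In addition, Cursor closed its previous funding round last November, raising $2.3 billion at a post-money valuation of about $30 billion (about RMB 206.9 billion). According to Bloomberg on March 11, Cursor is also in talks with investors on a new round that could value it at up to $50 billion post-money (about RMB 344.8 billion). The talks are still under way, however, and may not end in a deal.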

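In other words, the real danger is that the very logic Cursor's rise was built on is being hollowed out:

Developers used to need an IDE to write code together with AI. Now more and more of them hand tasks directly to CLI agents such as Claude Code and Codex, letting them write, run, and fix the code themselves.

Software development is shifting from "assisting with coding" to "agents completing tasks." The code editor is no longer the only entry point, and is even starting to look redundant.

This is fatal for Cursor. Its greatest strength was packaging top models such as Claude and Codex into an IDE that was comfortable enough to use. But once model vendors build products themselves and take the entry point directly, Cursor can easily slide from "super entry point" to "a layer in the middle."

More awkwardly, it has long depended on external models, and users like it precisely because it plugs into the strongest brains. Once those brains build their own IDEs, CLIs, and agents, Cursor's moat starts to get shallower: upstream model vendors push downward, downstream developers route around it, and it is squeezed in between.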

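So Cursor's plan to save itself is also clear:

First, fix its most critical weakness by building its own model.

Second, shift fully to agents, turning the IDE from "file-centric" to "task-centric." It has launched cloud-based multi-agent collaboration so that multiple agents work in parallel, rather than offering only a code-completion tool.

Third, keep betting on the enterprise market, because large companies migrate slowly, sign long contracts, and carry heavy compliance requirements. They will not use Cursor today and move everyone to Claude Code tomorrow.

In addition, it needs to reduce its dependence on Anthropic and OpenAI. Cursor has done further training on open-source models such as DeepSeek, Kimi, and Qwen, then used its own data and reinforcement learning to forge them into a cheaper, faster specialized coding model. In effect, Cursor is riding the momentum of Chinese open-source models.

Put bluntly, Cursor is not merely shipping a version update; it is racing against time to rewrite its reason for existing:

In an era when the editor may lose its central position, it has to prove that it is not just a nice-to-use shell but a real AI coding platform with its own model, its own system, and a new entry point.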

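Reference links:
https://cursor.com/cn/blog/composer-2
https://x.com/TukiFromKL/status/2034677859818610700
https://x.com/wesbos/status/2034705631773372853

Statement: This article was compiled by AI前线 and does not represent the platform's views. Reproduction without permission is prohibited.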



Key Quotes

* Code editors are no longer the only entry point; they are even starting to seem redundant. This is fatal for Cursor.
* Cursor is not just doing a version update; it is racing against time to rewrite its reason for existing.
* Composer 2.0 manages to crush both old rivals, Opus 4.6 and GPT-5.4, in terms of both speed and cost.
* Software development is shifting from 'assisting with coding' to 'Agents completing tasks'.


Tags

Cursor

Composer 2.0

AI Programming

LLM

LLM Benchmarking


Published: 2026-03-20 15:48:00 · Collected: 2026-03-20 20:00:39
