Once AI Can Write a Million Lines of Code, What Becomes the Bottleneck in Software Engineering?



人人都是产品经理 @人人都是产品经理

One Sentence Summary

By analyzing the Cursor team's GPT-5.2 multi-agent experiment, this article reveals that software engineering bottlenecks are shifting from code generation to task decomposition, agent coordination, and system verification.

Summary

The article provides an in-depth analysis of an engineering experiment conducted by the Cursor team: using a GPT-5.2-driven multi-agent system to generate a million-line browser codebase within weeks. The experiment found that simply improving model capabilities is not the key; the real challenge lies in the coordination costs of multi-agent collaboration. The article details the evolution of the collaboration architecture from 'flat autonomy' to 'optimistic concurrency control,' and finally to the established 'Planner-Executor-Reviewer' pipeline structure. The author points out that with the explosion of AI coding capabilities, software engineering bottlenecks have shifted. The core value of future engineers will focus on high-level system design, precise task decomposition, definition of responsibility, and the verifiability and maintainability of large-scale code.

Main Points

* 1. The bottleneck of multi-agent collaboration lies in coordination costs rather than generation speed.

In flat structures, agents are prone to issues such as holding locks indefinitely, risk aversion, and blurred responsibilities, causing system throughput to degrade as concurrency increases.

* 2. The 'Planner-Executor-Reviewer' model is an effective architecture for solving long-range task drift.

By decomposing tasks into recursive planning, pure execution, and periodic reviews, responsibilities can be clearly assigned, and periodic restart mechanisms can be used to counter the narrowing of an agent's perspective during long-running tasks.

* 3. The core value of software engineering is shifting from 'writing code' to 'system design and verification'.

As the cost of code production approaches zero, ensuring the stability, reproducibility, and logical correctness of millions of lines of code becomes a scarcer skill than writing code itself.

* 4. Prompt engineering still plays a decisive role in building complex systems.

In long-running tasks, system behavior is highly dependent on the quality of prompts, whose importance may even exceed that of the framework and model itself in specific engineering scenarios.

Metadata


Website mp.weixin.qq.com

Published At Yesterday

Length 2272 words (about 10 min)

张艾拉 2026-03-11 07:46 Guangdong

The following article comes from Fun AI Everyday, which shares one fun AI application every day.


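The bottleneck in software engineering is "migrating"

The Cursor team's GPT-5.2 multi-agent experiment exposed bottlenecks more critical than code generation: task decomposition, ownership of responsibility, and coordination mechanisms are becoming the new battleground. This weeks-long engineering marathon not only revealed 7 counterintuitive pitfalls of agent collaboration, but also signals that the core value of future engineers will shift from coding to system design.

A few days ago, Cursor CEO Michael Truell mentioned on social media that they had let a GPT-5.2-driven system run continuously for a week, producing a browser-related codebase of millions of lines of code across thousands of files.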


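This sounds like a product launch, but it is more accurately an engineering experiment: what the Cursor team wanted to verify was not whether AI can write code, but where the bottleneck of a large software project actually lies once AI can run concurrently and continuously for weeks.

Cursor also published related material on its own blog, detailing how they ran hundreds of concurrent agents on a single project and observed how those agents wrote more than a million lines of code and kept making progress over long runs.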

How exactly did they do it?
---------------------------

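The key is not the model, but the organization.

If you read this as "the model got stronger, so it can write more code," you will miss the most important part.

Cursor's blog post is very engineering-minded: the real trouble was not that the agents could not produce code, but the coordination cost of multi-agent collaboration, which is exactly the part that real-world software teams know best and find hardest to optimize.

At first they went with the intuitive "flat autonomy" approach: all agents are peers and share a single file for claiming tasks and updating status. To keep agents from grabbing the same task, they added locks.

It broke down quickly: agents would hold locks for too long or forget to release them altogether; throughput degraded from 20 agents to the effective speed of 2-3 agents; worse, the system was fragile, since an agent could fail while still holding a lock, and some agents even wrote to the coordination file without acquiring the lock at all.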
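The lock failure mode described above can be illustrated with a minimal sketch. This is not Cursor's implementation: the `LockedTaskBoard` class, the task names, and the timeout value are all hypothetical, and an in-memory `threading.Lock` stands in for whatever locking the shared coordination file actually used.

```python
import threading


class LockedTaskBoard:
    """Hypothetical shared task board guarded by a single lock (illustration only)."""

    def __init__(self, tasks):
        self.lock = threading.Lock()
        self.claims = {task: None for task in tasks}

    def claim(self, agent, task, timeout=0.05):
        # Every agent must take the global lock before touching the board.
        if not self.lock.acquire(timeout=timeout):
            return False  # gave up waiting: someone else still holds the lock
        try:
            if self.claims[task] is None:
                self.claims[task] = agent
                return True
            return False  # task was already claimed by another agent
        finally:
            self.lock.release()

    def crash_while_holding_lock(self):
        # Simulates an agent that acquires the lock and fails before releasing it.
        self.lock.acquire()


board = LockedTaskBoard(["parse-html", "layout", "paint"])
first = board.claim("agent-1", "parse-html")   # lock is free, claim succeeds
board.crash_while_holding_lock()               # a second agent dies mid-update
blocked = board.claim("agent-3", "layout")     # times out: the lock is never released
```

Once a single agent fails while holding the lock, every later claim stalls until its timeout expires, which is one way a pool of many agents can collapse to the effective speed of a few.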

They then switched to "optimistic concurrency control": reads are free, and a write fails if it conflicts.
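A minimal sketch of the idea, assuming a version counter on the shared state: the `VersionedBoard` and `WriteConflict` names are hypothetical and not taken from Cursor's system. The sketch is single-threaded; in a real multi-process setup the version check and the write would have to happen atomically.

```python
from dataclasses import dataclass, field


class WriteConflict(Exception):
    """Raised when the board changed since the caller last read it."""


@dataclass
class VersionedBoard:
    version: int = 0
    claims: dict = field(default_factory=dict)

    def read(self):
        # Reads never block: return the version and a snapshot taken at that version.
        return self.version, dict(self.claims)

    def write(self, expected_version, task, agent):
        # The write succeeds only if nobody else has written since the read.
        if expected_version != self.version:
            raise WriteConflict(
                f"expected v{expected_version}, board is at v{self.version}"
            )
        self.claims[task] = agent
        self.version += 1


board = VersionedBoard()
v_a, _ = board.read()                     # agent-a reads at version 0
v_b, _ = board.read()                     # agent-b reads at version 0 as well
board.write(v_a, "layout", "agent-a")     # first writer wins, version becomes 1
try:
    board.write(v_b, "layout", "agent-b")  # stale version: the write is rejected
    conflict = False
except WriteConflict:
    conflict = True
    v_b, snapshot = board.read()          # agent-b re-reads and can pick another task
```

No agent ever waits on another here; the losing writer simply fails, re-reads, and retries, so a crashed agent cannot block the rest.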

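This was indeed more robust, but a deeper problem remained: without a hierarchy, agents became very risk-averse. They avoided hard tasks and made "small and safe" changes instead; nobody took end-to-end responsibility, so the system looked busy while actually spinning its wheels.

What really made the system start working like a team was splitting the flat structure into a pipeline with clear responsibilities:

* Planner: continuously explores the codebase and breaks down tasks, and can spawn sub-planners so that planning itself runs in parallel and unfolds recursively;

* Executor: only responsible for finishing the assigned task and submitting the change; it does not need to care about the big picture or coordinate with other executors;

* Reviewer: at the end of each cycle, decides whether to continue, after which the next round starts again from a clean initial state, as a way to counter drift and tunnel vision in long runs.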
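The three roles above can be sketched as a small loop. Everything here is an assumption made for illustration: the function names, the fixed two-way split in `plan`, and the cycle cap that stands in for the reviewer's judgment are not from Cursor's system, where each role would be an agent driven by a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str


def plan(goal, depth=0, max_depth=1):
    """Planner: split a goal into tasks; each part is handed to a sub-planner (recursion)."""
    if depth == max_depth:
        return [Task(goal)]
    tasks = []
    for part in ("a", "b"):  # hypothetical fixed two-way split
        tasks.extend(plan(f"{goal}/{part}", depth + 1, max_depth))
    return tasks


def execute(task):
    """Executor: finish one task and return a change; no global view, no peer coordination."""
    return f"change for {task.name}"


def review(changes, cycle, max_cycles):
    """Reviewer: decide at the end of a cycle whether to run another one.

    A simple cycle cap is used as a placeholder; `changes` is where a real
    reviewer would inspect the work.
    """
    return cycle + 1 < max_cycles


def run(goal, max_cycles=3):
    merged = []
    cycle = 0
    while True:
        # Each cycle re-plans from scratch; only the merged changes carry over.
        tasks = plan(goal)
        changes = [execute(task) for task in tasks]
        merged.append(changes)
        if not review(changes, cycle, max_cycles):
            return merged
        cycle += 1


history = run("browser", max_cycles=2)
```

The point of the shape is that executors never talk to each other, and the planner's state is rebuilt every cycle instead of accumulating, which is the restart-to-counter-drift idea in miniature.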

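This is the core methodology of the original post: it largely solved the coordination problem and let the system scale to very large projects, while keeping individual agents from tunneling deeper and deeper into a dead end.

More interesting still is their takeaway: many improvements came from subtraction rather than addition.

For example, they once designed an integrator role dedicated to quality control and conflict resolution, and later found that it created more bottlenecks than it solved: executors could already handle many conflicts on their own.

There is also a very pragmatic but often overlooked conclusion: in long-running tasks, system behavior depends heavily on how the prompts are written. The framework and the model matter, but the prompts matter more.

At the same time, multi-agent coordination remains hard, and the system still needs periodic restarts from scratch to counter drift.

What does this tell us? The bottleneck in software engineering is "migrating"
-----------------------------------------------------------------------------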

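Taken apart, this experiment is really asking an old question in a new way: the bottleneck of future software engineering may shift from the "manpower" of writing code to "how to organize a large number of automated executors."

The old bottlenecks were the number of engineers, the cost of team collaboration, and the pace of code review.

Cursor's experiment pushes the new bottlenecks to at least four places:

First, task decomposition and ownership of responsibility are scarcer than writing code.

In a flat structure, agents tend to make small, safe changes, which at bottom means nobody is responsible for the final result.

You will notice this is exactly like the big requirement in a real team that nobody wants to take the blame for. Cursor ultimately used the planner/executor/reviewer structure to pin responsibility back down.

Second, coordination mechanisms and throughput determine whether concurrency is a multiplier or internal friction.

Locks slowed the system down and made concurrency degrade; optimistic concurrency was more robust but the agents still spun their wheels.

In other words, when you have hundreds of agents, engineering efficiency no longer depends on how fast one person writes, but on whether the organizational system minimizes conflicts and waiting.

Third, drift over long runs is the norm, and a reset mechanism is a necessity.

In long tasks, agents veer off course and develop tunnel vision, so the team states explicitly that periodic restarts from scratch are still needed, and they use review to cut iteration into cycles to counter drift.

Fourth, acceptance and verifiability will become a more critical cost center than producing code.

The value of a million lines of code lies not in its creation, but in whether it can run stably, be reproduced, and be maintained.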

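This is also where outside discussion has concentrated: does this actually count as having built a browser?

You may be wondering: does this browser actually run?
-----------------------------------------------------

The short answer: it runs, but it is more "able to move" than "able to be used," and the gap is large.

The reason is simple: when people discuss whether it "runs," they are not actually talking about one thing.

The first layer is "is there a real artifact?" Yes. Cursor released the code, which shows this is not empty talk.

The second layer is "can it be used as a product?" Not yet. A prototype like this is still far from adequate on stability, compatibility, performance, and security, and is nowhere near daily use.

The third layer is what you really care about: can it open web pages? It is closer to "can render some simple pages," with limited coverage and quite a few problems, so it is more an engineering experiment than a browser release that could replace Chrome.

What is most worth watching in Cursor's effort is not whether the browser got built, but the trend it puts on the table:

Once AI can write massive amounts of code very cheaply, the key to software engineering is no longer "writing," but how to organize, how to verify, and how to keep the system moving in the right direction.

Finally, what I want to say is that today's AI cannot yet turn a complex system into a product, but it can already push a complex system to a runnable prototype.

What will really mark the watershed is not the volume of code, but who can also automate the entire engineering system of long-term collaboration, quality control, and reproducible delivery. That is when AI starts to rewrite how software is produced.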

This article is from the WeChat official account Fun AI Everyday. Author: 张艾拉


Key Quotes

* The key is not the model, but the organization.

* When you have hundreds of agents, engineering efficiency no longer depends on how fast one person writes, but on whether the organizational system minimizes conflicts and waiting.

* The value of a million lines of code lies not in its creation, but in whether it can run stably, be reproduced, and be maintained.

* The core value of future engineers will shift from coding to system design.

* The real watershed is not the volume of code, but who can also automate the entire engineering system of long-term collaboration, quality control, and reproducible delivery.
