QCon London 2026: Continuously Rewriting All of Spotify's Code Base

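📅 2026-03-18 23:02 · Daniel Curtis · Artificial Intelligence · 5 min read · 5049 words · Score: 88
AI agents · LLMOps · Software maintenance · Code migration · Engineering productivity
📌 One-sentence summary: Spotify uses an AI agent called "Honk" to automate large-scale codebase migrations, merging 1,000 PRs every 10 days and shifting the engineering bottleneck from writing code to reviewing it.
📝 Detailed summary: At QCon London 2026, Spotify presented "Honk", an internal AI coding agent built to handle complex, large-scale code migrations that traditional deterministic scripts cannot solve. Spotify's earlier Fleet Management system automated 70% of migrations; Honk tackles the difficult edge cases in the remaining 30% by packaging the entire development lifecycle (requirements, generation, and testing) into an LLM-driven workflow.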

At QCon London 2026, Jo Kelly-Fenton and Aleksandar Mitic from Spotify presented how the company is using an internal AI-powered coding agent called Honk to perform continuous, large-scale code migrations across its entire codebase, achieving 1,000 merged pull requests every 10 days.

The presentation, titled "Rewriting All of Spotify's Code Base, All the Time," explored the evolution from Spotify's existing Fleet Management system to an LLM-driven approach that addresses the long tail of complex migrations that deterministic scripts could not resolve. The speakers noted that developers spend little time actually writing code, citing research suggesting engineers average just 52 minutes of coding per day, with the remainder consumed by meetings and maintenance tasks.

Spotify's Fleet Management philosophy places responsibility on library owners to migrate all consumers to the latest version. Before Honk, automated scripts could transform code and create pull requests across thousands of repositories, reducing migration timelines from nearly a year to under a week for 70% of the fleet. However, the remaining 30% proved extremely difficult due to edge cases and complexity, leaving incomplete migrations that increased codebase diversity.

Honk was born from the idea of replacing these deterministic scripts with LLMs that could better handle edge cases. The team quickly realised they needed to package the entire software development process, including requirements, code generation, building, testing, and iteration.

Early challenges revealed that agents would take shortcuts to make builds pass, such as commenting out failing tests or downgrading Java versions. The team initially implemented an "LLM as judge" to evaluate whether generated code addressed the original requirements, but found it too rigid, blocking valid changes. As models improved, the judge was eventually removed, with verification steps in prompts proving sufficient.
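The shortcuts described above are easy to picture as patterns in a diff. The following is a minimal sketch of a deterministic check for two of them, a test being commented out or disabled and a Java version being lowered. It is not how Honk works: according to the talk, the team relied first on an LLM judge and later on verification steps in prompts. The function name `find_shortcuts`, the regular expressions, and the Maven-style `<java.version>` property are all assumptions chosen for illustration.

```python
import re

# Hypothetical heuristics for illustration; not Spotify's actual rules.
# Matches an added line that comments out @Test or adds @Ignore/@Disabled.
DISABLED_TEST = re.compile(r"^\+\s*(//|#)\s*@Test\b|^\+\s*@(Ignore|Disabled)\b")
# Matches a removed or added Maven-style java.version property (assumed layout).
JAVA_VERSION = re.compile(r"^([+-])\s*<java\.version>(\d+)</java\.version>")


def find_shortcuts(diff_text: str) -> list:
    """Return warnings for suspicious lines in a unified diff."""
    warnings = []
    removed_version = None
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not content changes
        if DISABLED_TEST.search(line):
            warnings.append(f"test disabled: {line[1:].strip()}")
        match = JAVA_VERSION.match(line)
        if match:
            sign, version = match.group(1), int(match.group(2))
            if sign == "-":
                removed_version = version  # remember the old version
            elif removed_version is not None and version < removed_version:
                warnings.append(f"Java downgraded: {removed_version} -> {version}")
    return warnings
```

A check like this only catches shortcuts someone has already anticipated, which is one reason a fixed rule set struggles with the long tail of edge cases.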

Scaling to hundreds of repositories introduced infrastructure challenges, including missing credentials, Docker requirements, and the inability to run iOS builds on Linux machines. A critical architectural decision was to separate the agent runtime from the verification runtime. Honk now pushes branches to GitHub, triggers builds via a verification service that abstracts CI systems, waits for results, and only creates pull requests after full validation.
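The flow described above (push a branch, trigger a build through a service that hides the CI system, wait, and open a pull request only on success) can be sketched as follows. This is an illustrative outline under stated assumptions, not Spotify's implementation: `VerificationService`, `Outcome`, `verify_then_open_pr`, the status strings, and the polling parameters are all hypothetical names and values.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


class VerificationService(Protocol):
    """Hypothetical interface hiding whichever CI system runs the build."""

    def trigger(self, repo: str, branch: str) -> str: ...

    def status(self, build_id: str) -> str: ...  # "running", "passed" or "failed"


@dataclass
class Outcome:
    verified: bool
    pr_url: Optional[str] = None


def verify_then_open_pr(
    repo: str,
    branch: str,
    push_branch: Callable[[str, str], None],
    verifier: VerificationService,
    create_pr: Callable[[str, str], str],
    poll_seconds: float = 30.0,  # assumed polling interval
    max_polls: int = 120,  # assumed upper bound before giving up
    sleep: Callable[[float], None] = time.sleep,
) -> Outcome:
    """Push the agent's branch, wait for verification, then open a PR."""
    push_branch(repo, branch)
    build_id = verifier.trigger(repo, branch)
    for _ in range(max_polls):
        state = verifier.status(build_id)
        if state == "passed":
            # Only a fully validated branch becomes a pull request.
            return Outcome(True, create_pr(repo, branch))
        if state == "failed":
            return Outcome(False)
        sleep(poll_seconds)
    return Outcome(False)  # timed out without a verdict
```

Keeping the build behind an interface means the agent never needs credentials, Docker, or a macOS host itself; those concerns live in whichever runtime the verification service chooses.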

A hack week integration with Slack proved transformative. Developers wanted to act on work directly from where it was discussed, initiating code changes from Slack threads containing dashboards, logs, and Jira links. This evolved into a "code from anywhere" approach, with an exposed API enabling integrations from any surface.

The scale of output has grown significantly. Six months ago, Honk needed three months to reach 1,000 merged pull requests. Today, it reaches the same volume in just 10 days. The speakers noted that this shift has made PR review, not code generation, the new bottleneck, drawing a parallel to aviation, where pilots monitoring automated systems perform the hardest job.
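To put those figures in perspective, the short calculation below converts both periods to merged pull requests per day. It assumes "three months" is roughly 90 days; the function name `prs_per_day` is just for illustration.

```python
def prs_per_day(merged: int, days: float) -> float:
    """Average merged pull requests per day over a period."""
    return merged / days


# "Three months" approximated as 90 days (an assumption).
before = prs_per_day(1000, 90)
now = prs_per_day(1000, 10)
speedup = now / before

print(f"before: {before:.1f} PRs/day")  # roughly 11 per day
print(f"now: {now:.1f} PRs/day")  # 100 per day
print(f"speedup: {speedup:.1f}x")  # about a ninefold increase
```

At around 100 merged pull requests a day, review capacity rather than generation speed sets the pace.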

To address the review challenge, the team outlined three strategies. First, a culture shift around review expectations, including allowing migration drivers to approve their own PRs and closing stale pull requests. Second, tooling improvements such as a PR inbox that helps prioritise reviews and potential auto-merging for documentation changes. Third, and most significantly, codebase standardisation.
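Two of these ideas, closing stale pull requests and auto-merging documentation-only changes, amount to a simple triage rule. The sketch below shows one way such a rule might look. The talk gave no details, so the 30-day staleness threshold, the list of documentation file suffixes, and the names `PullRequest` and `triage` are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import PurePosixPath
from typing import List

DOC_SUFFIXES = {".md", ".rst", ".txt"}  # assumed definition of "documentation"
STALE_AFTER = timedelta(days=30)  # assumed staleness threshold


@dataclass
class PullRequest:
    changed_paths: List[str]
    last_activity: datetime


def triage(pr: PullRequest, now: datetime) -> str:
    """Classify a PR as 'close-stale', 'auto-merge-candidate' or 'needs-review'."""
    if now - pr.last_activity > STALE_AFTER:
        return "close-stale"
    docs_only = bool(pr.changed_paths) and all(
        PurePosixPath(path).suffix.lower() in DOC_SUFFIXES
        for path in pr.changed_paths
    )
    if docs_only:
        return "auto-merge-candidate"
    return "needs-review"
```

Anything that touches code still lands in front of a human reviewer; the rule only trims the queue at its edges.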

The speakers argued that a diverse codebase produces a diverse set of problems, leading to complex prompts filled with conditional logic. Their standardisation strategy involves advisory boards making technology decisions, using Honk to drive existing migrations to 100% completion, and enforcing standards through monorepos with linting. This creates what the speakers described as a cycle: standardisation leads to more correct agent code, which enables easier review, which increases code capacity, which drives further standardisation.

Spotify has published a three-part engineering blog series detailing Honk's development.
