
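HubSpot's Sidekick: Multi-Model AI Code Review with 90% Faster Feedback and 80% Engineer Approval

📅 2026-03-18 22:38 · Leela Kumili · Software Programming · 12 min · 14,333 characters · Score: 84
Tags: AI Code Review, LLM Agents, Software Engineering, Developer Productivity, Multi-Agent Systems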


InfoQ @Leela Kumili

One Sentence Summary

HubSpot's 'Sidekick' AI agent accelerates code reviews by 90% using a multi-model architecture and a 'Judge Agent' pattern to ensure high-quality, actionable feedback.

Summary

HubSpot has developed Sidekick, an internal AI-powered code review agent designed to eliminate bottlenecks in the pull request process. While AI coding assistants have accelerated code creation, manual reviews often lag; Sidekick addresses this by providing near-instant feedback. The system evolved from a complex Kubernetes-based setup to a streamlined Java framework called Aviator, which supports multiple LLMs like Claude, GPT-4, and Gemini. A critical innovation is the 'Judge Agent' pattern, which filters out verbose or low-value comments before they reach developers. This approach has resulted in a 90% reduction in feedback latency and an 80% engineer approval rating, allowing human reviewers to focus on high-level architecture rather than syntax or trivial bugs.

Main Points

* 1. The 'Judge Agent' pattern is essential for maintaining a high signal-to-noise ratio in AI reviews.

By implementing a secondary LLM to evaluate and filter the primary agent's comments, HubSpot successfully reduced 'noise' (verbose or overly positive feedback), ensuring that only actionable and valuable insights are presented to developers.

* 2. Integrating AI agents into existing development frameworks reduces operational complexity and latency.

Moving from isolated Kubernetes-based workloads to the 'Aviator' Java framework allowed the review agent to access repository context via RPC-based tools, significantly improving the relevance of comments while lowering infrastructure overhead.

* 3. AI-assisted code review enables human engineers to focus on higher-level architectural design.

With Sidekick handling immediate feedback on coding conventions and potential bugs, human reviewers are freed from routine checks, allowing them to dedicate more time to complex system design and long-term maintainability.

* 4. A multi-model approach provides flexibility and resilience against model-specific limitations.

By supporting providers like Anthropic, OpenAI, and Google, HubSpot can experiment with different models for specific tasks and maintain fallback options, ensuring the system remains robust and cost-effective.

Metadata

* AI Score: 84
* Website: infoq.com
* Published At: Today
* Length: 490 words (about 2 min)


HubSpot engineers introduced Sidekick, an internal AI-powered code review agent that analyzes pull request changes and gives developers automated feedback. The system uses large language models to review code and posts comments directly in repositories on GitHub. According to the engineering team, the tool reduced time-to-first-feedback on pull requests by approximately 90 percent while helping developers identify issues earlier in the review process.

Code review is essential in software development, but can be delayed when reviewers are unavailable. At HubSpot, engineers found that AI coding assistants sped up code creation, while manual reviews lagged. Sidekick provides immediate pull request feedback, letting human reviewers focus on architecture and higher-level design, improving efficiency and reducing review bottlenecks.

As Emily Adams explained in a company blog post:

> What we found might surprise you: our AI code reviewer catches real issues, understands HubSpot‑specific context, and maintains a high signal to noise ratio, often leaving no comments at all.

The first version of the system ran on an internal platform called Crucible. Large language model agents operated in Kubernetes environments and interacted with GitHub repositories via the command line. The agents retrieved pull request changes and generated review comments using prompts to identify potential issues or improvements. While this approach demonstrated that LLMs could provide useful feedback, it introduced operational complexity. Each review required separate containerized workloads, increasing latency and infrastructure overhead, and limited control over agent interactions with developer tooling and internal services.

To address these limitations, the engineering team migrated the system to a Java-based agent framework called Aviator. It integrates with HubSpot’s development platform, letting review agents run within existing services rather than isolated workloads. Aviator supports multiple model providers, including Anthropic, OpenAI, and Google, enabling experimentation and fallback options. Through RPC-based tool abstractions, agents retrieve repository context such as configuration settings and coding conventions, improving the relevance and accuracy of automated review comments.

A key challenge identified during deployment was feedback quality. Early versions produced verbose or overly positive comments that engineers considered noise. To address this, the team introduced a "judge agent," which evaluates comments before they are posted to pull request discussions. According to HubSpot engineers, this evaluator pattern reduced low-value comments and improved the signal-to-noise ratio. Developers can also react to automated comments, providing feedback that guides prompt adjustments and model selection. The system has recorded a consistent 80% thumbs-up rate from engineers, which the team reads as a sign of strong adoption and trust.
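The evaluator pattern described above can be sketched as a filter between the review agent and the pull request. This is a simplified illustration, not HubSpot's implementation: `ReviewComment`, `keyword_judge`, `filter_comments`, `thumbs_up_rate`, and the 0.7 threshold are assumptions, and the keyword heuristic is a toy stand-in for the second LLM call a real judge would make.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ReviewComment:
    path: str
    line: int
    body: str


# A judge maps a comment to a score in [0, 1]. In a real system this would be
# an LLM call; here it is any callable with that shape.
Judge = Callable[[ReviewComment], float]


def keyword_judge(comment: ReviewComment) -> float:
    """Toy judge: reward actionable wording, penalize praise-only wording."""
    body = comment.body.lower()
    score = 0.5
    if any(word in body for word in ("bug", "null", "leak", "race", "should")):
        score += 0.4
    if any(word in body for word in ("great", "nice", "looks good")):
        score -= 0.4
    return max(0.0, min(1.0, score))


def filter_comments(
    comments: Iterable[ReviewComment], judge: Judge, threshold: float = 0.7
) -> List[ReviewComment]:
    """Keep only comments the judge scores at or above the threshold."""
    return [c for c in comments if judge(c) >= threshold]


def thumbs_up_rate(reactions: Iterable[str]) -> float:
    """Share of "+1" reactions among all reactions; 0.0 if there are none."""
    reactions = list(reactions)
    if not reactions:
        return 0.0
    return sum(1 for r in reactions if r == "+1") / len(reactions)
```

With this shape, a review that yields only praise produces an empty list, which matches the behavior quoted earlier of "often leaving no comments at all". The reaction rate gives a simple signal for comparing prompts or models over time.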

[Figure: Review Agent to Judge Agent evaluation loop (Source: HubSpot Blog Post)]

Brian L, VP of Engineering at HubSpot, noted on LinkedIn:

> The most impactful change was adding a second agent to evaluate reviews before posting. The result: fewer, better, and more actionable comments. We knew we’d gotten it right when engineers started asking to see Sidekick’s feedback even before opening a PR.

HubSpot engineers mention that future work includes adding persistent memory for review agents and expanding context retrieval across repositories to improve understanding of related code changes.


Key Quotes

* Our AI code reviewer catches real issues, understands HubSpot‑specific context, and maintains a high signal to noise ratio, often leaving no comments at all.
* The most impactful change was adding a second agent to evaluate reviews before posting. The result: fewer, better, and more actionable comments.
* We knew we'd gotten it right when engineers started asking to see Sidekick's feedback even before opening a PR.
* The tool reduced time-to-first-feedback on pull requests by approximately 90 percent while helping developers identify issues earlier.




Published: 2026-03-18 22:38:00 · Collected: 2026-03-19 00:00:48
