HubSpot’s Sidekick: Multi-Model AI Code Review with 90% Faster Feedback and 80% Engineer Approval

InfoQ @Leela Kumili

One Sentence Summary

HubSpot's 'Sidekick' AI agent cuts time-to-first-feedback on pull requests by roughly 90%, using a multi-model architecture and a 'Judge Agent' pattern to keep its feedback high-quality and actionable.

Summary

HubSpot has developed Sidekick, an internal AI-powered code review agent designed to eliminate bottlenecks in the pull request process. AI coding assistants have accelerated code creation, but manual reviews often lag; Sidekick addresses this by providing immediate feedback on pull requests. The system evolved from a Kubernetes-based setup with high operational complexity to a Java-based agent framework called Aviator, which supports multiple model providers, including Anthropic, OpenAI, and Google. A critical innovation is the 'Judge Agent' pattern, which filters out verbose or low-value comments before they reach developers. According to HubSpot, this approach reduced time-to-first-feedback by approximately 90% and has earned a consistent 80% thumbs-up rate from engineers, allowing human reviewers to focus on architecture and higher-level design rather than routine checks.

Main Points

* 1. The 'Judge Agent' pattern is essential for maintaining a high signal-to-noise ratio in AI reviews.

By implementing a secondary LLM to evaluate and filter the primary agent's comments, HubSpot successfully reduced 'noise' (verbose or overly positive feedback), ensuring that only actionable and valuable insights are presented to developers.

* 2. Integrating AI agents into existing development frameworks reduces operational complexity and latency.

Moving from isolated Kubernetes-based workloads to the 'Aviator' Java framework allowed the review agent to access repository context via RPC-based tools, significantly improving the relevance of comments while lowering infrastructure overhead.

* 3. AI-assisted code review enables human engineers to focus on higher-level architectural design.

With Sidekick handling immediate feedback on coding conventions and potential bugs, human reviewers are freed from routine checks, allowing them to dedicate more time to complex system design and long-term maintainability.

* 4. A multi-model approach provides flexibility and resilience against model-specific limitations.

By supporting providers like Anthropic, OpenAI, and Google, HubSpot can experiment with different models for specific tasks and maintain fallback options, ensuring the system remains robust and cost-effective.

Metadata

Website infoq.com

Published At Today

Length 490 words (about 2 min)

HubSpot engineers introduced Sidekick, an internal AI-powered code review agent designed to analyze pull request changes and provide automated feedback to developers. The system uses large language models to review code and posts comments directly in repositories on GitHub. According to the engineering team, the tool reduced time-to-first-feedback on pull requests by approximately 90 percent while helping developers identify issues earlier in the review process.

Code review is essential in software development, but can be delayed when reviewers are unavailable. At HubSpot, engineers found that AI coding assistants sped up code creation, while manual reviews lagged. Sidekick provides immediate pull request feedback, letting human reviewers focus on architecture and higher-level design, improving efficiency and reducing review bottlenecks.

As Emily Adams explained in a company blog post,

> What we found might surprise you: our AI code reviewer catches real issues, understands HubSpot‑specific context, and maintains a high signal to noise ratio, often leaving no comments at all.

The first version of the system ran on an internal platform called Crucible. Large language model agents operated in Kubernetes environments and interacted with GitHub repositories via the command line. The agents retrieved pull request changes and generated review comments using prompts to identify potential issues or improvements. While this approach demonstrated that LLMs could provide useful feedback, it introduced operational complexity. Each review required separate containerized workloads, which increased latency and infrastructure overhead and limited the team's control over how agents interacted with developer tooling and internal services.

To address these limitations, the engineering team migrated the system to a Java-based agent framework called Aviator. It integrates with HubSpot’s development platform, letting review agents run within existing services rather than isolated workloads. Aviator supports multiple model providers, including Anthropic, OpenAI, and Google, enabling experimentation and fallback options. Through RPC-based tool abstractions, agents retrieve repository context such as configuration settings and coding conventions, improving the relevance and accuracy of automated review comments.
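The article does not describe how Aviator implements provider fallback, and Aviator itself is a Java framework. The Python sketch below only illustrates the general idea of trying providers in order and falling back when a call fails. The `Provider` type, the `ProviderError` exception, and the stub clients are all hypothetical names introduced for this example, not part of any HubSpot or vendor API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a (hypothetical) provider client when a call fails."""


@dataclass
class Provider:
    name: str                       # label only, e.g. "primary" or "secondary"
    complete: Callable[[str], str]  # hypothetical client: prompt in, review text out


def review_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try each provider in order; return (provider name, review text)."""
    errors = []
    for provider in providers:
        try:
            return provider.name, provider.complete(prompt)
        except ProviderError as exc:
            # Remember why this provider failed, then move on to the next one.
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in clients so the sketch runs without network access.
def failing_client(prompt: str) -> str:
    raise ProviderError("rate limited")


def stub_client(prompt: str) -> str:
    return f"stub review for: {prompt}"


providers = [Provider("primary", failing_client), Provider("secondary", stub_client)]
used, review = review_with_fallback("diff --git a/App.java b/App.java", providers)
```

In this run the first client fails, so the review comes from the second one. Keeping the provider list as plain data is one way to make model experiments a configuration change rather than a code change.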

A key challenge identified during deployment was feedback quality. Early versions produced verbose or overly positive comments that engineers considered noise. To address this, the team introduced a "judge agent," which evaluates comments before posting them to pull request discussions. According to HubSpot engineers, this evaluator pattern reduced low-value comments and improved the signal-to-noise ratio. Developers can also react to automated comments, providing feedback that guides prompt adjustments and model selection. The system has recorded a consistent 80% thumbs-up rate from engineers, demonstrating strong adoption and trust.
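The evaluator pattern described above can be sketched as a filtering step between the review agent's draft comments and what actually gets posted. This is a minimal Python illustration, not HubSpot's implementation: the `Comment` type, the `filter_comments` function, and the `heuristic_judge` rule are assumptions made for the example. In a real system the judge would be a second LLM call; a simple heuristic stands in for it here so the code runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Comment:
    path: str
    line: int
    body: str


# A judge maps a draft comment to True (post it) or False (drop it).
Judge = Callable[[Comment], bool]


def filter_comments(drafts: Iterable[Comment], judge: Judge) -> list[Comment]:
    """Return only the draft comments the judge approves, in original order."""
    return [c for c in drafts if judge(c)]


def heuristic_judge(comment: Comment) -> bool:
    """Stand-in for an LLM judge: rejects praise-only and overly long comments."""
    body = comment.body.lower()
    praise_only = body.startswith(("great", "nice", "looks good"))
    too_verbose = len(comment.body.split()) > 60  # arbitrary illustrative cutoff
    return not (praise_only or too_verbose)


drafts = [
    Comment("App.java", 12, "Great job on the naming here!"),
    Comment("App.java", 40, "Possible null dereference: `user` is unchecked before use."),
]
approved = filter_comments(drafts, heuristic_judge)
```

Only the second draft survives the filter. An empty result is a valid outcome too, which matches the observation that the reviewer often leaves no comments at all.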

[Figure: Review Agent to Judge Agent evaluation loop (Source: HubSpot Blog Post)]

Brian L, VP of Engineering at HubSpot, noted on LinkedIn:

> The most impactful change was adding a second agent to evaluate reviews before posting. The result: fewer, better, and more actionable comments. We knew we’d gotten it right when engineers started asking to see Sidekick’s feedback even before opening a PR.

HubSpot engineers mention that future work includes adding persistent memory for review agents and expanding context retrieval across repositories to improve understanding of related code changes.


Key Quotes

* Our AI code reviewer catches real issues, understands HubSpot‑specific context, and maintains a high signal to noise ratio, often leaving no comments at all.
* The most impactful change was adding a second agent to evaluate reviews before posting. The result: fewer, better, and more actionable comments.
* We knew we'd gotten it right when engineers started asking to see Sidekick's feedback even before opening a PR.
* The tool reduced time-to-first-feedback on pull requests by approximately 90 percent while helping developers identify issues earlier.
