How Spotify Automated Background Engineering with AI

Most engineering teams treat a codebase-wide change as a project in its own right. Spotify automated these changes in the background using AI.

Spotify’s solution is Honk, an AI coding agent that makes changes to the codebase in the background without pulling an engineer into the work. It is not a chatbot integrated into a codebase; it is more like a background daemon for the entire codebase. Since implementing Honk, Spotify merges over 650 agent-created pull requests into production every month. For the migrations done so far, the agent has saved 60–90% of the time it would take to write the code by hand.

The problem Honk was designed to solve

At Spotify’s scale, horizontal changes such as dependency upgrades, API changes, and convention enforcement have traditionally meant one thing: deterministic scripts. Write a script, run it across every affected codebase, validate the output, and handle by hand the edge cases it does not support. Sounds simple enough.

Well, it is, until it is not. Scripts are brittle. They fail on unknown code patterns, cannot reason about context, and take about as long to write correctly as the migration would take to do by hand. At a certain point, it just isn’t worth it.
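To make the brittleness concrete, here is a minimal sketch of a deterministic migration script. The rename it performs (old_client.fetch to new_client.get) and every name in it are hypothetical, invented for illustration and not taken from Spotify’s codebase. The script handles the one call shape its author anticipated and silently skips a file that imports the same client under an alias.

```python
import re

# Hypothetical migration: rewrite old_client.fetch(...) calls to new_client.get(...).
# The pattern encodes exactly one call shape, the one the script author had in mind.
PATTERN = re.compile(r"\bold_client\.fetch\(")


def migrate(source: str) -> str:
    """Rewrite every occurrence of the anticipated call shape."""
    return PATTERN.sub("new_client.get(", source)


# The shape the author anticipated: rewritten as intended.
anticipated = "data = old_client.fetch(url)\n"

# The same client imported under an alias: the regex never matches,
# so the file is left untouched and the miss goes unreported.
aliased = "from legacy import old_client as oc\ndata = oc.fetch(url)\n"

migrated_anticipated = migrate(anticipated)
migrated_aliased = migrate(aliased)
```

Covering the aliased case means extending the script, and every further pattern found in the wild means extending it again, which is where the cost described above comes from.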

Honk flips the model. Instead of encoding every transformation in a script, engineers describe the change in plain language and the agent figures out how to apply it.

How it actually works

Honk is not open source. It is an internal solution built on top of Claude Code and the Claude Agent SDK, and it is housed within Backstage, Spotify’s internal developer platform, which is itself open source. That placement matters because Backstage is where the metadata lives: which teams own what, the status of each service, the level of security required, and the level of reliability needed. Honk reads all of this before ever touching any code.
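As a rough illustration of reading metadata before touching code, the sketch below inspects a catalog record and decides whether an automated change should go ahead and who should review it. The dictionary is loosely shaped like a Backstage Component entity (metadata.name, spec.owner, spec.lifecycle), but the policy itself, the ChangePlan type, and the plan_change function are assumptions made for this example; how Honk actually uses the catalog is not described in detail here.

```python
from dataclasses import dataclass


@dataclass
class ChangePlan:
    component: str
    reviewer: str   # team that should review the agent's pull request
    allowed: bool   # whether the agent may attempt the change at all
    reason: str


def plan_change(entity: dict) -> ChangePlan:
    """Decide, from catalog metadata alone, whether to attempt a change.

    Hypothetical policy: skip components with no recorded owner (nobody
    could review the result) and components marked as deprecated.
    """
    name = entity.get("metadata", {}).get("name", "<unknown>")
    spec = entity.get("spec", {})
    owner = spec.get("owner")
    if not owner:
        return ChangePlan(name, "", False, "no owner recorded")
    if spec.get("lifecycle") == "deprecated":
        return ChangePlan(name, owner, False, "component is deprecated")
    return ChangePlan(name, owner, True, "ok")


# Illustrative records; the names are made up.
playlist_service = {
    "kind": "Component",
    "metadata": {"name": "playlist-service"},
    "spec": {"type": "service", "owner": "team-playlists", "lifecycle": "production"},
}
orphaned_service = {
    "kind": "Component",
    "metadata": {"name": "old-exporter"},
    "spec": {"type": "service", "lifecycle": "production"},
}

playlist_plan = plan_change(playlist_service)
orphaned_plan = plan_change(orphaned_service)
```

The point of the sketch is the ordering: ownership and status are checked first, so a change that nobody could review is never generated.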

If a developer wants to make a change, it is as simple as opening Slack on their phone and telling Claude to fix a bug or create a feature; Claude then writes the necessary code. Once the task is complete, a new version of the application is sent back to Slack for the developer to review. This means that bugs can be fixed before the developer ever gets to the office.

Initially, Spotify experimented with open-source agents such as Aider, then developed its own agentic loop on top of LLM APIs, and ultimately moved to Claude Code because the homegrown approach demanded overly rigid instruction sets and failed to handle complex, multi-step edits.

The verification loop

The first and foremost question for a non-deterministic AI is: how do you know it is actually right?

Honk solves this with a tight feedback loop. The agent goes through the codebase, makes its changes, and then runs formatters, linters, builds, and tests. If anything fails, it does not stop: it takes the failure as input and re-enters the loop. This continues until the checks pass and it opens a pull request, or until it determines it cannot make the change work and raises a flag for a human to look at it.

This is the magic of Honk. The AI makes the change. The toolchain checks it. You only see it when it has already passed CI.

What this means for your team

Honk is not an open-source tool, and it is closely integrated with Spotify’s internal stack. However, the pattern it embodies is interesting nonetheless.

Teams running their own Backstage installation, or even just good CI pipelines, can start applying parts of this today. The verification loop, which treats AI output as untrusted until it passes your existing test suite, can be bolted onto any workflow.
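One way to bolt this onto an existing workflow is a small gate that runs the commands a team already trusts and accepts a change only if every one exits cleanly. The sketch below assumes nothing beyond the Python standard library; run_check and gate are hypothetical helper names, and the two commands at the bottom are stand-ins for a real formatter, linter, build, or test invocation.

```python
import subprocess
import sys
from typing import List, Tuple


def run_check(cmd: List[str], cwd: str = ".", timeout: int = 600) -> Tuple[bool, str]:
    """Run one existing check command; it passes only on exit code 0."""
    try:
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    return proc.returncode == 0, proc.stdout + proc.stderr


def gate(commands: List[List[str]], cwd: str = ".") -> Tuple[bool, str]:
    """Treat a change as untrusted until every command succeeds.

    Returns (True, "") if all commands pass, otherwise (False, output)
    for the first failing command so the output can be fed back or reported.
    """
    for cmd in commands:
        passed, output = run_check(cmd, cwd)
        if not passed:
            return False, output
    return True, ""


# Stand-in commands: one that exits 0 and one that exits non-zero with a message.
passing = gate([[sys.executable, "-c", "pass"]])
failing = gate([[sys.executable, "-c", "import sys; sys.exit('boom')"]])
```

In practice the command list would be whatever the repository’s CI already runs, executed in the working copy that contains the AI-generated change.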

Conclusion: The shift worth making

Honk is interesting not because Spotify created an AI that can write code; there are plenty of solutions that can do that. Honk is interesting because Spotify created an AI that thinks like an engineer: it knows its scope, respects ownership, verifies its own output before proceeding, and escalates when it’s uncertain.

This is the hard part. Making an AI write a code change is easy. Making an AI think for itself across a dynamic, messy codebase with real services, real owners, and real consequences for getting it wrong requires infrastructure that most companies haven’t yet developed.

The companies that will succeed at this are not the ones with the best AI models. They’re the ones with good service catalogs, well-defined ownership, and CI systems so tightly controlled that they can catch the AI’s mistakes. The AI is only as good as the system around it.
