The most rigorous study we have on AI coding productivity comes from METR, a non-profit AI research lab that ran a randomised controlled trial with 16 experienced open-source developers across 246 real-world tasks. The result: developers using AI tools completed tasks 19% slower than without them.

That's a striking finding. But the truly alarming part isn't the slowdown. It's this: before the study, developers predicted AI would make them 24% faster. After experiencing the slowdown, they still believed AI had sped them up by 20%.

That perception gap, being slower while feeling faster, is the most dangerous thing happening in software engineering right now.

What METR actually measured

The study deserves close reading, because the details matter more than the headline.

These were experienced developers working on their own mature repositories, codebases they knew intimately. They were using mainstream AI coding tools: autocomplete, chat-based assistance, the kind of tooling that most organisations have already deployed.

Domenic Denicola, maintainer of jsdom and one of the study participants, published a detailed account. A performance optimisation he estimated at 30 minutes took 4 hours and 7 minutes with AI. A test-writing task he estimated at 1 hour took 4 hours and 20 minutes. Agents got stuck in loops, spent 30+ minutes finding files, applied syntax incorrectly, and moved lines one at a time instead of reordering them.

His verdict: AI made the tasks "more engaging," like a game, but not faster. And agents required "constant handholding and continuous awareness of the model's limitations."

This is not a story about AI being useless. It's a story about developers using the wrong tools in the wrong way on the wrong tasks.

The autocomplete trap

Here's what most organisations have done: they've bought Copilot licences, rolled them out to the engineering team, and called it AI adoption.

That's like handing someone a power drill and asking them to use it as a hammer. Or, more precisely, it's a capable tool deployed in the wrong paradigm.

Autocomplete-style AI, suggesting the next few lines as you type, is useful for boilerplate. But when you're working in a codebase you already know well, on tasks you already know how to do, adding an AI middleman to your muscle memory is overhead. You're slower because you're reading and evaluating suggestions for code you could have written faster yourself.

The METR developers weren't bad engineers. They were excellent engineers using AI tools designed for a workflow that doesn't match how excellent engineers actually work.

What "doing it right" looks like

Meanwhile, in a parallel universe, Boris Cherny, the head of Claude Code at Anthropic, ships 10 to 30 pull requests per day. Every line written by AI. He hasn't edited code by hand since November 2025.

Lee Edwards, an investor at Root Ventures, wrote hundreds of thousands of lines of code across six projects in two weeks using agentic AI tools. He described it as "a nuclear-powered six-axis mill. A single-person software factory."

At StrongDM, three engineers built a system where no human writes code and no human reviews code. The humans design specifications, curate test scenarios, and watch scores. Simon Willison called it "the most ambitious form of AI-assisted software development I've seen yet."

These aren't demo numbers. These are production systems.

The difference isn't talent. It's method.

Autocomplete versus agentic

The gap in the data makes no sense until you separate two fundamentally different approaches to AI-assisted development.

Autocomplete AI (Copilot, inline suggestions): the developer writes code, the AI suggests completions. The developer remains in the driver's seat, making every decision, with the AI offering marginal assistance on each line. On familiar codebases with experienced developers, this adds friction more often than it removes it. DX's survey of 121,000 developers found that productivity gains from this approach have plateaued at roughly 10%, unchanged since Q2 2025.

Agentic AI (Claude Code, Cursor agents, similar tools): the developer describes what needs to be built. The AI agent reads the codebase, writes the code, runs the tests, iterates on failures, and delivers a working result. The developer's role shifts from writing to directing and reviewing. This is where the 5x-10x productivity claims come from, and where the evidence supports them.
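The describe-write-test-iterate cycle can be sketched as a loop. Everything below is a hypothetical illustration, not any vendor's actual API: `propose_patch` stands in for a real model call, and the "task" is a deliberately trivial function so the loop is self-contained and runnable.

```python
def run_tests(code):
    """Stand-in test harness: executes the candidate code and checks it.

    Returns (passed, failure_message) so failures can be fed back to the
    agent as context for the next attempt.
    """
    try:
        namespace = {}
        exec(code, namespace)
        assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"
        return True, ""
    except Exception as exc:
        return False, str(exc)

def propose_patch(goal, previous_code, feedback):
    """Stand-in for the model: first attempt is buggy, retry fixes it."""
    if feedback:  # the agent reacts to the test failure it was shown
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a - b\n"  # deliberately wrong first try

def agent_loop(goal, max_iterations=5):
    """Describe -> write -> test -> iterate, until the tests pass."""
    code, feedback = "", ""
    for i in range(max_iterations):
        code = propose_patch(goal, code, feedback)
        passed, feedback = run_tests(code)
        if passed:
            return code, i + 1  # working result plus iteration count
    raise RuntimeError(f"no passing patch after {max_iterations} iterations")

code, iterations = agent_loop("implement add(a, b)")
print(iterations)  # the stub converges on the second iteration
```

The structural point is that the tests, not the developer's eyes, are the inner feedback loop; the human reviews the working result at the end rather than every line along the way.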

The METR study measured the first approach. The production results from Cherny, Edwards, and StrongDM use the second.

This distinction matters enormously, because most organisations haven't made the shift. They're still in the autocomplete paradigm: buying licences, measuring adoption rates, and wondering why the numbers haven't moved.

The perception gap is the real problem

Let me come back to that METR finding: developers were 19% slower but believed they were 20% faster. That's a 39-percentage-point perception gap.

This pattern is everywhere. The DX survey found 92.6% of developers use AI tools at least monthly, but productivity gains are stuck at 10%. Fastly's survey found 95% of developers spend extra hours debugging AI output. Seniors spend up to 40% of their time fixing AI-generated code. CodeRabbit's analysis of 470 open-source repositories found that AI creates 1.7x more bugs, 75% more logic errors, and 2.74x more security vulnerabilities than human-written code.

Meanwhile, Stack Overflow's 2025 developer survey shows trust in AI accuracy has fallen to 29%, down from 40%. And the top frustration, cited by 66% of developers: "AI solutions that are almost right, but not quite."

If your team believes AI is making them faster but you can't see it in the output metrics, the METR perception gap is probably your explanation. They're not lying. They genuinely feel more productive. The tool is engaging and reduces the tedium of certain tasks. But the net effect, including debugging, rework, and the cognitive overhead of evaluating suggestions, is neutral or negative.

You can't fix what you can't see. And if your developers sincerely believe the tool is helping when the data says it isn't, you have a measurement problem masquerading as a productivity problem.

What I'd tell a CTO reading this

I use agentic AI tools daily. I've seen what they can do in the hands of someone who understands both the tools and the engineering problems they're applied to. The productivity gains are real. Genuinely transformative in some workflows.

But I've also seen the other side: organisations that rolled out AI coding tools, measured adoption instead of outcomes, and now have developers who are slower, less rigorous in code review, and accumulating technical debt at a rate that will cost them dearly in 18 months.

The difference comes down to three things:

First, use the right class of tool for the job. Autocomplete has its place for quick lookups, unfamiliar syntax, and API discovery. But for meaningful productivity gains, you need agentic tools that can take on whole tasks, not just suggest the next line. And your workflow needs to be rebuilt around delegation and review, not around typing faster.

Second, measure outcomes, not adoption. I don't care what percentage of your team uses AI tools. I care about cycle time, defect rate, time-to-production, and business metrics. If those haven't improved, your AI adoption isn't working, regardless of what your developers report in satisfaction surveys.
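Concretely, "outcomes, not adoption" means comparing delivery metrics across periods rather than counting licence activations. A minimal sketch, with invented records and field names; in practice you would pull these from your issue tracker or git history:

```python
from datetime import datetime
from statistics import median

def cycle_time_hours(prs):
    """Median hours from PR opened to merged."""
    deltas = [(pr["merged"] - pr["opened"]).total_seconds() / 3600 for pr in prs]
    return median(deltas)

def defect_rate(prs):
    """Fraction of merged PRs later linked to a defect."""
    return sum(pr["caused_defect"] for pr in prs) / len(prs)

def _pr(opened, merged, caused_defect=False):
    # Hypothetical record shape -- adapt to whatever your tracker exports.
    return {"opened": datetime.fromisoformat(opened),
            "merged": datetime.fromisoformat(merged),
            "caused_defect": caused_defect}

before = [_pr("2025-01-06T09:00", "2025-01-07T09:00"),
          _pr("2025-01-08T09:00", "2025-01-09T21:00", caused_defect=True),
          _pr("2025-01-10T09:00", "2025-01-11T09:00")]
after = [_pr("2025-03-03T09:00", "2025-03-03T15:00"),
         _pr("2025-03-04T09:00", "2025-03-04T13:00"),
         _pr("2025-03-05T09:00", "2025-03-06T09:00", caused_defect=True)]

print(round(cycle_time_hours(before), 1), round(defect_rate(before), 2))
print(round(cycle_time_hours(after), 1), round(defect_rate(after), 2))
```

In this invented example, cycle time improved while defect rate stayed flat: exactly the kind of mixed signal that adoption percentages and satisfaction surveys will never surface.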

Third, invest in AI fluency, not just AI access. The gap between the METR developers (19% slower) and Boris Cherny (10 to 30 PRs per day) is not about the AI models. It's about how you structure your work around them. That's a skill, and like any skill, it requires deliberate development, not just tool provisioning.

The organisations that get this right will have a genuine competitive advantage. The ones that confuse AI adoption with AI effectiveness will have expensive tools and the same output. Or worse.


If you're trying to figure out whether your AI tooling investment is actually delivering, let's talk.


Related: Most AI transformations are performance art · Why your AI spend isn't showing up in the numbers · Agentic AI in 2026: what actually works in production

Ready to make AI actually work?

Tell me what you're working on. I'll respond personally. If there's a fit, we'll take it from there.

Currently accepting one new client alongside existing commitments. Second slot opens Q3 2026.