Did the 2025 METR study prove AI makes developers slower?

No. It found that experienced developers working in large codebases they knew well took about 19 percent longer with AI tools, even though they felt roughly 20 percent faster. That was one specific setting, not proof that AI slows everyone down. Beginners, unfamiliar code, and boilerplate work often show real gains. The durable lesson is that how fast a tool feels is a poor measure of whether it is actually helping.

Why does AI feel faster even when it is slower?

AI shifts the work from producing to reviewing. Waiting for a draft feels lighter than starting from a blank page, so the effort of correcting, re-prompting, and fact-checking gets discounted. The task feels easier while quietly taking longer. That is why perceived speed is unreliable and you should measure completion time and error rate instead.

How do I measure whether AI is actually helping my business?

Pick one repetitive task, time it for a week the old way, then time it for a week with AI, including the correcting and re-prompting rather than just the generating. Compare completion time and error rate against that baseline. If you cannot state in minutes or errors what a tool saved you, you do not yet know if it saved you anything.

Where does AI reliably save time, and where does it cost time?

AI saves time on high-volume, low-stakes, forgiving work: summarizing threads, turning notes into a first draft, tagging and routing requests. It tends to cost time on irreversible, customer-facing, or context-heavy work where the review burden outweighs the head start. Automate the boring, reversible 80 percent and keep a human on judgment and anything hard to undo.

Why are companies like Tesla and Meta capping AI spending?

As AI moves from free trials to real bills, large companies are scrutinizing whether the spend produces results. Reported moves to cap internal AI budgets signal the same shift smaller businesses should make: stop asking whether AI is impressive and start asking whether it is measurably faster or cheaper than the current process.

2 July 2026

A Study Says AI Made Developers 19% Slower. Here's the Test for Your Business

A 2025 study by METR, a research nonprofit, found that experienced software developers using AI coding tools took about 19 percent longer to finish real tasks. The catch: those same developers felt roughly 20 percent faster. The lesson is not that AI is useless. It is that how fast a tool feels is a terrible way to judge whether it is working. For any business rolling out AI right now, that gap between feeling and result is the single most important thing to measure.

The timing matters. Tesla has reportedly moved to cap staff AI spending at 200 dollars per week, and Meta has reportedly moved to cap internal AI token spending. When the biggest spenders start counting the bill, it is a sign the free-trial honeymoon is ending. The question stops being "can AI do this" and becomes "is it actually paying for itself."

What did the study actually find?

METR ran a randomized trial with experienced open-source developers working in large codebases they already knew well. For each task, a coin flip decided whether they could use AI tools (mostly modern assistants like Cursor with a frontier model) or not. Before starting, developers expected AI to speed them up by around 24 percent. Afterward, they believed it had sped them up by about 20 percent.

The stopwatch disagreed. Tasks done with AI took 19 percent longer on average.

Two things are worth saying clearly. First, this was a specific setting: senior engineers, mature projects, code they knew intimately. It is not proof that AI slows everyone down everywhere. Beginners, unfamiliar code, and boilerplate-heavy work often show real gains. Second, and more useful, the perception gap held even for skilled professionals paying close attention. If experts can feel a speedup that is not there, so can the rest of us.
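To make the gap concrete, here is a small arithmetic sketch in Python. The 60-minute baseline is a made-up example, and reading "20 percent faster" as "20 percent less time" is an assumption for illustration, not the study's exact definition.

```python
# Illustrative arithmetic only. The baseline task length is hypothetical,
# and treating "felt 20 percent faster" as "20 percent less time" is an assumption.
BASELINE_MINUTES = 60.0   # hypothetical time for the task without AI
MEASURED_SLOWDOWN = 0.19  # from the article: about 19 percent longer with AI
PERCEIVED_SPEEDUP = 0.20  # from the article: felt roughly 20 percent faster


def actual_minutes(baseline: float, slowdown: float) -> float:
    """Time with AI when the task takes `slowdown` longer than the baseline."""
    return baseline * (1 + slowdown)


def perceived_minutes(baseline: float, speedup: float) -> float:
    """Time the task felt like, reading the speedup as a share of time saved."""
    return baseline * (1 - speedup)


actual = actual_minutes(BASELINE_MINUTES, MEASURED_SLOWDOWN)
felt = perceived_minutes(BASELINE_MINUTES, PERCEIVED_SPEEDUP)
gap = actual - felt

print(f"Measured with AI: {actual:.1f} min")
print(f"Felt like:        {felt:.1f} min")
print(f"Perception gap:   {gap:.1f} min per task")
```

Under those assumptions, a one-hour task runs a little over 71 minutes on the clock while feeling like 48. That difference is the part a gut check never sees.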

Why did AI feel faster when it was slower?

Because AI changes the texture of the work, not just the speed. Waiting for a model to generate a draft feels lighter than writing from a blank page, even when you then spend longer reading, correcting, and re-prompting it. The effort moves from producing to reviewing, and reviewing feels easier while quietly eating the clock.

This is the trap a pragmatic operator should watch for. "It feels faster" is a real sensation and a useless metric. The same illusion shows up far outside coding: a support agent who lets AI draft every reply, a marketer who generates ten versions of a post then agonizes over which is least generic, a bookkeeper double-checking numbers a tool confidently made up. Motion is not the same as progress.

What does this mean for your business, not just developers?

Most businesses lose real money to manual, repetitive work they have stopped noticing. AI is genuinely good at that work. But the study is a reminder that the tool is rarely the bottleneck. The process is. Dropping a chatbot onto a broken workflow usually just adds a faster way to produce the wrong thing, plus a review step you did not have before.

The practical read: AI earns its keep on the boring, high-volume 80 percent of a task, the part where a rough draft or a first pass saves real minutes. It struggles, and can actively cost you, on the judgment-heavy 20 percent where an expert would have been faster just doing it themselves. The skill is knowing which is which before you automate.

How do you actually measure whether AI is helping?

Measure the boring thing, and measure it against a real baseline. A simple version anyone can run:

Pick one repetitive task (drafting quotes, sorting inbound emails, first-pass data entry).
Time it for a week the old way. Write the number down.
Do it with AI for a week. Time that too, including the correcting and re-prompting, not just the generating.
Compare completion time and error rate, not how it felt.
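The four steps above can be sketched as a short Python script. Everything in it is hypothetical: the `TaskRun` record, the `compare` helper, and the sample numbers are placeholders for whatever you log in your own two weeks, and a spreadsheet would do the same job.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRun:
    minutes: float  # total time, including correcting and re-prompting
    errors: int     # mistakes found in the finished output


def summarize(runs: list[TaskRun]) -> tuple[float, float]:
    """Return (average minutes per task, average errors per task)."""
    return mean(r.minutes for r in runs), mean(r.errors for r in runs)


def compare(baseline: list[TaskRun], with_ai: list[TaskRun]) -> dict[str, float]:
    """Compare the AI week against the old-way week. Negative time change means faster."""
    base_min, base_err = summarize(baseline)
    ai_min, ai_err = summarize(with_ai)
    return {
        "minutes_saved_per_task": base_min - ai_min,
        "time_change_pct": (ai_min - base_min) / base_min * 100,
        "errors_change_per_task": ai_err - base_err,
    }


# Made-up sample logs: one entry per completed task.
baseline_week = [TaskRun(30.0, 1), TaskRun(28.0, 0), TaskRun(32.0, 1), TaskRun(30.0, 0)]
ai_week = [TaskRun(24.0, 1), TaskRun(27.0, 2), TaskRun(21.0, 0), TaskRun(24.0, 1)]

result = compare(baseline_week, ai_week)
for name, value in result.items():
    print(f"{name}: {value:+.1f}")
```

In this invented example the AI week is 20 percent faster per task but averages half an error more per task, which is exactly the kind of trade-off a feeling of speed would hide.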

That last point is the whole game. The METR developers would have sworn AI helped. Only the clock told the truth. If you cannot state, in minutes or errors, what a tool saved you, you do not yet know if it saved you anything.

Where does AI reliably save time?

It shines on high-volume, low-stakes, forgiving work: summarizing long threads, turning messy notes into a clean draft, tagging and routing incoming requests, generating a first version you will edit anyway. These are reversible, and a human still ships the final output, so a mistake is cheap.

It gets expensive on the opposite: work that is irreversible, customer-facing, or demands deep context the model does not have. There, the review burden can outweigh the head start, much as the study found for experienced developers in code they knew well. Match the guardrail to the risk. Let AI move fast on the reversible stuff, and keep a human check on anything public or hard to undo.
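One way to make "match the guardrail to the risk" repeatable is to write the rule down. The Python sketch below is a hypothetical rule of thumb built from the three risk factors named above. The `Task` fields, the `guardrail` function, and the example tasks are all assumptions, not a validated policy.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    reversible: bool       # can a mistake be undone cheaply?
    customer_facing: bool  # does the output reach a customer directly?
    context_heavy: bool    # does it need knowledge the model does not have?


def guardrail(task: Task) -> str:
    """Hypothetical rule of thumb: the riskier the task, the stronger the check."""
    if not task.reversible or task.customer_facing:
        return "human check before it ships"
    if task.context_heavy:
        return "time it against a baseline before automating"
    return "let AI take the first pass"


# Made-up examples, one per outcome.
tasks = [
    Task("summarize a long email thread", True, False, False),
    Task("send a refund decision to a customer", False, True, True),
    Task("draft an internal report on a legacy system", True, False, True),
]

for t in tasks:
    print(f"{t.name}: {guardrail(t)}")
```

The order of the checks is the design choice: anything public or hard to undo gets a human check first, regardless of how easy the task looks otherwise.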

The honest takeaway from a slightly cynical operator's chair: AI is a strong teammate for the repetitive 80 percent and a poor replacement for judgment. Start with one workflow, measure it against last week's real numbers, and keep only what visibly pays for itself. The companies now capping their AI bills are asking the same question you should: not "is this impressive," but "is this actually faster."

If you want a low-pressure starting point, pick your single most repetitive weekly task and time it once, by hand, before you automate anything. That number is the only honest benchmark you have.
