Did a study really find AI made developers slower?

In a 2025 METR randomized trial, experienced developers working in codebases they knew well expected AI to speed them up by about 24 percent and believed afterward they were roughly 20 percent faster. Measured against the clock, they were about 19 percent slower with the AI tools. The point is the gap between felt speed and measured result, not that AI is useless.

Why does AI feel faster even when it is slower?

AI removes the unpleasant friction of starting, which feels like a win. It also shifts your work from producing to reviewing and re-prompting, and reviewing feels lighter than creating even when it takes longer. Confidence and speed are different measurements, and people routinely confuse the two.

How do I measure whether AI is actually saving my business money?

Pick one repetitive, countable task. Measure three numbers before and after: time per unit including review and fixes, error or rework rate, and total cost including the human minutes spent supervising the AI. If time drops, errors hold or fall, and cost is lower, it is a real win. If only the feeling improves, you bought a nicer experience, not a saving.

Where does AI genuinely pay off in a business?

On high-volume, low-judgment work such as sorting messages, drafting routine replies, and pulling data out of messy documents, where a quick human check is enough. It tends to disappoint on expert, high-stakes, one-off tasks, where it just adds a review step to work your best person already does well.

Why are companies like Tesla and Meta capping AI spending?

Reports indicate that Tesla moved to cap staff AI spending at around 200 dollars per person per week and that Meta capped internal AI token spending. Those caps signal that AI cost is real and visible while the return is often assumed. Much of the expense is a routing problem: sending every task to the most powerful model instead of using cheap models for routine steps and the expensive one only for output that ships.

2 July 2026

Developers Felt 20% Faster With AI But Were 19% Slower. Here's How to Actually Measure AI ROI

Did AI actually make those developers slower?

Yes, in one careful 2025 study it did. Researchers at METR ran a randomized trial with experienced open-source developers working in codebases they knew well. The developers expected AI tools to speed them up by around 24 percent. After the work, they believed they had been about 20 percent faster. When the researchers actually timed the tasks, the developers were roughly 19 percent slower with the AI tools than without them.

That gap between the felt speedup and the measured result is the whole story. The tool felt like a win. The stopwatch disagreed. If you are spending money on AI in your business right now, that gap is the most important thing to understand, because you are almost certainly judging your own AI by feel too.

Why would AI feel faster while being slower?

A few plain reasons, and none of them is that AI is useless.

First, AI removes the boring friction. Staring at a blank page is unpleasant. Getting a draft to react to feels great, even when polishing that draft to a usable state takes longer than writing it yourself would have. The relief is real. The time saved is not.

Second, the work changes shape. Instead of doing the task, you now review, correct, and re-prompt. That reviewing feels lighter than producing, so it reads as speed, but it is still time on the clock.

Third, the study looked at experts in familiar territory. That matters. AI tends to help most where you are slow, unsure, or working in something unfamiliar. It helps least where you are already fast and know exactly what good looks like. In those cases the AI just adds a round trip.

The lesson is not "AI is hype." It is that your gut is a broken gauge. Confidence and speed are not the same measurement, and humans routinely confuse the two.

What does this mean for a normal business, not a software team?

Most owners are not running coding trials. They are letting staff use ChatGPT for emails, letting a tool draft proposals, or paying for an AI feature bolted onto software they already have. The same trap applies. Everyone reports that it feels faster. Almost nobody has checked.

Big companies are now checking, and getting nervous. Tesla reportedly moved to cap staff AI spending at 200 dollars per person per week. Meta has moved to cap internal AI token spending too. When the companies most enthusiastic about AI start putting a meter on it, that is a signal. The cost is real and visible. The return is fuzzy and assumed. That is an uncomfortable combination, and it is exactly the position a smaller business can drift into without noticing.

How do you actually measure AI ROI?

You measure the boring thing, before and after, on one task. Not a companywide rollout. One workflow you can put a number on.

Pick something repetitive and countable. Answering a common customer email. Turning a call note into a quote. Drafting a listing. Then capture three numbers:

Time. How long does one unit take today, start to finish, including the review and fixing? Time ten of them by hand. Do it honestly, including the parts you would rather skip.
Errors. How often does the output need rework, or go out wrong? AI that is fast but needs constant correction is not a saving. It is a tax you pay in a different currency.
Total cost. The subscription plus the human minutes spent supervising it. The supervision is the part people forget, and it is where the METR developers lost their time.

Then run the same task with AI for a week and measure the same three numbers. If time drops, errors stay flat or fall, and cost is lower, you have a real win worth expanding. If the numbers barely move but it feels better, you have bought a nicer experience, not a faster business. That is fine if you know that is what you bought. It is a problem if you thought you were saving money.
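If you would rather let a script do the arithmetic, here is a minimal sketch of that before-and-after comparison. Everything in it is an assumption made for illustration: the names TaskMeasurement and verdict, the sample figures, and the hourly rate are hypothetical, not numbers from the study. It only encodes the rule above: time per unit must drop, the error rate must hold or fall, and cost per unit must be lower.

```python
from dataclasses import dataclass


@dataclass
class TaskMeasurement:
    """One measurement period for a single repetitive task (hypothetical structure)."""
    units: int              # units completed, e.g. customer emails answered
    total_minutes: float    # human minutes, including review and fixing
    units_reworked: int     # units that needed rework or went out wrong
    tool_cost: float        # AI subscription or usage cost for the period (0 if manual)

    def minutes_per_unit(self) -> float:
        return self.total_minutes / self.units

    def error_rate(self) -> float:
        return self.units_reworked / self.units

    def cost_per_unit(self, hourly_rate: float) -> float:
        # Total cost = human time at the given hourly rate plus the tool cost.
        labour = self.total_minutes / 60 * hourly_rate
        return (labour + self.tool_cost) / self.units


def verdict(before: TaskMeasurement, after: TaskMeasurement, hourly_rate: float) -> str:
    """Apply the three-number test: faster, no more errors, and cheaper."""
    faster = after.minutes_per_unit() < before.minutes_per_unit()
    errors_ok = after.error_rate() <= before.error_rate()
    cheaper = after.cost_per_unit(hourly_rate) < before.cost_per_unit(hourly_rate)
    if faster and errors_ok and cheaper:
        return "real win"
    return "no measured saving"


# Made-up example: ten emails by hand, then ten emails with AI for a week.
manual = TaskMeasurement(units=10, total_minutes=120, units_reworked=1, tool_cost=0.0)
with_ai = TaskMeasurement(units=10, total_minutes=90, units_reworked=1, tool_cost=5.0)
print(verdict(manual, with_ai, hourly_rate=60.0))
```

Note that total_minutes is meant to include supervision time. If you leave the review and fixing minutes out, the script will confirm whatever your gut already believed.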

Where does AI genuinely pay off, then?

On the high-volume, low-judgment 80 percent. Sorting and tagging incoming messages. First drafts of routine replies. Pulling structured data out of messy documents. Work that is repetitive, tolerant of a quick human check, and currently eating hours you have stopped noticing.

Where it tends to disappoint is the expert, high-stakes, one-off work, the exact case in the study. Your most skilled person doing their most skilled task rarely needs a co-pilot for it. Pointing AI there adds a review step and calls it progress.

There is also a cost angle hiding in those corporate spending caps. A lot of "AI is too expensive" is really a routing problem. Use a cheap, fast model for the routine high-volume steps, and reserve the expensive model for the small slice of output that actually ships to a customer. Sending every task to the most powerful model is how you end up needing a 200-dollar weekly cap in the first place.
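To see how much routing can matter, here is a rough sketch under loudly stated assumptions. The two model tiers, the per-task prices, and the 900-to-100 split of tasks are all invented for illustration. They are not real provider prices, and your own mix will differ. The sketch only compares sending everything to the expensive tier against sending just customer-facing output there.

```python
from typing import Iterable

# Hypothetical per-task prices in dollars. Real prices vary by provider and usage.
PRICE_PER_TASK = {"cheap": 0.002, "expensive": 0.05}


def pick_model(ships_to_customer: bool) -> str:
    """Route only customer-facing output to the expensive tier."""
    return "expensive" if ships_to_customer else "cheap"


def weekly_cost(tasks: Iterable[bool], route: bool) -> float:
    """Sum the cost of a week of tasks.

    Each task is a bool: True if its output ships to a customer.
    With route=False every task goes to the expensive tier.
    """
    total = 0.0
    for ships in tasks:
        model = pick_model(ships) if route else "expensive"
        total += PRICE_PER_TASK[model]
    return total


# Made-up week: 900 routine internal steps, 100 outputs that ship to a customer.
week = [False] * 900 + [True] * 100
unrouted = weekly_cost(week, route=False)
routed = weekly_cost(week, route=True)
print(f"everything on the expensive model: ${unrouted:.2f}")
print(f"routed by task type:               ${routed:.2f}")
```

The exact ratio depends entirely on the invented prices. The design point is that the routing decision is one small function, so it is cheap to try on your own task mix before anyone needs a spending cap.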

What should you do this week?

Pick one AI use you already pay for. Time it properly against the manual version, count the errors, add up the true cost. Keep it if the numbers move, drop it if only the feeling does. The point of that developer study is not that AI failed. It is that the people using it could not tell. You can, if you measure instead of guess. That is the difference between AI as a line item and AI as a return.
