Developers Felt 20% Faster With AI But Were 19% Slower. Here's How to Actually Measure AI ROI
Did AI actually make those developers slower?
Yes, in one careful 2025 study it did. Researchers at METR ran a randomized trial with experienced open-source developers working in codebases they knew well. The developers expected AI tools to speed them up by around 24 percent. After the work, they believed they had been about 20 percent faster. When the researchers actually timed the tasks, the developers were roughly 19 percent slower with the AI tools than without them.
That gap between the felt speedup and the measured result is the whole story. The tool felt like a win. The stopwatch disagreed. If you are spending money on AI in your business right now, that gap is the most important thing to understand, because you are almost certainly judging your own AI by feel too.
Why would AI feel faster while being slower?
A few plain reasons, and none of them is that AI is useless.
First, AI removes the boring friction. Staring at a blank page is unpleasant. Getting a draft to react to feels great, even when polishing that draft to a usable state takes longer than writing it yourself would have. The relief is real. The time saved is not.
Second, the work changes shape. Instead of doing the task, you now review, correct, and re-prompt. That reviewing feels lighter than producing, so it reads as speed, but it is still time on the clock.
Third, the study looked at experts in familiar territory. That matters. AI tends to help most where you are slow, unsure, or working in something unfamiliar. It helps least where you are already fast and know exactly what good looks like. In those cases the AI just adds a round trip.
The lesson is not "AI is hype." It is that your gut is a broken gauge. Confidence and speed are not the same measurement, and humans routinely confuse the two.
What does this mean for a normal business, not a software team?
Most owners are not running coding trials. They are letting staff use ChatGPT for emails, letting a tool draft proposals, or paying for an AI feature bolted onto software they already have. The same trap applies. Everyone reports that it feels faster. Almost nobody has checked.
Big companies are now checking, and getting nervous. Tesla reportedly moved to cap staff AI spending at 200 dollars per person per week. Meta has moved to cap internal AI token spending too. When the companies most enthusiastic about AI start putting a meter on it, that is a signal. The cost is real and visible. The return is fuzzy and assumed. That is an uncomfortable combination, and it is exactly the position a smaller business can drift into without noticing.
How do you actually measure AI ROI?
You measure the boring thing, before and after, on one task. Not a companywide rollout. One workflow you can put a number on.
Pick something repetitive and countable. Answering a common customer email. Turning a call note into a quote. Drafting a listing. Then capture three numbers:
- Time. How long does one unit take today, start to finish, including the review and fixing? Time ten of them by hand. Do it honestly, including the parts you would rather skip.
- Errors. How often does the output need rework, or go out wrong? AI that is fast but needs constant correction is not a saving. It is a tax you pay in a different currency.
- Total cost. The subscription plus the human minutes spent supervising it. The supervision is the part people forget, and it is where the METR developers lost their time.
Then run the same task with AI for a week and measure the same three numbers. If time drops, errors stay flat or fall, and cost is lower, you have a real win worth expanding. If the numbers barely move but it feels better, you have bought a nicer experience, not a faster business. That is fine if you know that is what you bought. It is a problem if you thought you were saving money.
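The before-and-after comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed tool: the `Trial` record, the `summarize` and `verdict` helpers, the hourly rate, the subscription price, and the monthly volume are all invented for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """One timed unit of work (hypothetical record for this sketch)."""
    minutes: float        # start to finish, including review and fixing
    needed_rework: bool   # True if the output had to be redone or went out wrong


def summarize(trials, hourly_rate, monthly_tool_cost=0.0, units_per_month=1):
    """Reduce a batch of trials to the three numbers: time, errors, total cost."""
    avg_minutes = mean(t.minutes for t in trials)
    error_rate = sum(t.needed_rework for t in trials) / len(trials)
    labor_cost = avg_minutes / 60 * hourly_rate       # human minutes, priced
    tool_cost = monthly_tool_cost / units_per_month   # subscription share per unit
    return {
        "avg_minutes": avg_minutes,
        "error_rate": error_rate,
        "cost_per_unit": labor_cost + tool_cost,
    }


def verdict(before, after):
    """A win only if time drops, errors do not rise, and cost per unit falls."""
    faster = after["avg_minutes"] < before["avg_minutes"]
    no_worse = after["error_rate"] <= before["error_rate"]
    cheaper = after["cost_per_unit"] < before["cost_per_unit"]
    return "expand" if faster and no_worse and cheaper else "no measured win"


# Made-up numbers: ten timed units by hand, ten with the AI tool.
manual = [Trial(12.0, False)] * 9 + [Trial(12.0, True)]
with_ai = [Trial(9.0, False)] * 9 + [Trial(9.0, True)]

before = summarize(manual, hourly_rate=60.0)
after = summarize(with_ai, hourly_rate=60.0,
                  monthly_tool_cost=40.0, units_per_month=200)

print(before, after, verdict(before, after))
```

The strict rule in `verdict` is a design choice for the sketch. A result that feels better but returns "no measured win" is the nicer-experience case, not the saving.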
Where does AI genuinely pay off, then?
On the high-volume, low-judgment 80 percent. Sorting and tagging incoming messages. First drafts of routine replies. Pulling structured data out of messy documents. Work that is repetitive, tolerant of a quick human check, and currently eating hours you have stopped noticing.
Where it tends to disappoint is expert, high-stakes, one-off work, which is the exact case in the study. Your most skilled person doing their most skilled task rarely needs a co-pilot for it. Pointing AI there adds a review step and calls it progress.
There is also a cost angle hiding in those corporate spending caps. A lot of "AI is too expensive" is really a routing problem. Use a cheap, fast model for the routine high-volume steps, and reserve the expensive model for the small slice of output that actually ships to a customer. Sending every task to the most powerful model is how you end up needing a 200-dollar weekly cap in the first place.
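As a rough sketch of that routing idea, the snippet below sends only customer-facing output to the expensive model and compares the bill with sending everything there. The model names, the per-task prices, and the `ships_to_customer` flag are hypothetical placeholders, not real products or real pricing.

```python
# Hypothetical model names and per-task prices, for illustration only.
PRICE_PER_TASK = {"cheap-fast-model": 0.01, "expensive-model": 0.20}


def pick_model(task):
    """Route customer-facing output to the expensive model, the rest to the cheap one."""
    return "expensive-model" if task["ships_to_customer"] else "cheap-fast-model"


def total_cost(tasks, router):
    """Sum the per-task price of whichever model the router picks."""
    return sum(PRICE_PER_TASK[router(task)] for task in tasks)


# Made-up workload: 90 routine internal steps, 10 outputs that go to a customer.
tasks = (
    [{"name": "tag inbound email", "ships_to_customer": False}] * 90
    + [{"name": "final quote", "ships_to_customer": True}] * 10
)

routed = total_cost(tasks, pick_model)
everything_expensive = total_cost(tasks, lambda task: "expensive-model")

print(f"routed: {routed:.2f}, everything on the expensive model: {everything_expensive:.2f}")
```

With these invented prices the routed workload costs a fraction of the all-expensive one. The real ratio depends entirely on your own volumes and your vendor's pricing.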
What should you do this week?
Pick one AI use you already pay for. Time it properly against the manual version, count the errors, add up the true cost. Keep it if the numbers move, drop it if only the feeling does. The point of that developer study is not that AI failed. It is that the people using it could not tell. You can, if you measure instead of guess. That is the difference between AI as a line item and AI as a return.