AI Won't Lower Your Engineering Bar. Poor Adoption Will.
What nine recent studies reveal about code quality, delivery stability, and leadership accountability when engineering teams adopt AI coding tools.
A few months back I was in a retrospective where an engineering lead walked through the last sprint’s SLA misses. The deployment window had slipped. Two incidents had hit production. A planned release had been rolled back within hours. When we got to root cause, I heard something I’ve been thinking about since: “The team was relying too heavily on AI-generated code and didn’t catch the issues in time.”
The room went quiet. A few heads nodded. And just like that, the tool got the blame, and the team got an implicit pass.
I’ve been in engineering leadership long enough to recognize what that moment actually was. Not an honest postmortem. A deflection dressed up as one.
When leadership asks engineering teams to adopt AI coding tools, the message often gets received as: move faster, write less, do more with less effort. That reading is almost always wrong, and leaders who allow it to persist are setting their teams up for exactly the kind of failures that then get pinned on the technology.
When the ask is a serious one, it’s to be more of an engineer. To own architecture, judgment, tradeoffs, and quality at a level that frees you from writing every line from scratch. The tool handles execution. You handle the thinking. That is a harder job, not an easier one.
Somewhere between the executive announcement and the sprint planning meeting, that distinction collapses. And then the data catches up.
GitClear analyzed 211 million changed lines of code across repositories from 2020 to 2024, including work from Google, Microsoft, and Meta [1]. What they found is uncomfortable to read if you’ve been cheerfully citing productivity gains. Code churn (lines revised or reverted within two weeks of being authored) is on track to double versus its pre-AI baseline. Refactored code, the kind that signals a team actively improving what they’ve built, dropped from 25% of changes in 2021 to under 10% in 2024. Copy-pasted code grew from 8.3% to 12.3% of changes in the same window, and duplicated code blocks increased eight-fold over the four-year period [1].
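The churn definition above is concrete enough to sketch in code. This is a minimal illustration, not GitClear’s actual methodology: it assumes a hypothetical per-line event model where each changed line records when it was authored and when (if ever) it was next revised or reverted.

```python
from datetime import datetime, timedelta

# Mirrors the definition cited above: a line counts as "churn" if it is
# revised or reverted within two weeks of being authored.
CHURN_WINDOW = timedelta(days=14)

def churn_rate(line_events):
    """line_events: list of (authored_at, next_modified_at_or_None) tuples,
    one per changed line. Returns the fraction of lines that churned."""
    if not line_events:
        return 0.0
    churned = sum(
        1 for authored, modified in line_events
        if modified is not None and modified - authored <= CHURN_WINDOW
    )
    return churned / len(line_events)

events = [
    (datetime(2024, 1, 1), datetime(2024, 1, 5)),  # revised in 4 days -> churn
    (datetime(2024, 1, 1), datetime(2024, 3, 1)),  # revised much later -> not churn
    (datetime(2024, 1, 1), None),                  # never touched again -> not churn
]
print(churn_rate(events))  # 1 of 3 lines churned
```

The point of instrumenting this at all is the baseline comparison: a churn rate in isolation tells you little, but a churn rate doubling after an AI rollout is a signal worth a retrospective.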
This is what “moving faster” looks like when the person generating the code isn’t fully owning what they generate.
SonarSource found a 9% increase in bugs in AI-accelerated codebases and pull requests that are 154% larger than what teams were producing before [2]. Pull requests per developer went up roughly 20% with AI assistance, but incidents per pull request increased by 23.5% [2]. Google’s 2024 DORA report put numbers on what this looks like at the delivery level: a 25% increase in AI usage corresponds to a 7.2% decrease in delivery stability and a 1.5% drop in throughput [3].
More code. More PRs. More churn. More incidents. Less stability.
None of this is the AI’s fault. The AI did exactly what it was told.
Ox Security ran an analysis of more than 300 production repositories in 2025 and found ten recurring patterns in AI-generated code present in 80 to 100% of what they reviewed: incomplete error handling, weak concurrency management, inconsistent architecture [4]. These aren’t bugs you miss at 2am when you’re tired. These are engineering decisions that were never made, skipped because the code appeared to work, shipped because the test passed, and discovered when something went wrong downstream.
That’s the accountability gap. And it doesn’t belong to the AI.
Ox’s findings landed alongside a separate Georgetown CSET analysis from November 2024 that catalogued the security vulnerability profile of AI-generated code across the major providers [5]. In that analysis, the word “BLOCKER” appeared often. AI-generated code requires the same rigorous review as any code, and in practice it’s getting less of it, because the generation itself feels like the hard part was already done.
The hard part is never the generation. It never was.
Blaming AI for SLA degradation is the new “I told you so.” It carries the same structure: a skeptic who never fully bought in, an early miss that confirms their doubts, and a narrative that quietly ends the conversation before anyone has to examine what actually happened. What actually happened is usually simpler and more uncomfortable. The team used the tool without a clear owner for quality. Review standards didn’t adjust for the new volume. Test coverage didn’t expand to match larger changesets. No one explicitly said what “good AI-assisted engineering” looked like, so the default was whatever worked. If it compiled and the tests passed, it shipped.
AI did its job. The ownership question went unanswered.
One pattern I’ve had to correct in myself over the years is assuming that handing someone a powerful tool is the same as setting expectations around how to use it. It isn’t. A faster car needs better driving. More code generation requires sharper review. The two sides of that equation need to move together, and when they don’t, the quality bill comes due on a timeline you can’t predict and can’t really argue with.
The harder conversation, the one that gets avoided, is about what leadership owes the team during an adoption period.
The 2025 LeadDev Engineering Leadership Report surveyed more than 600 engineering leaders. Sixty percent of them said AI hadn’t meaningfully boosted team productivity. The majority reported only small gains [6]. That number tells you something important: most teams are in the difficult middle of a capability transition, not on the other side of it. And yet leadership behavior in many of those organizations suggests the opposite assumption. The tools are deployed, so the results should follow.
They don’t. At least not right away.
McKinsey’s 2025 workplace research found that C-suite leaders dramatically underestimate how extensively their employees are already using AI, but they also overestimate how quickly that use translates into delivery outcomes [7]. Pair that with a figure from the LeadDev survey: 62% of organizations report productivity increases of 25% or more from AI, but only 20% of engineering teams are actually using metrics to measure AI’s impact [6]. Roughly two thirds of companies claim results they can’t prove, from a transition they haven’t instrumented.
That gap has consequences for the people doing the work. Engineering managers are reporting 12 to 15 hour workdays following AI tool rollouts, as sprint expectations get inflated 30 to 40% based on vendor ROI claims while actual utilization of the tools drops to around 22% within the first 30 days [8]. The team is drowning in delivery pressure while they’re still figuring out the toolchain. Licenses were purchased, expectations were raised, and the timeline for when results would actually show up was never discussed.
About eight months into one rollout I was involved with, a VP of Product pulled me aside after a planning meeting. The delivery numbers hadn’t moved. He asked, with a mix of frustration and genuine curiosity, whether the AI tools were actually helping or whether we’d wasted the budget. I told him something like: we’re in month eight of a twelve-month transition. The team is learning a different way to work. You’ll see the movement, but not yet. He looked at me the way people look at you when they’re not sure if you believe what you’re saying.
I believed it. But I also knew the conversation was happening because no one had set that expectation before the rollout started.
If you push an engineering team to adopt AI and expect delivery performance to improve immediately, you’ve misread the transition. What actually happens is what Google’s DORA team described as the bottom of a J-curve. Performance gets worse before it gets better, as teams rebuild mental models, review processes, and quality gates around a different way of generating code [3]. Expecting the upward curve without accepting the downward one isn’t optimism. It’s a refusal to understand change.
I’ve watched this play out across multiple transformations. Every time an organization tried to mandate a new way of working without building space for the learning period, the team found ways to comply on the surface and protect themselves underneath. Metrics looked fine. The actual change didn’t happen. And when something broke, someone found a credible-sounding external cause.
AI is now that external cause. Available, plausible, and completely unable to defend itself.
The organizations where AI adoption is working share a pattern that’s less flashy than the productivity numbers in press releases. Accenture’s deployment of GitHub Copilot across a large developer population found that over 80% of participants successfully adopted the tool, with a 96% success rate among the initial cohort [9]. Ninety percent of developers reported higher job fulfillment. But what made that work wasn’t the tool. It was structured adoption with explicit expectations, measurement, and time built in for the learning curve to run its course.
The Accenture work confirmed something the skeptics don’t advertise: when you actually manage the transition, the results are real. Code readability up. Maintainability up. Code approval rates up [9]. Not because AI made engineering easier, but because the engineers who used it well used it as an amplifier for their judgment, not a substitute for it.
The controlled research backs this up too. Developers completing structured tasks with Copilot finished 55% faster in some studies [9]. Pull request cycle time dropped from 9.6 days to 2.4 in one tracked deployment. Those numbers are real. They’re also downstream of something less visible: engineers who understood exactly what they were responsible for, and used the tool accordingly.
Those outcomes trace back to one thing: clear ownership of quality, regardless of where the code came from.
There is a version of this conversation that gets had in every industry going through a capability shift. I watched it with test automation. I watched it with cloud migration. Teams that treated the new capability as a way to do the same job faster eventually hit a ceiling. Teams that treated it as a reason to redefine what the job meant pulled ahead and didn’t come back.
Writing less code was never the point. The job is to carry more judgment about what gets built, what gets reviewed, what gets deployed, and what the consequences are. That’s a bigger ask. It requires more professional ownership, not less.
The teams that understand this are the ones where engineers look at AI output the way a senior engineer looks at a junior’s PR, with curiosity, with scrutiny, with clear standards. Those are the teams where the J-curve turns. Somewhere around six months in, based on what I’ve seen, that’s when the upward movement starts to show.
The ones still waiting for the curve to turn on its own are usually the ones with a leader who announced the tools, expected the numbers, and is now looking for someone to blame.
Quietly, that leader is the answer to their own question.
References
[1] GitClear. “Coding on Copilot: 2023 Data Suggests Downward Pressure on Code Quality (Including 2024 Projections).” GitClear Research, January 2024.
[2] SonarSource. Cited in: Smith, B. “AI in Software Development: Productivity at the Cost of Code Quality?” DevOps.com, 2025.
[3] Google Cloud. “Accelerate State of DevOps Report 2024.” DORA Research Program, 2024.
[4] Ox Security. Analysis of AI-Generated Code Anti-Patterns in Production Repositories. Ox Security Research, 2025.
[5] Center for Security and Emerging Technology (CSET), Georgetown University. “Cybersecurity Risks of AI-Generated Code.” Issue Brief, November 2024.
[6] LeadDev. “Engineering Leadership Report 2025.” LeadDev, 2025.
[7] McKinsey & Company. “Superagency in the Workplace: Empowering People to Unlock AI’s Full Potential at Work.” McKinsey Global Institute, 2025.
[8] Earezki, A. “Managing the Gap: Why Engineering AI Adoption Leads to Developer Burnout.” Dev|Journal, March 2026.
[9] GitHub. “Research: Quantifying GitHub Copilot’s Impact in the Enterprise with Accenture.” GitHub Blog, 2024.