What AI Exposed About Engineering Management
Behind the 'should EMs code?' debate, the data shows how the role actually changed and which managers got caught off guard.
Lately I have been hearing the same pattern from engineering leaders at different companies. An org rolls out AI coding tools across the board. Three months in, the EM walks into a quarterly review proud of one number. PR velocity is up two or three times. Two slides later, on a different dashboard, the bug count is up too. Sometimes by half. The numbers sit in different rooms of the same deck. Nobody connects them. The conversation moves on.
That gap, the one between the slide everyone celebrated and the slide everyone skipped, is the part of the AI conversation engineering leaders are mostly skipping over.
Most of the talk I see is stuck on a different question. Should engineering managers code? It was a contested question before AI showed up. Now it feels existential. Will Larson published “Good Engineering Management is a Fad” last October arguing that the industry’s preferences shift with business reality and dress the shift up as moral progress. Gergely Orosz has been writing for months about how EMs who are perceived as technical land jobs faster than those who are not. Charity Majors keeps making the case that the engineer-manager pendulum is now swinging on a shorter cycle. Each of them is right about what they are seeing, and what they are seeing is a symptom.
Coding was never the bottleneck for most EMs. Judgment was. The work that actually moved a team’s outcomes was deciding what was worth building and holding the line on what was good enough to ship. Writing code was the visible exhaust. Whether something shipped on time or quietly broke six things on the way out the door usually came down to the decisions, not the keystrokes.
What AI did is make the writing-code part cheap. Once that part is cheap, everything else stops being optional. That includes the parts of the EM job that have been outsourced to process for the better part of a decade.
The data backs this up in ways the marketing decks do not.
In July last year, METR ran the kind of study people in our field rarely run. Sixteen experienced open source developers, working on mature projects they already knew, ran 246 real tasks with and without modern AI tools. The headline finding was that AI made them about 19 percent slower. The harder finding was that the same developers believed they had been about 20 percent faster. A 39-point gap between perception and reality, in the people most likely to know better.
Faros AI’s longitudinal data on roughly 10,000 engineers tells the same story from a different angle. On the input side, pull request volume per developer is up 47 percent and merged PRs are up 98 percent. On the way through, code review time has climbed 91 percent and bugs per developer are up nine percent. The hours AI saves on the way in are getting eaten on the way out. LinearB’s 2026 benchmark across 8.1 million PRs found agentic AI PRs take more than five times longer to be picked up for review than human-written ones.
The 2025 DORA report lands in the same place. AI adoption now correlates with higher delivery throughput and lower delivery stability. Two trends at once, on the same teams. DORA’s framing is the most useful version of this I have read: AI is an amplifier of whatever a team is already doing, for good and for ill.
None of this is an anti-AI argument. The productivity story being told above the EM layer keeps diverging from the one showing up below it. Somebody has to hold the bar between the two, and the cycle for doing that just got a lot faster.
The shape of all this changes depending on the size of the company you sit in.
At a true early-stage startup, under twenty engineers, the question of whether the EM should code never really went away. The founder-CTO is in the codebase. The first engineering hires are in the codebase. The “manager” is usually the strongest senior engineer with a few extra responsibilities. AI in this environment is pure acceleration. One engineer who is good at directing AI ships what a small team used to ship. The startups pulling ahead right now tend to be the three-engineer kind running Claude Code in five terminals, doing what twenty-person teams used to do three years ago.
In a scale-up of a few dozen to a few hundred engineers, the picture is messier. This is where the engineer-manager pendulum has been most painful in the last twelve months. EMs hired in the 2018 to 2022 era were told their job was the people layer, not the code. Many of them built careers around being explicitly hands-off. Now their CTOs are asking them to be hands-on again, sometimes without saying it out loud. Some are thrilled, others are panicking. Stripe has started posting roles with titles like “Engineering Manager, Developer Productivity AI” which would have been unthinkable in 2021. The player-coach role is back, and it is back with a different set of tools than it had the last time.
Mid-market is where the layoff numbers actually live. The largest tech companies cut tens of thousands of corporate roles across 2025, and the memos named the cause directly. “Considerable de-layering” from one. “Fewer layers and more ownership” from another. A third trimmed roughly ten percent of its VP and manager roles in a single round. Read those memos carefully. The story they are telling is about coordinating layers that were too thick, with AI as the trigger that finally made cutting them defensible. The cause goes back further. A lot of mid-market orgs had been building management depth as a substitute for ownership.
Enterprise is the most counterintuitive case. The DX 2025 report found that large traditional enterprises are now leading on measured AI productivity, not lagging. The reason is unsexy. They have actual training programs, governance, and infrastructure that can absorb the change. They also have the most exposure to the DORA finding about lower stability. Higher throughput on top of weak CI discipline is just faster breakage. The most dangerous EM job in the world right now might be the one running a feature team inside a Fortune 500 with a velocity dashboard from 2019 and an AI rollout from last quarter.
Step back from the org chart and look at the METR perception gap again. Sixteen experienced engineers were systematically wrong about their own productivity by 39 points. They were doing what humans do when the work feels different from how it used to feel. The brain registers the new texture as speed, even when the clock disagrees. AI changes how writing code feels in a way that no productivity dashboard has caught up with.
If that gap is real, and the LinearB and Faros numbers suggest it is widespread, every engineering org needs someone who can tell the difference between “feels productive” and “is productive.” Whoever does this has to be close enough to the work to read the actual code and far enough back to see the second-order effects on the team. That role belongs to the EM. Engineers doing the work are sitting inside the perception gap. Executives two levels up are reading the same dashboard the EM put together.
The new EM job is holding the bar between what looks like progress on a dashboard and what is actually changing the company’s position in the market. The coding question, more or less, is a sideshow next to that.
This is harder than the version of the EM job most companies were set up for. Sprint planning is easier than judging a PR. Status meetings are easier than reading the code. A lot of the EMs being culled right now lost their jobs because the parts of the job that were always the real job, the judgment parts, had been replaced with calendar work for years, and the calendar work just got automated.
I have managed teams from both sides of this. There was a stretch in my career where I told myself I was being a better manager by staying out of the code, when most of what I was doing was avoiding the discomfort of judging work I no longer had the context to evaluate. That avoidance was always going to catch up with me. AI just shortened the timeline.
If you run an engineering team and want to take this seriously, the work is unglamorous. Read your team’s PRs yourself for the next two weeks. Not the merge notifications, the actual diffs. You will quickly learn which engineers are using AI to ship things they understand and which ones are using it to ship things they could not have written without it and cannot maintain after they did. While you are in there, put your bug rate on the same chart as your velocity. If both lines are climbing, the second one will eat the first one within a quarter or two. Better to see it early.
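If you want a concrete version of that chart, a few lines of scripting are enough. Here is a minimal sketch in Python, assuming you can export weekly merged-PR counts and weekly bug counts from your trackers; the data and field shapes are hypothetical stand-ins for whatever your tooling actually emits:

```python
from datetime import date

# Hypothetical weekly exports: (week_start, merged_prs, bugs_filed).
# Replace with real exports from your PR and issue trackers.
weekly = [
    (date(2025, 9, 1), 40, 8),
    (date(2025, 9, 8), 55, 10),
    (date(2025, 9, 15), 70, 15),
    (date(2025, 9, 22), 85, 22),
]

def growth(series):
    """Total growth ratio from the first data point to the last."""
    return series[-1] / series[0]

prs = [w[1] for w in weekly]
bugs = [w[2] for w in weekly]

pr_growth = growth(prs)    # the velocity line
bug_growth = growth(bugs)  # the quality line

# Both lines climbing, bugs climbing faster: the pattern from the
# quarterly review story at the top of this piece.
if pr_growth > 1 and bug_growth > pr_growth:
    print(f"PRs up {pr_growth:.1f}x, bugs up {bug_growth:.1f}x: "
          "the second line is eating the first.")
```

The point is not the script; it is that the two series live in one place, so nobody can celebrate one slide and skip the other.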
The bigger change is in what you measure. The metrics worth tracking now are the ones that survive contact with a customer: features that made it to production without a follow-up fix in the first week, incident resolution time, the share of last quarter’s roadmap that is still in the product six months later. PRs merged is mostly noise once PRs are nearly free to produce. And yes, probably write some code yourself. Reading what your team is shipping is becoming the only honest way to know what is happening on the team.
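The first of those metrics is straightforward to compute if your tracker links follow-up fixes back to the feature they patch. A sketch, again in Python, with made-up record shapes standing in for your real data:

```python
from datetime import date, timedelta

# Hypothetical ship log: feature name, ship date, and date of the
# first follow-up fix (None if no fix was ever needed).
shipped = [
    ("search-filters", date(2025, 10, 1), None),
    ("ai-summary", date(2025, 10, 3), date(2025, 10, 5)),
    ("export-csv", date(2025, 10, 7), date(2025, 10, 20)),
]

def clean_ship_rate(records, window_days=7):
    """Share of features that needed no fix inside the window."""
    clean = sum(
        1 for _, ship, fix in records
        if fix is None or fix - ship > timedelta(days=window_days)
    )
    return clean / len(records)

rate = clean_ship_rate(shipped)  # 2 of the 3 survived their first week
```

A metric like this is harder to game than merged-PR counts, because the only way to move it is to ship things that hold up.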
The engineering managers who are still in their jobs in two years will be the ones who can tell the difference between a team that is faster and a team that just feels faster. As the rest of the work gets cheap, that judgment is the part of the EM job that gets harder.