What Happens After You Find the Foreman
What the role actually looks like in practice, and why the job description you're imagining is probably wrong
The hardest part of building a new role isn't knowing what you need. It's that every existing category is wrong.
A few years back, I was trying to hire someone into a function with no real precedent. I could describe the outcome clearly: someone who could sit between a technical team and a set of business workflows evolving faster than the documentation. I interviewed qualified people. None of them fit. Every candidate had either half the skills or the wrong mental model, and when I described the role to them, I'd watch something go slightly wrong in their eyes, as if I were explaining a color they hadn't seen before. Eventually I realized the problem wasn't the candidates. The job required someone to think in a way their previous experience hadn't asked for, and I was looking for that quality in the wrong places.
Building a new role is a different problem from filling one. And we're about to run that problem at scale.
A few weeks ago, I wrote about what I called the Foreman Problem: AI is creating a new management role in technology organizations, the same way factories created the foreman and digital products created the product manager. I described the gap. This piece is about what goes in it.
The organizations I find most useful to study are the ones that didn't just drop AI into their existing workflows. A pharmaceutical company with about 5,800 employees built over 3,000 custom AI tools, then gave managers a framework for assessing every task: automate, augment, or keep with a person. That sounds like a process question, but it's a skill. Someone makes those calls constantly, and the calls change as the work changes and as AI capabilities change with it.
One manufacturing company uses agents to resolve supply chain delays before the morning shift arrives. That means someone set the boundaries for when the agent acts and when it escalates, then kept watching those boundaries as the agent built a track record. An enterprise technology company redesigned its performance review process around four specialized AI sub-agents, and its leadership was explicit: the win came from redesigning the process, not from adding AI to the existing one.
In each case, there's a person in the middle whose primary job is managing the boundary between what agents do and what humans do. And that job requires something a traditional engineering manager doesn't get much practice at.
Traditional management assumes that skilled people signal uncertainty. A senior engineer who's unsure about an edge case asks a question, adds a code comment, or flags the risk in a planning conversation. The skill lives in the person, and so does the acknowledgment of its limits. AI agents produce confident output whether they're confident or not. The difference between a correct answer and a plausible wrong one can be invisible to anyone who isn't specifically looking for it.
The skill I'd point to first is trust calibration: knowing when to let an agent run without review and when to require a human gate, and knowing how to move that boundary as experience accumulates. It's closer to the judgment a good test engineer develops about which automated tests can be fully relied on and which ones need a manual pass before a production push. That judgment sharpens over time, but only if you're paying attention to the right signals.
Something I found hard to admit earlier in my career: accountability doesn't shrink because visibility does. A manager who lets an AI agent run on consequential work owns the output even if they never read it. A legal ruling last year held an airline responsible for its chatbot giving incorrect bereavement fare information to a customer, rejecting the argument that the chatbot was somehow separate from the company. That ruling will not be the last. The person building this role needs to be clear-eyed about the gap between what they can physically review and what they're responsible for, because that gap is large and it grows as agent autonomy expands.
On any given day, the first question is which agent outputs need a human to confirm before they go anywhere, and which processes have accumulated enough reliable runs that the gate can loosen. Somewhere in between are the edge cases the agent still handles badly in ways that are easy to miss. In the pharmaceutical company's deployment, managers started by doing this manually, task by task. Over time they built patterns. But the judgment about when a pattern was stable enough to trust never became fully automatic.
The managers getting this right don't inherit workflows and drop AI into them. They redesign the work first: what happens when the agent is wrong, not just when it's right, and what the fallback looks like before anyone needs it. The verification layer is a permanent feature of the job.
And some of it is invisible. A good foreman produces a week where nothing goes visibly wrong. Confident wrong output stays out of customer hands. Fabricated records don't reach the database. The data a process runs on doesn't drift for three cycles before anyone checks. That absence is the work, and it's almost impossible to evaluate from the outside, which is part of why organizations are slow to create the role deliberately.
When a review process finds problems and they're all surface errors, that's table stakes. The harder signal is whether the process is finding the deeper failures: cases where an agent produced something that looked correct but was wrong in a way that mattered. A manager catching only obvious mistakes is probably missing the subtle ones. Not because they're careless, but because the review process wasn't designed to find them.
The trust boundaries should also be moving. A manager who locks down agent autonomy and never expands it isn't calibrating. A manager who expands autonomy without a principled reason is taking risk they probably don't see. The healthy pattern is steady accumulation of justified confidence, with visible reasoning for each expansion and a willingness to contract when something goes wrong.
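To make the expand-and-contract pattern concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the three autonomy levels, the `TrustGate` name, and the threshold of clean reviews are hypothetical, not taken from any of the organizations above. The only point is that every move of the boundary leaves a written reason behind.

```python
from dataclasses import dataclass, field

# Hypothetical autonomy levels, ordered from most to least human oversight.
LEVELS = ["review_all", "sample_review", "autonomous"]


@dataclass
class TrustGate:
    """Tracks how much autonomy one agent process has earned."""
    clean_runs_to_expand: int = 20   # assumed threshold; tune per process
    level: int = 0                   # index into LEVELS, starts fully gated
    clean_streak: int = 0
    log: list = field(default_factory=list)

    def record_review(self, passed: bool, note: str = "") -> str:
        """Record one human-reviewed run and return the resulting level."""
        if passed:
            self.clean_streak += 1
            can_expand = self.level < len(LEVELS) - 1
            if can_expand and self.clean_streak >= self.clean_runs_to_expand:
                self.level += 1
                self.clean_streak = 0
                self.log.append(
                    f"expanded to {LEVELS[self.level]}: "
                    f"{self.clean_runs_to_expand} clean reviewed runs"
                )
        else:
            # Any failed review resets the streak and contracts one level.
            self.clean_streak = 0
            if self.level > 0:
                self.level -= 1
            self.log.append(
                f"contracted to {LEVELS[self.level]}: {note or 'review failure'}"
            )
        return LEVELS[self.level]
```

A counter like this is not the judgment itself. A streak of clean reviews only means something if the reviews were designed to catch the subtle failures, so the threshold and the review method both stay a human decision.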
I've watched organizations flatten their management layers on the assumption that AI handles coordination now. About 40% of employees at companies that have done this report feeling directionless. The continuous judgment work left quietly along with the middle managers: the constant small calibrations about what to trust, what to verify, and what to catch before it became visible.
The role I'm describing sits outside what most engineering management training covers. Trust calibration, process design under uncertainty, outcome ownership with partial information: none of those are in the standard curriculum. They can be developed. But you have to know to look for them, and most organizations are currently searching in the wrong places.
If you're leading a team that includes AI agents, start with a single audit: what ran autonomously this week, and what's your actual evidence that it ran well? If the answer is "I didn't hear about any problems," that's not evidence.
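If it helps to see that audit as something you could actually run, here is a rough sketch in Python. The `AgentRun` record and its fields are hypothetical, and it assumes you already keep some log of what each agent did and what, if anything, a person checked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentRun:
    """One agent task from the past week (hypothetical record shape)."""
    task: str
    autonomous: bool                 # True if no human gate was applied
    evidence: Optional[str] = None   # what was actually checked, if anything


def unverified_runs(runs: list[AgentRun]) -> list[str]:
    """Return tasks that ran autonomously with no recorded evidence."""
    return [
        run.task
        for run in runs
        if run.autonomous and not (run.evidence and run.evidence.strip())
    ]


week = [
    AgentRun("supplier delay triage", autonomous=True,
             evidence="spot-checked 10 of 200 resolutions"),
    AgentRun("invoice matching", autonomous=True),
    AgentRun("contract summary", autonomous=False),
]

for task in unverified_runs(week):
    print(f"No evidence on file: {task}")
```

The design choice that matters is that the evidence field has to be filled in by someone describing a check they performed. An empty field is the "I didn't hear about any problems" case, and the audit treats it as unverified.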



