Essay · AI Governance

Human Oversight Is Mostly Theater

The EU AI Act made human oversight the law. Almost no one is measuring whether it's real. Here's the metric that would, and the bet I'm making on it.

Manj Chenna · Founder, Sanctity · Amsterdam · ~6 min read

Picture the human in the loop. A loan officer with a queue of four hundred applications and a model that scores each one in milliseconds. A radiologist with a worklist and a tool that has already circled the suspicious tissue. A content moderator with three seconds per item and a classifier that has already decided. In each case there is a person, a screen, and a button that says approve. We call this oversight. Mostly, it is theater.

I don't mean the people are lazy or the builders are cynical. I mean the design is doing something other than what it claims. The system produces a decision; the human supplies a signature; the signature supplies accountability. The machine gets the speed and the human gets the blame. That is not a human in command. That is a human being used as upholstery for liability.

The law now requires the thing we aren't measuring

In Europe, this has stopped being an abstract worry. The EU AI Act, the world's first broad, binding law on artificial intelligence, devotes an entire article (Article 14) to human oversight of high-risk systems. In effect, it says a person must be able to understand the system, decide not to use its output, and intervene or stop it. It is the right instinct, written into law.

But the Act specifies the function, not the proof. It tells you a human must be able to override. It does not tell you how to know whether, in the field, with a real queue and a real clock, anyone ever does. And a capability nobody exercises is indistinguishable, from the outside, from a capability that was never there. Oversight without measurement isn't oversight. It's a clause.

Why the human goes quiet

We have decades of evidence for what happens next, long predating this generation of AI. Human-factors researchers named it in the era of cockpit automation: automation bias, the tendency to defer to an automated source and to stop searching for disconfirming information once the machine has spoken. When a system is right most of the time, vigilance is expensive and usually wasted, so people ration it. They stop looking. The better the model, the faster the human checks out. This is not a character flaw; it is a predictable response to a system designed to make deference the path of least resistance.

The researcher Madeleine Clare Elish gave the social half of this a sharper name: the moral crumple zone. When an automated system fails, the human nearest the controls absorbs the impact (the blame, the liability, the inquiry) even though they had little real capacity to prevent it. Put those two ideas together and you get the failure mode exactly: a person positioned to be responsible for a decision they were structurally encouraged not to question.

A human who can technically say no, but never does, and gets blamed when the machine is wrong, is not oversight. It is a crumple zone with a job title.

Measure the "no"

If you want to know whether oversight is real, stop reading policy documents and start measuring behavior. Here is the instrument I argue for.

The metric

Meaningful Override Rate (MOR)

Of the decisions where a human could have changed the machine's output, what fraction did they actually change, with the change left standing? Not clicks. Not time-on-screen. Reversals that the human initiated and the organization honored.

MOR is now published as an open standard; the spec sets out the method and what makes a reported number valid.
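To make the definition concrete, here is a minimal Python sketch. It assumes a hypothetical decision log in which every record already says whether a human could have changed the output, who initiated any change, and whether that change stood; the `Decision` fields and the `meaningful_override_rate` name are illustrative, not taken from the MOR spec. Settling those fields per workflow is the hard part, and the sketch simply takes them as given.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Decision:
    """One logged decision. All field names are hypothetical."""
    model_output: str      # what the model recommended
    final_output: str      # what was actually decided
    overridable: bool      # was a human in a position to change it?
    human_initiated: bool  # did any change come from the human?
    stood: bool            # was the change honored rather than reverted?


def meaningful_override_rate(decisions: Iterable[Decision]) -> float:
    """Fraction of overridable decisions where a human-initiated change stood."""
    eligible = [d for d in decisions if d.overridable]
    if not eligible:
        raise ValueError("no overridable decisions: MOR is undefined")
    meaningful = sum(
        1
        for d in eligible
        if d.final_output != d.model_output and d.human_initiated and d.stood
    )
    return meaningful / len(eligible)


log = [
    Decision("approve", "approve", True, False, False),   # rubber-stamped
    Decision("approve", "deny", True, True, True),        # counts toward MOR
    Decision("deny", "approve", True, True, False),       # overridden, then reverted
    Decision("approve", "approve", False, False, False),  # never reached a human
]
print(meaningful_override_rate(log))  # 0.3333333333333333 (1 of 3 eligible)
```

A change that was later reverted, or a case that never reached a person, does not count; that is the point of saying "stood" and "could have changed" rather than counting clicks.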

MOR is deliberately narrow, and it needs two companions to be honest:

Contestation latency: how much time and context the human realistically has to disagree. Three seconds per item is not oversight at any override rate; it's a turnstile.
Reversal validity: when humans do override, are they right more often than the model would have been? An override rate that is high but wrong is its own pathology.
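Both companions can be sketched the same way. This assumes, hypothetically, that review sessions are timed per item and that a later audit can say whether the human or the model was right on each overridden case; the function names and the choice of the median are illustrative, not part of any standard.

```python
from statistics import median


def contestation_latency(seconds_per_item: list[float]) -> float:
    """Median seconds a reviewer spent per item (a hypothetical operationalization)."""
    if not seconds_per_item:
        raise ValueError("no timed reviews")
    return median(seconds_per_item)


def reversal_validity(audited: list[tuple[bool, bool]]) -> tuple[float, float]:
    """Compare human and model accuracy on the same overridden cases.

    Each pair is (human_was_right, model_was_right), as judged by a later
    audit (assumed ground truth). Returns (human_accuracy, model_accuracy).
    """
    if not audited:
        raise ValueError("no audited overrides")
    n = len(audited)
    human_acc = sum(h for h, _ in audited) / n
    model_acc = sum(m for _, m in audited) / n
    return human_acc, model_acc


print(contestation_latency([2.0, 3.0, 3.0, 4.0, 45.0]))  # 3.0
print(reversal_validity([(True, False), (True, False), (False, True), (True, False)]))  # (0.75, 0.25)
```

The median is a deliberate choice: one long review (the 45-second case) should not hide a three-second norm. If human accuracy on overridden cases falls below model accuracy, that is the "high but wrong" pathology.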

Run these across a population and the theater becomes visible. If the meaningful override rate sits at roughly zero, across thousands of decisions, across operators, across months, the humans are not overseeing anything. They are authorizing. The system should not be allowed to call that "human oversight," any more than a restaurant should be allowed to call a photograph of a kitchen "freshly cooked."

The honest objection

Here is where most arguments like this cheat, so I won't. A near-zero override rate does not, by itself, prove the oversight is fake. It could mean the model is genuinely excellent and the humans are correctly leaving it alone. That is a real possibility, and MOR alone cannot tell the two apart.

So you don't use it alone. You pair it with seeded error: inject cases where you already know the model is wrong (known-hard examples, audited mistakes, adversarial cases) and measure whether the humans catch them. A good model with alert humans shows a low override rate overall but a sharp spike on the seeded errors. A theater of oversight shows a flat line through both. The instrument isn't the verdict; the instrument plus a controlled probe is. Measurement doesn't have to be naive to be useful; it just has to be designed by someone who actually wants the answer.
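One way to check "spike versus flat line" is to ask whether seeded errors are overridden significantly more often than ordinary cases. The sketch below uses a one-sided two-proportion z-test; that test, the function name, and the counts in the example are illustrative assumptions, not part of the essay's method. The normal approximation gets shaky when only a few dozen seeded cases exist, so an exact test would be the more careful choice in practice.

```python
import math


def seeded_error_probe(seeded_caught: int, seeded_total: int,
                       organic_overridden: int, organic_total: int) -> tuple[float, float]:
    """One-sided two-proportion z-test (hypothetical helper).

    Null hypothesis: humans override seeded errors no more often than
    ordinary (organic) cases. Returns (z, p_value); totals must be non-zero.
    """
    p_seed = seeded_caught / seeded_total
    p_org = organic_overridden / organic_total
    pooled = (seeded_caught + organic_overridden) / (seeded_total + organic_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / seeded_total + 1 / organic_total))
    if se == 0:
        raise ValueError("no variation: every case was overridden, or none were")
    z = (p_seed - p_org) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) under the null
    return z, p_value


# Alert humans: a low organic rate, but a sharp spike on the seeded errors.
print(seeded_error_probe(34, 40, 12, 4000))
# Theater: the seeded errors sail through like everything else.
print(seeded_error_probe(0, 40, 12, 4000))
```

Both hypothetical workflows share the same organic override rate; only the seeded cases separate them, which is exactly why MOR alone cannot.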

The hard part is the denominator

I'll be honest about where this gets difficult, because the difficulty is the actual work. The clean half of MOR is the numerator: count the overrides that stood. The whole fight is in the denominator: what counts as "a decision the human could have changed"? In a triage tool that routes only a slice of cases to a person, is it every model output, or only the routed ones? And "the change stood" needs an arbiter and a clock: stood for how long, honored by whom, and what about an override that is itself reversed later? Define the denominator in the system's favor and every system looks attentive; define it strictly and almost none do. Pinning it down, per workflow, is most of the job.
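The denominator problem is easy to demonstrate. In this hypothetical sketch, the same log yields very different MOR values depending on two choices: whether the denominator is every model output or only the cases routed to a person, and how long an override must survive before it counts as having "stood". The `Outcome` fields, the `scope` values, and the day-based window are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    routed_to_human: bool                 # did a person ever see this case?
    overridden: bool                      # did that person change the model's output?
    reverted_after_days: Optional[float]  # None if the override was never reversed


def mor(outcomes: list[Outcome], *, scope: str, stood_window_days: float) -> float:
    """MOR under an explicit denominator scope and 'stood' window (hypothetical)."""
    if scope == "routed":
        pool = [o for o in outcomes if o.routed_to_human]
    elif scope == "all":
        pool = list(outcomes)
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    if not pool:
        raise ValueError("empty denominator")
    # An override 'stood' if it was never reverted, or only after the window closed.
    stood = sum(
        1
        for o in pool
        if o.routed_to_human and o.overridden
        and (o.reverted_after_days is None or o.reverted_after_days > stood_window_days)
    )
    return stood / len(pool)


log = (
    [Outcome(True, True, None), Outcome(True, True, 45.0)]
    + [Outcome(True, False, None)] * 2
    + [Outcome(False, False, None)] * 6
)
print(mor(log, scope="routed", stood_window_days=30))  # 0.5
print(mor(log, scope="all", stood_window_days=90))     # 0.1
```

Same workflow, same people, a fivefold difference. That is why the scope and the clock have to be fixed, per workflow, before any number is reported.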

The seeded-error probe has costs of its own that I won't wave away. In credit or healthcare, injecting known-wrong decisions into a live pipeline that touches real people is an ethics-board question, not a footnote. And the moment operators know a probe exists, their vigilance shifts, so the instrument can distort the very thing it measures. These aren't reasons to abandon measurement. They're the reasons it has to be designed by someone who actually intends to find the answer, and who will publish the number even when it's embarrassing. The next thing I'll show isn't another argument; it's MOR computed end-to-end on one real workflow, with the denominator defined and the parts that broke laid out.

The bet

I'll make this falsifiable, because a position that can't be wrong isn't worth holding. Within about three years, "show me your oversight numbers" becomes a procurement question, the way security questionnaires became one a decade ago. High-stakes buyers in credit, hiring, healthcare, and public benefits will start asking vendors for an override-and-contestation metric, and systems that can't produce one will quietly stop winning those deals. Oversight will ship with a label, the way food ships with nutrition facts.

I could be wrong, and I know exactly how: if regulators and buyers settle for process evidence (a documented procedure, a training log, a ticked box) instead of outcome evidence, then theater wins, because theater is cheaper and it satisfies a checklist. That is the fight. Not whether oversight is mandated; it already is. Whether we are willing to measure it, or content to perform it.

What this has to do with a grid of blinking cells

On my homepage there's a toy: Conway's Game of Life, where a handful of numbers (the rule) decides everything that emerges. The point of the toy is that in Conway's world you can see the rule. In a real model, you can't. So the only honest proxy we have for "is a human actually in command" is not the architecture diagram and not the policy PDF. It's whether the human's "no" changes the outcome. Measure the no. Everything else is set dressing.

This is the line my company won't move: we don't ship a decision a person can't refuse, and we don't remove the override, not for any contract. Not because it's noble. Because an override nobody can measure is the most elegant way ever invented to look accountable while being automatic.

Think I'm wrong?

Good; that's the point. If you build, buy, or regulate these systems, come argue with me.

Notes & sources

EU AI Act: Regulation (EU) 2024/1689, Article 14, "Human oversight" (requirements for high-risk AI systems).
Automation bias: Parasuraman & Riley (1997), "Humans and Automation: Use, Misuse, Disuse, Abuse," Human Factors 39(2); and Mosier & Skitka (1996) on automation bias and complacency.
Moral crumple zones: Madeleine Clare Elish (2019), "Moral Crumple Zones: Cautionary Tales in Human–Robot Interaction," Engaging Science, Technology, and Society 5.
The Meaningful Override Rate, contestation latency, and the seeded-error probe described here are my framing, offered for argument, not a settled standard. Push on them.