Measure · A metric I propose

Override Validity

It is not enough that a human overrules the machine. The override has to be right. A reviewer who confidently overturns correct decisions is not providing oversight. They are a different source of error wearing the costume of judgment.

Manj Chenna · Founder, Sanctity · Building human judgment infrastructure · Amsterdam

Override Validity is a proposed measure, offered at v0.1, that asks a simple follow-up to any override: was the human right? Meaningful human oversight needs people who are willing to change decisions, and it needs the decisions they change to be the ones that were wrong. An override rate can look healthy and still be noise if the human is overturning good calls as often as bad ones. Validity is the check that keeps the override rate honest, and it belongs in any serious human judgment infrastructure.

Why a high override rate is not automatically good

Because overriding can be its own mistake. If you reward reviewers for intervening, you get interventions, some of them wrong. The honest counterpoint is that some domains genuinely need many overrides, and that is fine. The problem arises only when the overrides do not track the truth: when the human reaches in often and is correct no more often than chance would predict. Then the oversight is adding motion, not judgment.
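One rough way to make "no more often than chance" concrete is to ask how likely a reviewer's record would be if every override were a coin flip. The sketch below is an illustration, not part of the proposal. The function name is hypothetical, and the default 50% baseline is an assumption that only suits two-way decisions; other domains would need their own baseline.

```python
from math import comb


def chance_tail_probability(valid: int, total: int, chance: float = 0.5) -> float:
    """Probability of at least `valid` correct overrides out of `total`
    if each override were right with probability `chance`.

    Hypothetical helper: the 0.5 default is an assumed baseline for
    binary decisions, not something the metric itself specifies.
    """
    if not 0 <= valid <= total:
        raise ValueError("valid must be between 0 and total")
    # Exact upper tail of a binomial distribution.
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(valid, total + 1)
    )


# A reviewer who was right on 11 of 20 overrides.
p = chance_tail_probability(valid=11, total=20)
print(f"Probability of doing at least this well by chance: {p:.2f}")
```

A large value means the record is easy to explain by chance alone, so the overrides are not yet evidence of judgment. A small value suggests the reviewer is tracking something real.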

How you would check it

Where outcomes are eventually knowable, compare them: of the cases a human overrode, how often did the human's call prove better than the machine's would have? It is not always cheap to measure, and I offer it as a proposal rather than a finished instrument. But even estimated, it answers a question the override rate alone cannot.
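The comparison can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a finished instrument: the record type and field names are hypothetical, "better" is read as "the human was right and the machine would have been wrong", and overrides whose outcomes are not yet known are left out of the denominator.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Override:
    # Hypothetical record; the field names are illustrative only.
    human_correct: Optional[bool]    # None while the outcome is unknown
    machine_correct: Optional[bool]  # would the machine's original call have been right?


def override_validity(overrides: Sequence[Override]) -> Optional[float]:
    """Share of resolved overrides where the human was right and the machine was not."""
    resolved = [
        o for o in overrides
        if o.human_correct is not None and o.machine_correct is not None
    ]
    if not resolved:
        return None  # nothing is knowable yet
    better = sum(1 for o in resolved if o.human_correct and not o.machine_correct)
    return better / len(resolved)


sample = [
    Override(human_correct=True, machine_correct=False),   # human improved the decision
    Override(human_correct=True, machine_correct=False),   # human improved the decision
    Override(human_correct=False, machine_correct=True),   # human overturned a good call
    Override(human_correct=None, machine_correct=None),    # outcome not yet known
]
print(round(override_validity(sample), 2))  # 0.67
```

Returning None when nothing has resolved keeps "no evidence yet" distinct from a validity of zero. How to treat cases where both calls turned out wrong is a design choice; here they count against validity.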

Read on

Its primary companion is the Meaningful Override Rate, and its shadow is the Rubber-Stamp Rate. The full set is in how to measure human oversight.