Human response-time metrics
Reliability teams never settled for one timing number, and neither should oversight. How long until a human is reached, how long until they respond, and how long they actually spend deciding are three different questions with three different answers.
Human oversight needs the timing vocabulary that reliability engineering already trusts, pointed at people instead of servers. I propose three measures, all at v0.1: Mean Time to Human, Human Response Time, and Human Deliberation Time. Together they show how fast oversight actually arrives and whether anyone weighs the decision once it does. They are companions to Time-to-Human, and like every figure in this work they are proposals, with no target number attached until real data earns one.
The three clocks
- Mean Time to Human. The average time from an agent flagging a decision to an accountable human acting on it. The headline, borrowed in spirit from mean time to repair.
- Human Response Time. The time from an escalation to the first human response, the analog of first-response time.
- Human Deliberation Time. The time the human actually spends weighing the call, not merely acknowledging it. The one that separates judgment from a reflex.
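One way the three clocks could be read off a log, as a minimal sketch and not a reference implementation. It assumes each escalation records three timestamps; the `Escalation` type and its field names are hypothetical. It also assumes the escalation starts when the agent flags the decision, and it uses the gap between first response and final decision as a crude proxy for deliberation. A real system would likely need a better signal for time actually spent weighing the call.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Escalation:
    # Hypothetical record; the field names are assumptions for this sketch.
    flagged_at: datetime         # agent flags the decision
    first_response_at: datetime  # first human response (e.g. an acknowledgement)
    decided_at: datetime         # accountable human acts on the decision


def response_time(e: Escalation) -> timedelta:
    """Human Response Time: flag to first human response."""
    return e.first_response_at - e.flagged_at


def deliberation_time(e: Escalation) -> timedelta:
    """Human Deliberation Time, approximated as first response to decision."""
    return e.decided_at - e.first_response_at


def mean_time_to_human(escalations: list[Escalation]) -> timedelta:
    """Mean Time to Human: average of flag-to-decision across escalations.

    Raises statistics.StatisticsError if the list is empty.
    """
    seconds = [(e.decided_at - e.flagged_at).total_seconds() for e in escalations]
    return timedelta(seconds=mean(seconds))


# Invented example data, for illustration only.
t0 = datetime(2024, 1, 1, 12, 0, 0)
e1 = Escalation(t0, t0 + timedelta(minutes=2), t0 + timedelta(minutes=10))
e2 = Escalation(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=20))

print(response_time(e1))             # 0:02:00
print(deliberation_time(e1))         # 0:08:00
print(mean_time_to_human([e1, e2]))  # 0:15:00
```

Under these assumptions, each escalation's time to a human is the sum of its response and deliberation times, which is what lets the headline number be pulled apart.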
Why three, not one
Because a single number hides the failure. A fast response with near-zero deliberation is a rubber stamp wearing a good metric. Pulling the clocks apart shows whether speed came from a real decision or a quick click, which is exactly the difference oversight is supposed to track.
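A small sketch of what pulling the clocks apart could look like in practice. The function name `possible_rubber_stamp` and both thresholds are hypothetical. The thresholds are deliberately required arguments with no defaults, since no target number is proposed here; the values in the usage lines are placeholders, not recommendations.

```python
def possible_rubber_stamp(
    response_s: float,
    deliberation_s: float,
    *,
    fast_response_s: float,
    min_deliberation_s: float,
) -> bool:
    """Flag a decision that arrived fast but with near-zero deliberation.

    Both thresholds are supplied by the caller; this sketch takes no
    position on what they should be.
    """
    is_fast = response_s <= fast_response_s
    is_thin = deliberation_s < min_deliberation_s
    return is_fast and is_thin


# Invented data: two decisions with the same response time.
decisions = [
    {"id": "a", "response_s": 45.0, "deliberation_s": 2.0},
    {"id": "b", "response_s": 45.0, "deliberation_s": 240.0},
]

flagged = [
    d["id"]
    for d in decisions
    if possible_rubber_stamp(
        d["response_s"],
        d["deliberation_s"],
        fast_response_s=60.0,     # placeholder threshold
        min_deliberation_s=10.0,  # placeholder threshold
    )
]
print(flagged)  # ['a']
```

Judged on response time alone, the two decisions look identical. Only the deliberation clock separates the quick click from the real decision.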
Read on
See the headline measure, Time-to-Human, and the full set: how to measure human oversight.