Mandatory Review Workflow — Auto-Acceptance Prevention
The human oversight interface must enforce a review step before any system output is acted upon. For high-risk systems, auto-acceptance configurations, where the system’s outputs are applied without human review, must be technically prevented. This is an architectural constraint, not a policy constraint.
The deployment infrastructure is designed so that the only path from model inference to consequential action passes through the human review interface. There should be no API endpoint, configuration flag, or administrative override that allows system outputs to bypass human review. Penetration testing should specifically test for human oversight bypass paths.
This control directly implements Article 14’s requirement that high-risk AI systems be designed to be effectively overseen by natural persons. A system that offers oversight as an option but permits bypass under certain conditions does not meet this standard. The mandatory review workflow is documented in AISDP Module 7, including the technical mechanisms that enforce it and the testing performed to verify that bypass is not possible.
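The architectural constraint can be made concrete by designing the action-execution API so that a human review record is a mandatory input rather than an optional check. The sketch below is illustrative only; the names (`ReviewDecision`, `execute_action`) are hypothetical, not part of any documented system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ReviewDecision:
    """Record produced only by the human review interface."""
    case_id: str
    reviewer_id: str
    approved: bool
    reviewed_at: datetime

def execute_action(case_id: str, decision: ReviewDecision) -> str:
    """Sole path from model inference to consequential action.

    There is deliberately no auto_accept flag and no administrative
    override: a ReviewDecision is a required argument, so bypass is
    prevented at the API level rather than by policy.
    """
    if decision.case_id != case_id:
        raise ValueError("review decision does not match case")
    if not decision.approved:
        return "rejected: no action taken"
    return f"action executed for {case_id}, approved by {decision.reviewer_id}"

decision = ReviewDecision("case-001", "op-42", True, datetime.now(timezone.utc))
print(execute_action("case-001", decision))
```

Because the review record is part of the function signature, penetration testing can focus on verifying that no alternative code path constructs or forges such records.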
Key outputs
- Architectural design ensuring all outputs pass through human review
- Verification that no bypass paths exist (API, configuration, or administrative)
- Penetration test results confirming bypass prevention
- Module 7 AISDP documentation
Automation Bias Countermeasures (Data-First Display, Dwell Time, Calibration Cases)
Article 14(4) requires that oversight measures enable individuals, as appropriate to the circumstances, to “properly understand the relevant capacities and limitations of the AI system” and to “correctly interpret the system’s output.” An interface that presents the system’s recommendation with a prominent “Accept” button and a small “Override” link does not satisfy this requirement, even if the operator technically has override capability. Effective countermeasures against automation bias are rooted in human factors research.
Four specific techniques have evidence behind them. Data-before-recommendation display shows the underlying case data (the applicant’s profile, the patient’s history, the transaction details) before revealing the system’s recommendation, forcing the operator to begin forming their own assessment before being anchored by the system’s suggestion. Minimum dwell time prevents the operator from accepting the recommendation until a minimum period has elapsed, typically 15 to 60 seconds depending on case complexity, blocking rapid bulk-acceptance without review.
Confidence visualisation displays the system’s confidence level prominently, with uncertainty highlighted rather than hidden. A prediction at 52% confidence should look visually different from one at 98% confidence. Calibration cases are injected at random intervals, presenting the operator with cases where the correct answer is known (drawn from the golden test dataset) and recording whether the operator agrees with the system. Operators who agree with the system on cases where the system is wrong are exhibiting automation bias, and this signal feeds into operator training and oversight review.
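Two of these countermeasures, minimum dwell time and calibration-case injection, lend themselves to a brief sketch. The tier names, dwell values, and the 5% injection rate below are assumptions chosen for illustration, not prescribed figures.

```python
import random
from datetime import datetime, timedelta, timezone

# Hypothetical dwell-time floors per complexity tier (seconds).
DWELL_SECONDS = {"simple": 15, "standard": 30, "complex": 60}

def may_accept(presented_at: datetime, now: datetime, tier: str) -> bool:
    """Minimum dwell time: the Accept action stays disabled until the
    tier-appropriate period has elapsed since case presentation."""
    return (now - presented_at) >= timedelta(seconds=DWELL_SECONDS[tier])

def next_case(queue, golden_cases, injection_rate=0.05, rng=random.random):
    """Calibration-case injection: with small probability, serve a case
    with a known correct answer from the golden dataset instead of the
    next live case, so operator agreement can be scored afterwards."""
    if golden_cases and rng() < injection_rate:
        return random.choice(golden_cases), True  # (case, is_calibration)
    return queue.pop(0), False

t0 = datetime.now(timezone.utc)
print(may_accept(t0, t0 + timedelta(seconds=10), "standard"))  # False: too fast
print(may_accept(t0, t0 + timedelta(seconds=45), "standard"))  # True
```

In a real interface the dwell check would gate the Accept control itself, and calibration cases would be visually indistinguishable from live cases so the agreement signal is unbiased.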
Key outputs
- Interface design implementing all four countermeasures
- Dwell time configuration per case complexity tier
- Calibration case injection schedule and golden dataset specification
- Module 7 documentation of countermeasure design and rationale
Override Capability — Rationale Capture
Operators must have the ability to override any system recommendation. This capability is a core requirement of Article 14 and must be genuinely accessible, not hidden behind multiple menu levels or discouraged through interface design. When an operator exercises an override, the system captures a structured record of the event.
The override record includes the operator’s identity, the original system recommendation, the override decision (what the operator chose instead), and the stated rationale for the override. The rationale field should be structured to capture meaningful information; a free-text field that operators routinely fill with “disagree” provides little analytical value. Drop-down selections for common override reasons, supplemented by a free-text field for unusual cases, strike a practical balance.
Override records serve multiple compliance purposes. They contribute to the Article 12 audit trail documented in Module 10. They feed into override rate monitoring, which detects patterns indicative of automation bias or system degradation. They also provide evidence for the ongoing adequacy of human oversight: a consistently high override rate may indicate that the system’s recommendations are not meeting quality expectations, prompting investigation and potential model retraining.
Key outputs
- Override interface design with structured rationale capture
- Override record schema (operator, recommendation, decision, rationale)
- Integration with logging infrastructure and Module 10 audit trail
- Module 7 AISDP documentation
Override Rate Monitoring (Aggregate, Per-Deployer, Per-Operator)
The percentage of system recommendations that operators override is a signal of system health and oversight quality. The monitoring layer tracks override rates at three levels of granularity: aggregate (across all operators and deployers), per-deployer (to identify deployer-specific patterns), and per-operator (to identify individual operators who may be exhibiting automation bias or, conversely, overriding excessively).
Consistently low override rates may indicate automation bias: operators are accepting the system’s recommendations without meaningful review. A sudden increase in override rates may indicate that the system’s outputs are degrading. Divergent rates between operators or deployers may reveal training gaps, differing operational contexts, or inconsistent application of the review workflow.
Review time monitoring complements override rate monitoring. Average review time per case is a proxy for review thoroughness. Operators consistently reviewing cases in under 60 seconds are unlikely to be performing meaningful oversight; cases reviewed below this threshold are flagged for investigation. Together, override rates and review times provide a comprehensive picture of whether the human oversight measures documented in AISDP Module 7 are functioning as intended in practice. The results feed into Module 12 as post-market monitoring evidence.
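Computing the three granularity levels in a single pass is straightforward; the sketch below assumes a simple event stream of `(deployer_id, operator_id, overridden)` tuples, which is an illustrative shape rather than a specified interface.

```python
from collections import defaultdict

def override_rates(events):
    """events: iterable of (deployer_id, operator_id, overridden: bool).

    Returns override rates keyed by "aggregate", "deployer:<id>",
    and "operator:<id>", covering all three monitoring granularities.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [overrides, reviews]
    for deployer, operator, overridden in events:
        for key in ("aggregate", f"deployer:{deployer}", f"operator:{operator}"):
            totals[key][1] += 1
            totals[key][0] += int(overridden)
    return {key: overrides / reviews for key, (overrides, reviews) in totals.items()}

events = [
    ("d1", "op1", False), ("d1", "op1", False), ("d1", "op2", True),
    ("d2", "op3", False), ("d2", "op3", True),
]
rates = override_rates(events)
print(rates["aggregate"])     # 0.4
print(rates["deployer:d2"])   # 0.5
print(rates["operator:op1"])  # 0.0
```

Alerting thresholds would then be applied per key, so that an anomalous deployer or operator stands out even when the aggregate rate looks healthy.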
Key outputs
- Override rate dashboards at aggregate, per-deployer, and per-operator levels
- Review time monitoring with sub-60-second flagging
- Alerting thresholds for abnormal override rates and review times
- Module 7 and Module 12 AISDP evidence
Review Time Monitoring — Sub-60-Second Flagging
Average review time per case serves as a proxy for review thoroughness. Operators who consistently review cases in under 60 seconds are unlikely to be performing meaningful oversight, regardless of how many cases they process. This metric complements the override rate monitoring described in the preceding section; together the two provide a comprehensive picture of whether human oversight is functioning as intended.
The monitoring layer tracks per-operator review times and flags cases where the elapsed time between case presentation and operator action falls below the 60-second threshold. The threshold may be adjusted based on case complexity; straightforward cases may legitimately require less time, while complex cases should require more. The system should categorise cases by complexity tier and apply tier-appropriate review time thresholds.
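Tier-appropriate flagging reduces to a simple threshold lookup. The tier names and per-tier floors below are assumed values for illustration; the 60-second figure is used as the default where no tier is assigned, consistent with the flat threshold described above.

```python
# Hypothetical per-tier review-time floors in seconds.
TIER_THRESHOLDS = {"simple": 20, "standard": 60, "complex": 120}

def flag_fast_reviews(reviews, default=60):
    """reviews: iterable of (operator_id, tier, elapsed_seconds).

    Returns the reviews whose elapsed time falls below the floor for
    their complexity tier (or the 60-second default for untiered cases).
    """
    return [
        (operator, tier, seconds)
        for operator, tier, seconds in reviews
        if seconds < TIER_THRESHOLDS.get(tier, default)
    ]

flagged = flag_fast_reviews([
    ("op1", "simple", 12),    # below the 20 s floor: flagged
    ("op1", "complex", 90),   # below the 120 s floor: flagged
    ("op2", "standard", 75),  # above the 60 s floor: not flagged
])
print(flagged)
```

Note that the same 90-second review is acceptable for a standard case but flagged for a complex one, which is exactly the behaviour the tiered thresholds are meant to capture.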
When sub-threshold review times are detected, investigation may reveal legitimate explanations (experienced operators making rapid but well-informed decisions) or concerning patterns (operators bulk-accepting without review). The response depends on the finding: additional training, interface redesign, workload adjustment, or, in persistent cases, escalation through the governance framework. Review time monitoring results are documented in AISDP Module 7 and Module 12 as evidence that the organisation actively monitors the quality of human oversight.
Key outputs
- Per-operator review time tracking with sub-60-second flagging
- Complexity-tiered review time thresholds
- Investigation procedures for flagged patterns
- Module 7 and Module 12 AISDP evidence