Override Rate Analysis

Override rates carry compliance significance in both directions. A consistently low rate (below 2–5%) may indicate automation bias: operators accepting recommendations without meaningful scrutiny. A consistently high rate (above 20–30%) may indicate the system is underperforming or operators disagree with the model’s logic. Neither extreme is healthy. The PMM plan defines an expected override rate range based on the system’s documented accuracy and the decision context. Both upper and lower threshold breaches generate alerts.

Override analysis is disaggregated by operator (identifying individuals needing additional training), by decision type (identifying categories where the system underperforms), by time period (detecting trends such as declining override rates indicating growing automation bias), and by deployer (identifying deployment-specific issues).

A declining override rate trend over months is a particularly significant signal: as operators become accustomed to the system, they may progressively reduce their engagement, eroding the human oversight the AISDP documents.

Key outputs
- Expected override rate range with upper and lower thresholds
- Disaggregation by operator, decision type, time period, and deployer
- Declining trend detection for automation bias
- Module 7 and Module 12 AISDP evidence
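The threshold and trend checks above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the band values, the function name `override_alerts`, and the three-month trend window are all hypothetical placeholders for values the PMM plan would actually define.

```python
# Hypothetical expected override rate range; the PMM plan defines the real band.
LOWER, UPPER = 0.05, 0.25

def override_alerts(monthly_rates, trend_window=3):
    """Flag threshold breaches and a sustained declining trend.

    monthly_rates: override rates (0.0-1.0) per month, oldest first.
    """
    alerts = []
    current = monthly_rates[-1]
    if current < LOWER:
        alerts.append("low-rate: possible automation bias")
    elif current > UPPER:
        alerts.append("high-rate: possible underperformance")
    # Declining trend: each of the last `trend_window` months strictly
    # lower than the month before it.
    recent = monthly_rates[-(trend_window + 1):]
    if len(recent) == trend_window + 1 and all(b < a for a, b in zip(recent, recent[1:])):
        alerts.append("declining-trend: eroding engagement")
    return alerts

rates = [0.12, 0.10, 0.07, 0.04]  # illustrative monthly override rates
print(override_alerts(rates))
# -> ['low-rate: possible automation bias', 'declining-trend: eroding engagement']
```

In practice the same check would be run per operator, per decision type, and per deployer to produce the disaggregated views described above.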
Review Time Analysis

The time operators spend reviewing each case before accepting or overriding is a proxy for the depth of human engagement. Operators who consistently review cases in under five seconds for decisions documented as requiring substantive analysis are unlikely to be performing meaningful oversight. The Technical SME monitors review time against a minimum threshold defined in the PMM plan, based on the complexity of the decision and the information the operator must evaluate.

Review time distribution is as informative as the average. A bimodal distribution, where most cases are reviewed in seconds but a small proportion take several minutes, may indicate that operators skim the majority and only engage deeply with cases that trigger an intuitive concern. This pattern leaves the organisation exposed to errors in the “skimmed” cases.

The minimum review time threshold is calibrated through operational research: domain experts assess the minimum time needed to meaningfully review the information presented in the oversight interface for a typical case. This calibration should be documented and reviewed annually.

Key outputs
- Minimum review time threshold based on decision complexity
- Distribution shape analysis (bimodal detection)
- Calibration through domain expert assessment
- Module 7 and Module 12 AISDP evidence
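A simple way to surface the skim pattern described above is to bucket review times and flag distributions where most cases are very fast, a few are very slow, and little falls in between. The sketch below is illustrative only: the 30-second minimum, the 10/120-second bucket cut-offs, the 0.6/0.05/0.2 shares, and the function name `review_time_flags` are all assumed values standing in for thresholds the domain-expert calibration would set.

```python
MIN_REVIEW_SECONDS = 30  # hypothetical; calibrated by domain experts per the PMM plan

def review_time_flags(times, fast_cutoff=10, slow_cutoff=120):
    """Summarise the review-time distribution and flag a bimodal skim pattern.

    times: review durations in seconds for a sample of cases.
    """
    n = len(times)
    below_min = sum(1 for t in times if t < MIN_REVIEW_SECONDS) / n
    fast = sum(1 for t in times if t < fast_cutoff) / n
    slow = sum(1 for t in times if t > slow_cutoff) / n
    middle = 1 - fast - slow
    return {
        "below_min_share": below_min,
        # Bimodal skim pattern: most cases very fast, a few very slow,
        # little in between.
        "bimodal": fast > 0.6 and slow > 0.05 and middle < 0.2,
    }

sample = [4, 5, 6, 3, 7, 5, 4, 180, 240, 5]  # seconds; illustrative
print(review_time_flags(sample))
```

A production implementation would use a proper multimodality test on larger samples; the bucket heuristic here only shows the shape of the check.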
Escalation Monitoring

The Technical SME tracks escalation frequency over time and disaggregated by escalation reason. A decline in escalation frequency may indicate operators are more confident, the system is improving, or escalation is perceived as burdensome and is being avoided. The PMM plan defines a baseline escalation rate and requires investigation of sustained deviations.

Escalation reasons are categorised: uncertainty about the output, suspected error, novel input pattern, ethical concern, system malfunction. Trend analysis by category provides more actionable insight than aggregate frequency. A declining escalation rate for “suspected error” is positive (suggesting improved model accuracy); a declining rate for “ethical concern” may warrant investigation (are ethical concerns declining, or is the escalation pathway for ethical concerns not working?).

The escalation monitoring results feed into the quarterly PMM review, where the AI Governance Lead assesses whether the escalation framework is functioning as intended.

Key outputs
- Escalation frequency tracking over time
- Per-reason categorisation with trend analysis
- Baseline escalation rate with deviation investigation
- Quarterly governance review integration
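The per-reason trend analysis can be sketched as a count-and-compare over reporting periods. This is a minimal illustration under stated assumptions: the category labels mirror the five reasons listed above, while the function name `escalation_trends` and the first-versus-last comparison are hypothetical simplifications of a fuller time-series analysis.

```python
from collections import Counter

# The five escalation reason categories defined above.
REASONS = ("uncertainty", "suspected_error", "novel_input", "ethical_concern", "malfunction")

def escalation_trends(quarters):
    """Label each escalation reason as rising, falling, or stable.

    quarters: list of lists of escalation reason strings, oldest quarter first.
    """
    counts = [Counter(q) for q in quarters]
    trends = {}
    for reason in REASONS:
        series = [c[reason] for c in counts]
        if series[-1] < series[0]:
            trends[reason] = "falling"
        elif series[-1] > series[0]:
            trends[reason] = "rising"
        else:
            trends[reason] = "stable"
    return trends

q1 = ["suspected_error"] * 8 + ["ethical_concern"] * 3
q2 = ["suspected_error"] * 4 + ["ethical_concern"] + ["uncertainty"] * 2
print(escalation_trends([q1, q2]))
```

Note that the output alone cannot distinguish a benign falling trend (fewer suspected errors) from a problematic one (a blocked ethical-concern pathway); that interpretation remains a human judgement in the quarterly review.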
Automation Bias Detection

Beyond override rates and review times, more granular indicators detect automation bias. If the system presents a confidence score alongside its recommendation, operators who override high-confidence recommendations at the same rate as low-confidence ones are likely not using the confidence information. If the system provides explanatory features (the top contributing factors to the recommendation), operators whose interaction data shows no engagement with the explanation may be accepting recommendations at face value. The Technical SME computes these behavioural indicators where the human oversight interface captures sufficient interaction data.

The correlation between system confidence and override rate should be negative: operators should override less when the system is more confident and more when confidence is lower. A correlation near zero indicates that operators are not engaging with the confidence signal.

Automation bias indicators are reported in the quarterly PMM review and tracked over time. A rising automation bias trend triggers operator retraining, interface redesign (to make confidence and explanation information more prominent), or workload adjustment.

Key outputs
- Confidence-override correlation analysis
- Explanation engagement pattern detection
- Quarterly reporting with trend tracking
- Remediation through retraining, interface redesign, or workload adjustment
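The confidence–override check can be computed as a plain Pearson correlation between per-case confidence scores and a binary override flag. A sketch, with the caveat that the function names (`pearson`, `confidence_engagement`) and the 0.1 "flat" band are assumptions, not values from the PMM plan:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def confidence_engagement(confidences, overridden, flat_band=0.1):
    """Correlate system confidence with override decisions (1 = overridden).

    A healthy pattern is a clearly negative correlation (higher confidence,
    fewer overrides); a near-zero correlation suggests operators are
    ignoring the confidence signal.
    """
    r = pearson(confidences, [float(o) for o in overridden])
    return {"r": r, "flat": abs(r) < flat_band}

# Illustrative: high-confidence cases accepted, low-confidence cases overridden.
result = confidence_engagement([0.9, 0.95, 0.8, 0.3, 0.4, 0.85, 0.2],
                               [0, 0, 0, 1, 1, 0, 1])
print(result)
```

Because the override flag is binary, this is the point-biserial form of the correlation; on real data the indicator would be computed per operator and tracked quarter over quarter.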
Operator Wellbeing & Workload Parameters

Article 14 compliance depends on operators who are alert, motivated, and capable of exercising independent judgement. Monitoring tracks workload indicators: cases per operator per shift, shift duration, break frequency, and overtime hours. Cognitive fatigue degrades oversight quality. The PMM plan defines maximum workload parameters based on decision complexity and the organisation’s assessment of sustainable oversight capacity. An operator who has reviewed three hundred cases in a single shift is less likely to catch a subtle error in case three hundred and one than an operator who has reviewed thirty.

Workload thresholds trigger alerts when exceeded, and the AI Governance Lead has authority to reduce case volumes or increase staffing. Operator wellbeing monitoring also tracks secondary indicators: error rates in human review (errors identified during quality assurance checks on operator decisions), voluntary rotation requests, and absenteeism patterns. A sustained increase in these indicators may signal that the oversight workload is unsustainable.

Key outputs
- Maximum workload parameters (cases per shift, duration, breaks)
- Workload threshold alerts with AI Governance Lead authority to adjust
- Secondary wellbeing indicators (error rates, rotation requests, absenteeism)
- Module 7 and Module 12 AISDP documentation
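The workload threshold alerts can be expressed as a per-shift check against the maximum parameters. The limits below (150 cases, 10 hours, 2 breaks) and the field names in the `shift` record are illustrative assumptions; the PMM plan sets the actual values.

```python
# Hypothetical maximum workload parameters; the PMM plan defines real values.
MAX_CASES_PER_SHIFT = 150
MAX_SHIFT_HOURS = 10
MIN_BREAKS_PER_SHIFT = 2

def workload_alerts(shift):
    """Check one operator shift against the maximum workload parameters.

    shift: dict with 'cases' (int), 'hours' (float), 'breaks' (int).
    """
    alerts = []
    if shift["cases"] > MAX_CASES_PER_SHIFT:
        alerts.append("case volume exceeded")
    if shift["hours"] > MAX_SHIFT_HOURS:
        alerts.append("shift duration exceeded")
    if shift["breaks"] < MIN_BREAKS_PER_SHIFT:
        alerts.append("insufficient breaks")
    return alerts

print(workload_alerts({"cases": 300, "hours": 12, "breaks": 1}))
# -> ['case volume exceeded', 'shift duration exceeded', 'insufficient breaks']
```

Alerts of this kind would route to the AI Governance Lead, who holds the authority to reduce case volumes or increase staffing; the secondary wellbeing indicators (QA error rates, rotation requests, absenteeism) are trend-monitored rather than threshold-alerted.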