v2.4.0

Fairness Metrics

Selection rate ratios, equalised odds, predictive parity, and calibration within groups are computed by the Technical SME on production data at defined intervals. Selection rate ratios measure whether the system's positive outcome rate differs across protected groups. Equalised odds measures whether true positive and false positive rates are consistent across groups. Predictive parity measures whether the positive predictive value is consistent across groups. Calibration within groups measures whether predicted probabilities correspond to observed frequencies for each group. These metrics are computed for single protected characteristics and, where cell sizes are sufficient, for intersectional subgroups. The AISDP declares the primary fairness metrics and the minimum acceptable ratios; PMM monitors compliance with these declarations. Fairness metric computation in production uses the same methodology as the pre-deployment fairness evaluation, ensuring that production fairness is directly comparable to the validated baseline.

Key outputs

  • Four fairness metric families computed on production data
  • Single-characteristic and intersectional computation
  • AISDP-declared minimum ratios monitored
  • Methodology consistent with pre-deployment evaluation
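The metric families above can be computed from confusion-matrix counts per group. A minimal sketch, assuming production records are available as `(group, y_true, y_pred)` triples; the function names (`rates`, `selection_rate_ratio`) and sample data are illustrative, not part of the AISDP/PMM framework, and calibration within groups (which needs predicted probabilities, not binary outcomes) is omitted for brevity:

```python
# Illustrative sketch: per-group rates behind selection rate ratio,
# equalised odds, and predictive parity. Data and names are hypothetical.
from collections import defaultdict

def rates(records):
    """Per-group selection rate, TPR, FPR, and PPV from (group, y_true, y_pred) triples."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for group, y_true, y_pred in records:
        key = ("tp" if y_true and y_pred else
               "fp" if not y_true and y_pred else
               "fn" if y_true else "tn")
        counts[group][key] += 1
    out = {}
    for g, c in counts.items():
        n = sum(c.values())
        pos = c["tp"] + c["fp"]
        out[g] = {
            "selection_rate": pos / n,
            # Equalised odds compares TPR and FPR across groups.
            "tpr": c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else None,
            "fpr": c["fp"] / (c["fp"] + c["tn"]) if c["fp"] + c["tn"] else None,
            # Predictive parity compares PPV across groups.
            "ppv": c["tp"] / pos if pos else None,
        }
    return out

def selection_rate_ratio(group_rates):
    """Min/max ratio of selection rates across groups (1.0 = parity)."""
    srs = [r["selection_rate"] for r in group_rates.values()]
    return min(srs) / max(srs)

records = [
    ("A", 1, 1), ("A", 0, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 0, 0), ("B", 0, 0), ("B", 1, 0),
]
print(selection_rate_ratio(rates(records)))  # → 0.5
```

Running the same function over pre-deployment and production data, as the text requires, keeps the two sets of figures directly comparable.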

Computation Intervals & Intersectional Subgroups

Fairness metric computation intervals depend on the system's volume and risk profile. High-volume systems (processing thousands of decisions daily) may compute fairness metrics weekly; lower-volume systems may require monthly intervals to accumulate sufficient data for statistically meaningful computation. The PMM plan documents the computation interval for each metric with the statistical justification. Intersectional subgroup analysis (for example, age and gender combined, or ethnicity and disability status combined) requires larger sample sizes than single-characteristic analysis. The PMM plan specifies the minimum cell size below which intersectional metrics are flagged as inconclusive. A common threshold is 30 observations per cell, though higher thresholds may be appropriate for metrics with higher variance. Where intersectional analysis is not feasible in production due to sample size constraints, the Technical SME compensates through periodic batch analysis on accumulated data, synthetic data augmentation for sensitivity analysis, or targeted deployer surveys.

Key outputs

  • Computation interval per metric with statistical justification
  • Minimum cell size for intersectional analysis (typically 30+)
  • Inconclusive flagging for insufficient samples
  • Compensating strategies for small intersectional cells
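The minimum-cell-size rule above reduces to a simple check before any intersectional metric is reported. A minimal sketch, assuming each observation carries an intersectional cell key such as `(age_band, gender)`; the threshold of 30 comes from the text, while the function name and data shape are hypothetical:

```python
# Illustrative sketch: flag intersectional cells below the minimum size
# as inconclusive instead of reporting a metric for them.
from collections import Counter

MIN_CELL_SIZE = 30  # common threshold per the PMM plan; raise for high-variance metrics

def flag_cells(observations, min_cell=MIN_CELL_SIZE):
    """Count observations per intersectional cell and mark each cell
    'ok' (metric may be computed) or 'inconclusive' (sample too small)."""
    sizes = Counter(obs["cell"] for obs in observations)
    return {cell: ("ok" if n >= min_cell else "inconclusive")
            for cell, n in sizes.items()}

obs = ([{"cell": ("18-25", "F")}] * 40 +
       [{"cell": ("65+", "F")}] * 12)
print(flag_cells(obs))
# {('18-25', 'F'): 'ok', ('65+', 'F'): 'inconclusive'}
```

Cells flagged inconclusive would then feed the compensating strategies listed above (batch analysis on accumulated data, synthetic augmentation, or deployer surveys) rather than being silently dropped.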

Compensating for Missing Demographic Data

In many deployment contexts, demographic data about affected persons is not available to the provider. Compensating strategies include proxy-based estimation (with documented methodology and accuracy bounds), periodic deployer surveys or sampling studies, external benchmark comparison (comparing the system's output distributions against known population distributions), and structured feedback analysis (examining complaint and appeal patterns for demographic signals). Each compensating strategy has limitations that the Technical SME documents. Proxy-based estimation introduces measurement error; the accuracy bounds must be reported alongside the estimated fairness metrics. Deployer surveys depend on deployer cooperation and may be subject to selection bias. External benchmarks may not reflect the system's specific deployment population. Feedback analysis captures only the concerns of persons who complain, which may systematically exclude the most affected groups. The AISDP documents which compensating strategy is applied, its known limitations, and the confidence level of the resulting fairness estimates. Where no reliable fairness monitoring is achievable for a specific protected characteristic, this gap is documented as a residual risk in the risk register. Key outputs

Key outputs

  • Four compensating strategies with documented limitations
  • Accuracy bounds reported alongside estimated metrics
  • Residual risk documentation where no reliable monitoring is achievable
  • Module 12 AISDP documentation
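One way to enforce the requirement that accuracy bounds travel with proxy-based estimates is to make the bounds part of the reported record and refuse a verdict when the interval straddles the declared minimum ratio. A minimal sketch under that assumption; the class name, fields, and thresholds are illustrative, not prescribed by the AISDP:

```python
# Illustrative sketch: a proxy-based fairness estimate that always carries
# its accuracy bounds, with a conclusiveness check against the AISDP minimum.
from dataclasses import dataclass

@dataclass
class ProxyFairnessEstimate:
    metric: str            # e.g. "selection_rate_ratio"
    point_estimate: float
    lower_bound: float     # accuracy bounds from the documented proxy methodology
    upper_bound: float
    proxy_method: str      # hypothetical description of the proxy used
    confidence: str        # confidence level recorded in the AISDP

    def verdict(self, min_ratio):
        """Conclusive only if the whole interval sits on one side of the
        declared minimum; otherwise flag for residual-risk documentation."""
        if self.lower_bound >= min_ratio:
            return True, "above declared minimum"
        if self.upper_bound < min_ratio:
            return True, "below declared minimum"
        return False, "interval straddles minimum: document as residual risk"

est = ProxyFairnessEstimate("selection_rate_ratio", 0.82, 0.74, 0.90,
                            "demographic proxy (hypothetical)", "medium")
print(est.verdict(0.80))
# (False, 'interval straddles minimum: document as residual risk')
```

The inconclusive branch maps onto the text's final requirement: when the bounds cannot resolve compliance for a characteristic, the gap goes into the risk register rather than into a fairness claim.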