The fairness evaluation suite integrates all five post-training metrics into a single evaluation report that runs as part of the CI pipeline. The AISDP documents the tooling selected, its configuration, and its integration into the development workflow.
Fairlearn’s MetricFrame is the most flexible tool for this purpose. The developer defines the metrics, the sensitive features, and the dataset; MetricFrame produces a structured report with per-subgroup values for all metrics. It supports intersectional analysis by accepting multiple sensitive features and computing metrics for every combination. Fairlearn integrates with scikit-learn estimators and supports both evaluation (MetricFrame) and mitigation (ExponentiatedGradient, ThresholdOptimizer).
AI Fairness 360 (IBM) provides a broader toolkit with additional bias detection and mitigation algorithms, including the disparate impact remover, reweighting preprocessor, and calibrated equalised odds post-processor. It also includes dataset bias metrics that complement the pre-training analysis.
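To make the disparate impact notion concrete, here is a plain-Python illustration of the quantity AIF360's disparate impact metric reports (the ratio of selection rates between groups); the group labels and the 0.8 screening rule are illustrative assumptions, not the project's configuration:

```python
import numpy as np

def disparate_impact(y_pred, group, unprivileged, privileged):
    """Ratio of selection rates:
    P(pred = 1 | unprivileged group) / P(pred = 1 | privileged group).
    A common screening convention flags values below 0.8
    (the "four-fifths rule")."""
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    rate_u = y_pred[group == unprivileged].mean()  # selection rate, unprivileged
    rate_p = y_pred[group == privileged].mean()    # selection rate, privileged
    return rate_u / rate_p

di = disparate_impact(
    y_pred=[1, 0, 1, 1, 0, 1, 0, 0],
    group=["f", "f", "m", "m", "m", "m", "f", "f"],
    unprivileged="f",
    privileged="m",
)
print(f"disparate impact: {di:.3f}")
```

In this toy data the unprivileged selection rate is 0.25 against 0.75 for the privileged group, giving a ratio of one third, which the four-fifths convention would flag.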
The fairness evaluation report is stored as a Module 4 and Module 5 evidence artefact. It is compared against the declared thresholds established in Article 81. Any threshold breach blocks deployment through the fairness gate in the CI/CD pipeline. The tooling configuration (metric definitions, sensitive feature specifications, threshold values) is version-controlled alongside the model code, ensuring reproducibility.
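The deployment-blocking gate can be sketched as a small check run in CI; the metric names, threshold values, and report shape here are hypothetical placeholders for the version-controlled configuration described above:

```python
# Hypothetical fairness gate: compare a computed fairness report against
# the declared thresholds and report any breaches. In a real CI job this
# script would exit non-zero on breach (e.g. sys.exit(1)) to block deployment.

THRESHOLDS = {  # illustrative values; kept in version control with the model code
    "demographic_parity_difference": 0.10,
    "equalized_odds_difference": 0.10,
}

def check_gate(report: dict, thresholds: dict) -> list:
    """Return a list of (metric, value, limit) tuples for every breach."""
    return [
        (metric, report[metric], limit)
        for metric, limit in thresholds.items()
        if report.get(metric, float("inf")) > limit
    ]

# Hypothetical evaluation output from the fairness suite.
report = {
    "demographic_parity_difference": 0.04,
    "equalized_odds_difference": 0.13,
}

breaches = check_gate(report, THRESHOLDS)
for metric, value, limit in breaches:
    print(f"FAIL {metric}: {value:.3f} > threshold {limit:.3f}")
print("gate:", "FAIL" if breaches else "PASS")
```

Keeping the threshold dictionary in the same repository as the model code means a threshold change is itself a reviewed, versioned event, which supports the reproducibility claim above.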
Key outputs
- Fairness tooling selection and configuration documentation
- CI pipeline integration specification
- Sample fairness evaluation report