Each bias mitigation technique applied (pre-processing, in-processing, or post-processing) requires an effectiveness assessment documenting whether it achieved its intended objective and what trade-offs it introduced.
The assessment compares fairness metrics before and after mitigation, using the same evaluation methodology and the same test dataset, so that any difference is attributable to the mitigation rather than to a change in measurement. The comparison should cover all five post-training metrics, not only the targeted metric, since mitigating one fairness dimension may adversely affect others. The assessment also measures the accuracy impact (how much predictive performance was given up in exchange for the fairness improvement) and examines the subgroup-level impact (whether the mitigation improved fairness for the targeted subgroup without degrading performance for other subgroups).
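The before/after comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation. The function names `evaluate` and `compare` are hypothetical, the toy labels and predictions are invented for the example, and the two fairness gaps shown (demographic parity and equal opportunity) are stand-ins chosen for illustration rather than the five post-training metrics themselves. The sketch assumes binary labels and predictions and at least one positive example in every subgroup.

```python
from collections import defaultdict


def rate(values):
    """Mean of a list of 0/1 values; NaN if the list is empty."""
    return sum(values) / len(values) if values else float("nan")


def evaluate(y_true, y_pred, groups):
    """Compute accuracy, per-subgroup accuracy, and two illustrative fairness gaps.

    Assumes binary labels/predictions and at least one positive per subgroup.
    """
    by_group = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g].append((t, p))
    selection = {g: rate([p for _, p in rows]) for g, rows in by_group.items()}
    tpr = {g: rate([p for t, p in rows if t == 1]) for g, rows in by_group.items()}
    return {
        "accuracy": rate([int(t == p) for t, p in zip(y_true, y_pred)]),
        "group_accuracy": {
            g: rate([int(t == p) for t, p in rows]) for g, rows in by_group.items()
        },
        # Gaps are max-minus-min across subgroups, so lower is fairer.
        "demographic_parity_gap": max(selection.values()) - min(selection.values()),
        "equal_opportunity_gap": max(tpr.values()) - min(tpr.values()),
    }


def compare(before, after):
    """Report every metric before and after, plus subgroups whose accuracy fell."""
    names = ("accuracy", "demographic_parity_gap", "equal_opportunity_gap")
    report = {
        n: {"before": before[n], "after": after[n], "delta": after[n] - before[n]}
        for n in names
    }
    report["degraded_subgroups"] = sorted(
        g
        for g in before["group_accuracy"]
        if after["group_accuracy"][g] < before["group_accuracy"][g]
    )
    return report


# Toy data (invented): the same test set is scored by both models.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
pred_before = [1, 1, 0, 0, 0, 0, 0, 0]  # unmitigated model
pred_after = [1, 0, 0, 0, 1, 0, 0, 0]   # mitigated model

report = compare(
    evaluate(y_true, pred_before, groups),
    evaluate(y_true, pred_after, groups),
)
for name, row in report.items():
    print(name, row)
```

In this toy case both fairness gaps close and overall accuracy is unchanged, yet subgroup `a` is flagged as degraded. That is the kind of trade-off an aggregate-only comparison would miss, and the reason the subgroup-level view is reported alongside the headline figures.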
Where multiple mitigation techniques were applied in combination, the assessment should, where possible, decompose the contribution of each technique so that the organisation can see which techniques are most effective in its specific context. This information feeds into future model development cycles and into broader organisational learning about bias mitigation.
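One possible way to decompose the contribution of combined techniques is an ablation: evaluate the model with every subset of techniques enabled, then average each technique's marginal effect over all orders of application (a Shapley-style attribution). The sketch below is an assumption about how this could be done, not a method this guidance prescribes. The function name `shapley_contributions`, the technique labels, and the fairness-gap values are all hypothetical. Because every subset must be trained and evaluated separately, the approach is only practical for a small number of techniques.

```python
from itertools import permutations


def shapley_contributions(techniques, metric_by_config):
    """Average each technique's marginal change in a metric over all orderings.

    metric_by_config maps a frozenset of enabled techniques to the metric
    measured for that configuration (same test set and methodology each time).
    """
    totals = {t: 0.0 for t in techniques}
    orders = list(permutations(techniques))
    for order in orders:
        applied = frozenset()
        for t in order:
            with_t = applied | {t}
            totals[t] += metric_by_config[with_t] - metric_by_config[applied]
            applied = with_t
    return {t: total / len(orders) for t, total in totals.items()}


# Hypothetical fairness-gap measurements (lower is fairer) for each subset.
techniques = ["pre_processing", "post_processing"]
gap_by_config = {
    frozenset(): 0.20,                      # no mitigation
    frozenset({"pre_processing"}): 0.12,
    frozenset({"post_processing"}): 0.10,
    frozenset(techniques): 0.04,            # both applied
}

contributions = shapley_contributions(techniques, gap_by_config)
for name, change in contributions.items():
    print(f"{name}: {change:+.3f}")
```

The per-technique values sum to the total change between the unmitigated and fully mitigated configurations, which makes them straightforward to report. Where techniques interact strongly, the individual figures depend on that averaging and should be read as indicative rather than exact.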
Key outputs
- Per-technique effectiveness assessment
- Before/after fairness metric comparison
- Accuracy-fairness trade-off documentation