Distributional Analysis — Statistical Tests & Output Matrix

v2.4.0

Before any model is trained, the Technical SME examines the data for bias through distributional analysis. This analysis computes the distribution of each feature across protected characteristic subgroups, identifying significant differences that may indicate historical disparities the model would learn and perpetuate.

For categorical features, the chi-squared test of independence assesses whether the feature’s distribution is independent of the protected characteristic. For continuous features, the Kolmogorov-Smirnov test compares cumulative distribution functions across subgroups, and the Mann-Whitney U test detects location shifts where one subgroup’s values are systematically higher or lower. The analysis should cover every feature, not only those the team suspects are problematic.
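The test selection above can be sketched with SciPy. This is a minimal illustration, not a prescribed implementation: the helper name `compare_subgroups`, the column names, and the synthetic data are all hypothetical, and the continuous branch assumes exactly two subgroups.

```python
import numpy as np
import pandas as pd
from scipy import stats


def compare_subgroups(df, feature, protected, categorical):
    """Return {test_name: (statistic, p_value)} for one feature/characteristic pair.

    Hypothetical helper for illustration only.
    """
    if categorical:
        # Contingency table: feature categories (rows) by subgroup (columns).
        table = pd.crosstab(df[feature], df[protected])
        res = stats.chi2_contingency(table)
        return {"chi2": (float(res[0]), float(res[1]))}

    # Continuous feature: one array of values per subgroup.
    groups = [g[feature].dropna().to_numpy() for _, g in df.groupby(protected)]
    if len(groups) != 2:
        raise ValueError("this sketch handles exactly two subgroups")
    a, b = groups
    ks = stats.ks_2samp(a, b)  # compares the two empirical CDFs
    mw = stats.mannwhitneyu(a, b, alternative="two-sided")  # location shift
    return {
        "ks": (float(ks.statistic), float(ks.pvalue)),
        "mann_whitney_u": (float(mw.statistic), float(mw.pvalue)),
    }


# Synthetic data (assumption): subgroup "B" has lower experience on average.
rng = np.random.default_rng(0)
n = 500
group = rng.choice(["A", "B"], size=n)
df = pd.DataFrame({
    "group": group,
    "years_experience": rng.normal(10, 3, n) - 2.0 * (group == "B"),
    "contract_type": rng.choice(["full", "part"], size=n),
})

continuous = compare_subgroups(df, "years_experience", "group", categorical=False)
categorical = compare_subgroups(df, "contract_type", "group", categorical=True)
print(continuous)
print(categorical)
```

Running the helper over every feature and every protected characteristic yields the raw statistics and p-values for the output matrix. Protected characteristics with more than two subgroups would need pairwise comparisons or a k-sample test, which this sketch does not cover.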

The practical output is a matrix: features on one axis, protected characteristics on the other, with each cell showing the test statistic and p-value. Features with statistically significant distributional differences (p < 0.05 after correction for multiple comparisons) are flagged for further investigation. Bonferroni correction is the simplest approach, controlling the family-wise error rate. In high-dimensional feature spaces where the number of comparisons is large, Bonferroni becomes excessively conservative and may cause genuine distributional differences to be missed; the Benjamini-Hochberg procedure (controlling the false discovery rate) is a well-accepted alternative in such cases. The chosen correction method must be documented alongside its rationale. ydata-profiling (formerly Pandas Profiling) can automate much of the exploratory profiling for tabular datasets, producing an HTML report with correlation matrices and distribution comparisons; the subgroup tests and the multiple-comparison correction should still be run and recorded explicitly.
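The effect of the correction choice on flagging can be illustrated with a small dependency-free sketch. The raw p-values and the feature and characteristic names below are invented for illustration; the functions compute adjusted p-values, which are then compared against the 0.05 threshold.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping the running
    # minimum of p * m / rank so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, one per (feature, protected characteristic) cell.
raw = {
    ("years_experience", "sex"): 0.001,
    ("contract_type", "sex"): 0.012,
    ("postcode_band", "ethnicity"): 0.02,
    ("age_band", "sex"): 0.03,
    ("tenure", "ethnicity"): 0.30,
}

ALPHA = 0.05
cells = list(raw)
p = [raw[c] for c in cells]

flag_bonf = {c for c, q in zip(cells, bonferroni(p)) if q < ALPHA}
flag_bh = {c for c, q in zip(cells, benjamini_hochberg(p)) if q < ALPHA}

print("Bonferroni flags:", sorted(flag_bonf))
print("Benjamini-Hochberg flags:", sorted(flag_bh))
```

With these five invented p-values, Bonferroni flags only the first cell while Benjamini-Hochberg flags four, which shows why the method and its rationale need to be recorded with the matrix. In practice a maintained implementation such as `statsmodels.stats.multitest.multipletests` would normally be used instead of hand-written functions.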

Flagged features require a documented assessment: is the distributional difference an artefact of historical disparity that the model should not learn, or is it a legitimate difference that the model should capture? A recruitment dataset where female applicants have systematically lower years of experience due to historical workforce participation patterns contains a distributional difference that, if weighted heavily by the model, would reproduce that disparity. The Technical SME’s assessment of each flagged feature is retained as a Module 4 artefact.

Key outputs

Distributional analysis output matrix
Flagged features with assessment rationale
Statistical test parameters and correction methodology