v2.4.0

Gradient-boosted decision tree ensembles (XGBoost, LightGBM, CatBoost) and random forests offer a strong balance between predictive performance and explainability. They frequently represent the best compliance trade-off for tabular data tasks in high-risk domains.

The primary explainability mechanism for ensemble methods is SHAP (SHapley Additive exPlanations). SHAP values provide theoretically grounded feature attribution at the individual prediction level, decomposing each output into the contribution of each input feature. For tree-based models, the TreeExplainer algorithm computes exact SHAP values efficiently, enabling per-decision explanations that satisfy the Article 14 human oversight requirement for most applications. Operators reviewing a system recommendation can see which features drove the ranking and how confident the system is.
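To make the attribution idea concrete, the sketch below brute-forces exact Shapley values for a toy scoring function by enumerating all feature coalitions. The model, feature names, and baseline are hypothetical stand-ins; TreeExplainer computes the same quantity for real tree ensembles in polynomial time rather than by enumeration.

```python
from itertools import combinations
from math import factorial

# Hypothetical stand-in for a tree ensemble's prediction function.
def predict(features):
    return 2.0 * features["income"] + 1.0 * features["tenure"] - 0.5 * features["debt"]

def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution by enumerating all feature coalitions.

    Features absent from a coalition take their baseline value. This is
    the quantity TreeExplainer computes efficiently for trees; here it is
    brute-forced for illustration only.
    """
    names = list(instance)
    n = len(names)
    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_f = {g: instance[g] if (g in subset or g == f) else baseline[g]
                          for g in names}
                without_f = {g: instance[g] if g in subset else baseline[g]
                             for g in names}
                total += weight * (predict(with_f) - predict(without_f))
        phi[f] = total
    return phi

instance = {"income": 5.0, "tenure": 3.0, "debt": 4.0}
baseline = {"income": 0.0, "tenure": 0.0, "debt": 0.0}
phi = shapley_values(predict, instance, baseline)
# Local accuracy: attributions sum to predict(instance) - predict(baseline).
```

The local-accuracy property shown in the last comment is what lets each prediction be decomposed exactly into per-feature contributions, which is the basis for the per-decision explanations discussed above.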

On the compliance criteria, ensemble methods score from adequate to strong across all six dimensions. Documentability is adequate: the architecture can be described precisely (number of trees, depth, feature splits), though the learned parameters across hundreds or thousands of trees cannot be enumerated individually. Testability is strong, with standard evaluation methodologies well-established. Auditability is strong given SHAP-based attribution. Bias detectability is strong, as SHAP values can identify proxy variable effects at the individual prediction level. Maintainability is strong; gradient-boosted trees produce stable, predictable changes when retrained on augmented data. Determinism is strong, since outputs are fully reproducible for a given model version and input.
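A scoring exercise like the one above can be recorded in a simple machine-readable rubric. The sketch below is illustrative only: the rating scale and point values are assumptions, not a normative scheme, though the six criteria and ratings mirror the assessment in the text.

```python
# Hypothetical rubric: criterion -> rating for one ensemble candidate.
CRITERIA = {
    "documentability": "adequate",
    "testability": "strong",
    "auditability": "strong",
    "bias_detectability": "strong",
    "maintainability": "strong",
    "determinism": "strong",
}

# Assumed point values for the three-level scale used here.
RATING_POINTS = {"weak": 0, "adequate": 1, "strong": 2}

def summarize(scores):
    """Return (total points, max points, criteria rated below 'strong')."""
    total = sum(RATING_POINTS[v] for v in scores.values())
    flagged = [k for k, v in scores.items() if RATING_POINTS[v] < 2]
    return total, 2 * len(scores), flagged

total, maximum, flagged = summarize(CRITERIA)
# total=11, maximum=12, flagged=["documentability"]
```

Keeping the rubric as data rather than prose makes it easy to compare candidate architectures side by side and to flag any criterion that needs mitigation in the documentation.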

The fidelity of SHAP explanations for ensemble methods is generally high, as TreeExplainer computes exact (not approximate) Shapley values. The AISDP should nonetheless include fidelity validation results, confirming that perturbing the features identified as most important by SHAP produces corresponding changes in the model’s output.
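The fidelity check described above can be sketched as follows. The model, feature names, and attribution magnitudes here are hypothetical stand-ins; the point is only the validation logic: perturbing the feature the explainer ranks most important should move the output more than perturbing the one it ranks least important.

```python
# Hypothetical stand-in for a tree ensemble's prediction function.
def predict(features):
    return 2.0 * features["income"] + 1.0 * features["tenure"] - 0.1 * features["zip_noise"]

# Attribution magnitudes as an explainer (e.g. SHAP) might rank them
# for one instance; these values are illustrative assumptions.
attributions = {"income": 9.8, "tenure": 3.1, "zip_noise": -0.4}

def fidelity_check(predict, instance, attributions, delta=1.0):
    """Return True if perturbing the top-ranked feature shifts the output
    more than perturbing the bottom-ranked one."""
    ranked = sorted(attributions, key=lambda f: abs(attributions[f]), reverse=True)
    top, bottom = ranked[0], ranked[-1]
    base = predict(instance)

    def shift(name):
        bumped = dict(instance)
        bumped[name] += delta
        return abs(predict(bumped) - base)

    return shift(top) > shift(bottom)

instance = {"income": 5.0, "tenure": 3.0, "zip_noise": 2.0}
passed = fidelity_check(predict, instance, attributions)
```

A real validation run would repeat this over a representative sample of instances and report the pass rate in the AISDP rather than a single-instance result.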

Key outputs

  • SHAP-based attribution methodology documentation
  • Fidelity validation results for explanation method
  • Compliance criteria scoring for ensemble candidates