Explanation Methods (SHAP, LIME, GradCAM, Attention)
The explainability layer generates human-readable explanations of individual predictions, supporting the Article 14 requirement that operators be able to “correctly interpret the system’s output.” The choice of explanation method depends on the model architecture and the computational constraints of the production environment.
SHAP TreeExplainer is suited to gradient-boosted tree models (XGBoost, LightGBM) and computes exact Shapley values in polynomial time, making it viable for per-prediction explanation at high throughput. KernelSHAP and DeepSHAP handle neural networks but are significantly more expensive, often requiring hundreds or thousands of model evaluations per explanation. LIME (Local Interpretable Model-agnostic Explanations) offers a model-agnostic alternative by fitting a local surrogate model to each prediction. Grad-CAM and attention-based methods are specific to convolutional and transformer architectures respectively.
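The local-surrogate idea behind LIME can be sketched in a few lines: perturb the instance, query the model, and fit a proximity-weighted linear model whose coefficients serve as per-feature attributions. This is a minimal illustration of the principle, not the LIME library's actual implementation; the function name and sampling parameters are assumptions of this sketch.

```python
import numpy as np

def local_surrogate_attributions(predict_fn, x, n_samples=500, scale=0.1, seed=0):
    """Fit a proximity-weighted local linear surrogate around x (the core idea of LIME).

    predict_fn: maps an (n, d) array of inputs to an (n,) array of model outputs.
    Returns one coefficient per feature: the surrogate's local attribution.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Perturb the instance with Gaussian noise and query the model.
    X = x + rng.normal(0.0, scale, size=(n_samples, d))
    y = predict_fn(X)
    # Weight samples by proximity to x (RBF kernel), so the fit is local.
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * scale ** 2 * d))
    # Weighted least squares over (X - x) plus an intercept column.
    A = np.hstack([X - x, np.ones((n_samples, 1))])
    W = np.diag(w)
    coef, *_ = np.linalg.lstsq(A.T @ W @ A, A.T @ W @ y, rcond=None)
    return coef[:d]  # per-feature local attributions
```

For an exactly linear model the surrogate recovers the true coefficients, which is a useful sanity check before applying the method to a real model.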
Fiddler AI provides production-grade explainability infrastructure, hooking into the serving pipeline to compute feature attributions per prediction, store them alongside the inference log, and provide monitoring dashboards. The AISDP must document the explanation method selected, its computational cost, the latency impact on the inference pipeline, and the validation performed to confirm that the explanations are faithful to the model’s actual reasoning.
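The serving-pipeline hook described above can be sketched generically: compute the prediction, compute the attributions, time the explanation step, and store everything in one inference record. The function names, record schema, and `log_sink` interface below are illustrative assumptions, not Fiddler AI's API.

```python
import json
import time
import uuid

def predict_with_explanation(model_fn, explain_fn, features, log_sink):
    """Serving-pipeline hook: compute a prediction and its per-feature
    attributions, then store both in the same inference record so that
    explanations are auditable alongside the inference log.

    model_fn, explain_fn and the record schema are illustrative only.
    """
    prediction = model_fn(features)
    t0 = time.perf_counter()
    attributions = explain_fn(features)          # e.g. SHAP values per feature
    explain_ms = (time.perf_counter() - t0) * 1000.0
    record = {
        "inference_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "features": features,
        "prediction": prediction,
        "attributions": attributions,
        "explanation_latency_ms": explain_ms,    # feeds the AISDP cost assessment
    }
    log_sink.append(json.dumps(record))
    return prediction, attributions
```

Logging the explanation latency per prediction provides direct evidence for the latency-impact assessment the AISDP requires.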
Key outputs
- Selected explanation method(s) with justification for the choice
- Computational cost and latency impact assessment
- Integration with the inference pipeline and logging infrastructure
- Module 3 and Module 7 documentation
Fidelity Validation — Attribution vs Model Sensitivity
An explanation that attributes a decision to Feature A when the model actually relied on Feature B is worse than no explanation at all, because it misleads the human overseer. Fidelity validation tests whether the explainability layer’s attributions accurately reflect the model’s actual behaviour.
The Technical SME validates explanations by comparing the explanation’s feature attributions against the model’s sensitivity to feature perturbations. If the explanation claims that Feature A was the dominant driver, perturbing Feature A should produce a larger change in the model’s output than perturbing other features. Systematic disagreement between the attributions and the perturbation analysis indicates that the explanation method is unreliable for the given model.
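The attribution-vs-perturbation comparison can be sketched as a rank-correlation test: perturb each feature, measure the output change, and check that the explanation ranks features in roughly the same order as the measured sensitivities. The acceptance threshold `min_corr` is an illustrative placeholder, not a prescribed value.

```python
import numpy as np

def fidelity_check(predict_fn, x, attributions, eps=1e-3, min_corr=0.7):
    """Compare explanation attributions against the model's actual sensitivity.

    For each feature, perturb it by eps and measure the output change; a
    faithful explanation should rank features in roughly the same order as
    these sensitivities. min_corr is an illustrative acceptance threshold.
    """
    base = predict_fn(x)
    sensitivity = np.empty_like(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        sensitivity[i] = abs(predict_fn(xp) - base) / eps
    # Spearman-style rank correlation between |attribution| and sensitivity.
    rank_attr = np.argsort(np.argsort(np.abs(attributions)))
    rank_sens = np.argsort(np.argsort(sensitivity))
    corr = np.corrcoef(rank_attr, rank_sens)[0, 1]
    return corr, corr >= min_corr
```

A systematically low correlation across a validation set is the signal, described above, that the explanation method is unreliable for the given model.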
Fidelity validation should be performed during development (as part of the initial model validation), at deployment (to confirm that the production explanation pipeline matches the development results), and periodically during operation (to catch cases where explanation fidelity degrades due to input distribution shifts). The validation results are documented in AISDP Module 5 and feed into the human oversight design documented in Module 7.
Key outputs
- Fidelity validation test suite (attribution vs perturbation analysis)
- Validation results at development, deployment, and periodic intervals
- Documented limitations where fidelity is imperfect
- Module 5 and Module 7 AISDP evidence
Audience-Appropriate Abstraction (Operators vs Affected Persons)
Explanations must be tailored to their audience. Technical operators require precise feature contributions and confidence indicators to evaluate the system’s output and decide whether to accept or override it. Affected persons require plain-language explanations that avoid jargon and focus on the factors most relevant to their individual situation.
The Technical SME designs at least two explanation formats: one for operators (detailed and quantitative, showing feature attributions and confidence scores) and one for affected persons (narrative and accessible, focusing on the key reasons for the outcome). The AISDP must document both formats, the rationale for the abstraction choices, and the validation performed to confirm comprehensibility.
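The two formats can be sketched as separate renderers over the same attribution data. The function names, the `phrasings` mapping (pre-approved plain-language wording per feature), and the `top_k` cutoff are assumptions of this sketch, not a prescribed design.

```python
def operator_explanation(attributions, confidence):
    """Detailed, quantitative format for operators (hypothetical schema):
    every feature attribution, signed and ranked, plus a confidence score."""
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    lines = [f"confidence: {confidence:.2f}"]
    lines += [f"  {name}: {value:+.3f}" for name, value in ranked]
    return "\n".join(lines)

def affected_person_explanation(attributions, phrasings, top_k=2):
    """Plain-language format for affected persons: only the top factors,
    mapped to pre-approved phrasings (the phrasings dict is an assumption)."""
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    reasons = [phrasings[name] for name, _ in ranked[:top_k]]
    return "The main reasons for this outcome were: " + "; ".join(reasons) + "."
```

Keeping both renderers over one attribution source ensures the two audiences receive consistent underlying content at different levels of abstraction.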
Comprehensibility validation involves testing the explanations with representative users from each audience. Can operators use the detailed explanation to form an independent judgement? Can affected persons understand the plain-language explanation well enough to identify potential errors or challenge the outcome? The results of this validation feed into AISDP Module 7 (human oversight design) and Module 8 (transparency and user information), and they also inform the ongoing explanation consistency monitoring described in the next section.
Key outputs
- Operator-facing explanation format with feature attributions and confidence indicators
- Affected-person-facing explanation format in plain language
- Comprehensibility validation results for each audience
- Module 7 and Module 8 documentation
Explanation Consistency Monitoring
The explanations provided by the explainability layer should remain consistent over time for similar inputs. If the dominant features in explanations shift without a corresponding model update, this may indicate that the model’s internal behaviour is changing in response to input distribution shifts, or that the explanation method itself is unstable.
Fiddler AI and similar monitoring tools track explanation patterns over time, identifying changes in the ranking or magnitude of feature attributions across the prediction population. An alert on explanation pattern change serves as a valuable early warning signal for outcome drift, sometimes surfacing issues before they are detected by aggregate performance or fairness metrics.
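The pattern-tracking idea can be sketched as a comparison between a baseline period and a recent window of attributions: alert when the top-ranked feature changes or when any feature's mean attribution magnitude shifts beyond a tolerance. The thresholds below are illustrative placeholders for production alerting configuration, not values from any specific tool.

```python
import numpy as np

def attribution_shift_alert(baseline, window, mag_tol=0.25):
    """Flag changes in feature-attribution patterns between a baseline period
    and a recent window (both arrays of shape (n_predictions, n_features)).

    Alerts when the top-ranked feature changes or when any feature's mean
    |attribution| moves by more than mag_tol relative to the baseline.
    """
    base_mean = np.abs(baseline).mean(axis=0)
    win_mean = np.abs(window).mean(axis=0)
    # Has the dominant feature changed?
    rank_changed = int(np.argmax(base_mean)) != int(np.argmax(win_mean))
    # Has any feature's attribution magnitude drifted materially?
    rel_change = np.abs(win_mean - base_mean) / np.maximum(base_mean, 1e-12)
    magnitude_shift = bool((rel_change > mag_tol).any())
    return rank_changed or magnitude_shift, rel_change
```

Returning the per-feature relative change alongside the boolean alert supports the root-cause investigation described below: the investigator can see which features moved, not just that something did.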
Explanation consistency monitoring feeds into the post-market monitoring framework documented in AISDP Module 12. It complements the feature distribution monitoring at Layer 2 and the threshold stability monitoring at Layer 4, providing a multi-layered view of the system’s behavioural stability. When an explanation consistency alert fires, the investigation should determine whether the root cause is input drift, model drift, or an artefact of the explanation method.
Key outputs
- Explanation pattern tracking configuration
- Alerting thresholds for attribution shifts
- Integration with post-market monitoring dashboards
- Investigation procedures for explanation consistency alerts
Comprehensibility Validation Records
Comprehensibility validation confirms that the explanations generated by the system are understandable to their intended audiences. This is distinct from fidelity validation, which tests whether explanations are accurate. An explanation can be faithful to the model’s reasoning yet incomprehensible to the person reading it.
The validation typically involves presenting explanations to representative users from each target audience and assessing whether they can correctly interpret the explanation’s meaning. For operators, the test is whether the explanation supports independent judgement: can the operator identify which factors drove the outcome and assess whether those factors are reasonable? For affected persons, the test is whether the explanation enables them to understand the key reasons for the decision and identify grounds for challenge.
The AISDP must retain records of comprehensibility validation, including the methodology used, the participant profiles, the scenarios tested, the results, and any design changes made in response to the findings. These records form part of Module 5 (validation evidence) and Module 7 (human oversight design), demonstrating that the organisation has taken active steps to ensure its explanations are fit for purpose.
Key outputs
- Comprehensibility validation methodology and participant profiles
- Test scenarios and results
- Design changes made in response to findings
- Module 5 and Module 7 AISDP evidence records