v2.4.0

Convolutional networks, recurrent networks, and transformer architectures achieve state-of-the-art performance on unstructured-data tasks such as image classification, natural language processing, and speech recognition. Selecting them for a high-risk system introduces specific compliance challenges that the AISDP must address candidly.

Post-hoc explanation methods exist for deep neural networks: SHAP (via KernelSHAP or DeepSHAP), LIME (Local Interpretable Model-agnostic Explanations), Grad-CAM (for vision models), and attention visualisation (for transformer architectures). Each method approximates the model’s reasoning rather than exposing it directly. Their fidelity to the model’s actual decision process is debated in the academic literature, and the AISDP must document the chosen method, its known limitations, and the fidelity validation performed.
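To make the approximation point concrete, the local-surrogate idea behind LIME can be sketched in a few lines: perturb an input, query the black-box model, and fit a proximity-weighted linear model whose coefficients act as local feature attributions. This is an illustrative sketch only, not the `lime` library's API; `black_box` is a hypothetical stand-in for any deployed model.

```python
import numpy as np

def black_box(X):
    # Hypothetical stand-in model: a nonlinear function of two features.
    return 1.0 / (1.0 + np.exp(-(3.0 * X[:, 0] - 2.0 * X[:, 1] ** 2)))

def local_surrogate(x, n_samples=500, width=0.5, seed=0):
    """LIME-style local attribution: fit a weighted linear surrogate around x."""
    rng = np.random.default_rng(seed)
    # Sample perturbations in a neighbourhood of the instance x.
    X = x + rng.normal(scale=width, size=(n_samples, x.size))
    y = black_box(X)
    # Weight samples by proximity to x (Gaussian kernel).
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * width ** 2))
    # Weighted least squares: an interpretable linear model, valid only locally.
    A = np.hstack([np.ones((n_samples, 1)), X])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]  # per-feature local attributions (intercept dropped)

attributions = local_surrogate(np.array([0.2, -0.4]))
```

The surrogate's coefficients describe only a neighbourhood of one instance, which is exactly the fidelity limitation the AISDP must document: a faithful local fit does not imply a faithful account of the model globally.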

For the compliance criteria, deep neural networks score as follows:

  • Documentability: weak. A transformer with billions of parameters cannot have its learned representations enumerated; the architecture can be described, but the gap must be addressed through behavioural characterisation.
  • Testability: adequate. Standard evaluation methodologies exist, though stochastic outputs (in generative models) may require statistical testing frameworks.
  • Auditability: varies with logging infrastructure. Models whose output depends on runtime conditions (conversation history, retrieval context) require more sophisticated logging.
  • Bias detectability: adequate. Feature attribution methods can identify proxy effects, though with lower precision than for simpler models.
  • Maintainability: weak to adequate. Deep networks can exhibit large behavioural shifts from small data changes.
  • Determinism: varies. Some architectures are deterministic given fixed seeds; others are inherently stochastic.
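For stochastic outputs, a statistical testing framework replaces exact-match assertions with confidence-bounded acceptance criteria. The following sketch, using hypothetical run data, gates a release on a bootstrap lower confidence bound for the pass rate rather than on any single run:

```python
import numpy as np

def bootstrap_lower_bound(passes, n_boot=2000, alpha=0.05, seed=0):
    """One-sided lower confidence bound on the mean pass rate via bootstrap."""
    rng = np.random.default_rng(seed)
    passes = np.asarray(passes, dtype=float)
    means = np.array([
        rng.choice(passes, size=passes.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    return np.quantile(means, alpha)

# Hypothetical run log: 1 = output passed validation, 0 = failed.
runs = [1] * 92 + [0] * 8
lb = bootstrap_lower_bound(runs)
accept = lb >= 0.85  # release gate: lower bound must clear the threshold
```

The acceptance threshold (0.85 here) is an illustrative placeholder; the actual value, sample size, and confidence level belong in the AISDP's test plan.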

Where deep learning is chosen for a high-risk system, the AISDP must describe the compensating controls applied to address the explainability gap, including more intensive human oversight, output validation against known-good references, or constrained output spaces.
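A constrained output space is the simplest of these controls to demonstrate: anything the model emits outside a validated set is rejected and routed to human oversight. The sketch below uses hypothetical names (`ALLOWED_CODES`, `constrained_decision`) and is one possible shape for such a control, not a prescribed implementation.

```python
# Compensating control: constrain a generative model's output to a
# validated decision space, failing closed to human review.
ALLOWED_CODES = {"APPROVE", "REJECT", "ESCALATE"}

def constrained_decision(raw_model_output: str) -> str:
    """Normalise the model output; route out-of-space values to human review."""
    candidate = raw_model_output.strip().upper()
    if candidate in ALLOWED_CODES:
        return candidate
    # Output falls outside the constrained space: fail closed.
    return "ESCALATE"
```

Failing closed (to "ESCALATE" rather than a default decision) keeps the human-oversight control in the loop precisely when the model behaves unexpectedly.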

Key outputs

  • Post-hoc explanation method selection and justification
  • Compensating controls for explainability limitations
  • Compliance criteria scoring for neural network candidates