Adversarial example testing evaluates the model’s susceptibility to input perturbations designed to cause incorrect predictions. IBM’s Adversarial Robustness Toolbox (ART) provides one of the most comprehensive open-source libraries for this purpose. Testing should span both white-box attacks (FGSM, PGD, C&W, DeepFool), which use knowledge of the model’s gradients to craft minimal perturbations, and black-box attacks, which work without gradient access by querying the model and using the responses to guide perturbation.
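To make the white-box idea concrete, the following is a minimal NumPy sketch of FGSM, not ART's implementation: the attacker uses the loss gradient with respect to the input and steps in its sign direction. The toy logistic model, its weights, and all function names here are illustrative assumptions, not part of any real testing suite.

```python
import numpy as np

def fgsm_perturb(x, grad, eps):
    """FGSM: step of size eps in the direction of the loss gradient's sign."""
    return x + eps * np.sign(grad)

# Toy white-box target: logistic regression with known weights (hypothetical).
def predict_proba(w, b, x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def input_gradient(w, b, x, y):
    # Cross-entropy loss gradient w.r.t. the input: (p - y) * w per example.
    p = predict_proba(w, b, x)
    return np.outer(p - y, w)

rng = np.random.default_rng(0)
w, b = np.array([2.0, -1.0]), 0.0
x = rng.normal(size=(100, 2))
y = (predict_proba(w, b, x) > 0.5).astype(float)  # model's own clean labels

grad = input_gradient(w, b, x, y)
x_adv = fgsm_perturb(x, grad, eps=0.5)
flipped = (predict_proba(w, b, x_adv) > 0.5).astype(float) != y
print(f"attack success rate at eps=0.5: {flipped.mean():.2f}")
```

The success rate at each perturbation budget `eps` is exactly the metric the testing phase sweeps and reports; in practice ART's attack classes replace the hand-written gradient step.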
White-box testing represents a worst-case scenario: an attacker with full knowledge of the model’s architecture and parameters. Black-box testing represents a more realistic scenario for externally facing systems: an attacker who can only query the model’s API. Both should be included because a model that is robust to black-box attacks but fragile to white-box attacks is vulnerable to any attacker who gains internal access.
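The black-box scenario can be sketched in the same toy setting: the attacker sees only a scoring API and searches for a perturbation by querying it repeatedly. This is a simple score-based random search, standing in for more sophisticated query-based attacks; the hidden model and every name below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# The "API": an opaque scoring function the attacker can only query.
# It wraps a hidden linear model; the attacker never sees _w or _b.
_w, _b = np.array([2.0, -1.0]), 0.0
def query_score(x):
    return 1.0 / (1.0 + np.exp(-(x @ _w + _b)))

def random_search_attack(x, eps, n_queries=200):
    """Score-based black-box attack: try random sign perturbations of
    magnitude eps, returning the first candidate that flips the decision;
    otherwise keep the candidate closest to the 0.5 boundary."""
    label = query_score(x) > 0.5
    best, best_gap = x, abs(query_score(x) - 0.5)
    for _ in range(n_queries):
        candidate = x + eps * rng.choice([-1.0, 1.0], size=x.shape)
        score = query_score(candidate)
        if (score > 0.5) != label:
            return candidate, True  # decision flipped
        if abs(score - 0.5) < best_gap:
            best, best_gap = candidate, abs(score - 0.5)
    return best, False

x0 = np.array([0.4, 0.2])  # scored above 0.5 by the hidden model
x_adv, flipped = random_search_attack(x0, eps=0.5)
print("decision flipped:", flipped, "queries bounded, no gradients used")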
The test results report the attack success rate at each perturbation magnitude and compare against the robustness thresholds declared in the AISDP. Findings are documented in a structured report and fed back into the threat model and risk register. The robustness gate in the CI pipeline provides ongoing verification using a subset of the adversarial testing suite.
Key outputs
- White-box testing (FGSM, PGD, C&W;, DeepFool) using ART
- Black-box testing without gradient access
- Attack success rates compared against declared thresholds
- Module 9 and Module 5 AISDP evidence