Adversarial testing methodologies vary by the model’s input modality. For tabular models, the testing protocol perturbs input features at realistic noise levels (±5% on continuous features, random category flips at 1% rate) and records the prediction change rate. This approach reflects real-world attack vectors: an applicant slightly modifying their reported income or a data entry error altering a critical field.
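The tabular protocol above can be sketched as follows. This is a minimal illustration, not the framework's official harness: the `perturb_tabular` and `prediction_change_rate` helpers, the `ThresholdModel` stand-in, and the column-index schema are all hypothetical names introduced here; the ±5% noise and 1% flip rate come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_tabular(X, continuous_cols, categorical_cols, category_values,
                    noise=0.05, flip_rate=0.01):
    """Apply realistic noise: +/-5% multiplicative noise on continuous
    features and random category flips at a 1% rate (per the protocol)."""
    Xp = X.copy()
    for c in continuous_cols:
        Xp[:, c] *= 1 + rng.uniform(-noise, noise, size=len(Xp))
    for c in categorical_cols:
        flip = rng.random(len(Xp)) < flip_rate
        Xp[flip, c] = rng.choice(category_values[c], size=int(flip.sum()))
    return Xp

def prediction_change_rate(model, X, Xp):
    """Fraction of rows whose predicted class changes under perturbation."""
    return float(np.mean(model.predict(X) != model.predict(Xp)))

class ThresholdModel:
    """Hypothetical stand-in for the model under test."""
    def predict(self, X):
        return (X[:, 0] > 50).astype(int)

# Column 0: continuous (e.g. reported income, in thousands); column 1: categorical code.
X = np.column_stack([rng.uniform(20, 80, 1000), rng.integers(0, 3, 1000).astype(float)])
Xp = perturb_tabular(X, continuous_cols=[0], categorical_cols=[1],
                     category_values={1: np.array([0.0, 1.0, 2.0])})
rate = prediction_change_rate(ThresholdModel(), X, Xp)
```

Rows near the model's decision boundary dominate the change rate, which is exactly the sensitivity the attack-vector scenarios (a slightly altered income, a data-entry error) are meant to surface.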
For image models, ART generates adversarial images at varying perturbation magnitudes using FGSM, PGD, and C&W methods. The perturbation budget (measured in the L2 or L∞ norm) should reflect the threat model's assessment of realistic attacker capabilities. For text models, TextAttack provides character-level (typos, homoglyph substitutions), word-level (synonym replacement), and sentence-level (paraphrase) perturbation attacks.
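To make the perturbation-budget idea concrete without depending on the ART package, here is a minimal single-step FGSM sketch under an L∞ budget, implemented directly in NumPy for a logistic-regression model. All function names (`fgsm_linf`, `attack_success_rate`) are illustrative, and the closed-form gradient applies only to this simple model class; in practice ART's attack classes handle arbitrary differentiable models.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_linf(x, y, w, b, eps):
    """One FGSM step: move each input by eps in the sign of the loss
    gradient. For logistic regression with binary cross-entropy, the
    gradient of the loss w.r.t. the input is (p - y) * w."""
    p = sigmoid(x @ w + b)
    grad = (p - y)[:, None] * w[None, :]
    return x + eps * np.sign(grad)

def attack_success_rate(x, y, w, b, eps):
    """Fraction of correctly classified points that the attack flips,
    at a given L-infinity budget eps."""
    clean = (sigmoid(x @ w + b) > 0.5).astype(int)
    adv = (sigmoid(fgsm_linf(x, y, w, b, eps) @ w + b) > 0.5).astype(int)
    correct = clean == y
    if not correct.any():
        return 0.0
    return float(np.mean(adv[correct] != y[correct]))

rng = np.random.default_rng(1)
w, b = np.array([2.0, -1.0]), 0.0
x = rng.normal(size=(500, 2))
y = (sigmoid(x @ w + b) > 0.5).astype(int)  # labels the clean model gets right
r_small = attack_success_rate(x, y, w, b, 0.05)
r_large = attack_success_rate(x, y, w, b, 5.0)
```

Sweeping `eps` over the budgets the threat model deems realistic, and recording the success rate at each, yields the per-magnitude curve the next paragraph asks to report.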
The testing results for each modality are reported with the attack methods used, the perturbation magnitudes tested, the success rates at each magnitude, and the comparison against the declared robustness thresholds. The Technical SME selects the attack methods most relevant to the system’s modality and deployment context, documenting the selection rationale in Module 9.
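The reporting step amounts to a simple join of measured success rates against declared thresholds. The sketch below assumes a hypothetical schema (attack name mapped to per-magnitude success rates, and attack name mapped to a maximum acceptable success rate); the actual Module 9 evidence format may differ.

```python
def robustness_report(results, thresholds):
    """Flatten per-attack, per-magnitude success rates into report rows,
    each compared against the declared robustness threshold.

    results:    {attack_name: {epsilon: success_rate}}   (hypothetical schema)
    thresholds: {attack_name: max_acceptable_success_rate}
    """
    rows = []
    for attack, by_eps in results.items():
        for eps, rate in sorted(by_eps.items()):
            rows.append({
                "attack": attack,
                "epsilon": eps,
                "success_rate": rate,
                "threshold": thresholds[attack],
                "pass": rate <= thresholds[attack],
            })
    return rows

report = robustness_report(
    results={"FGSM": {0.01: 0.02, 0.03: 0.12}},
    thresholds={"FGSM": 0.10},
)
```

A row failing its threshold flags the magnitude at which the system's declared robustness claim no longer holds, which is the comparison the Technical SME documents alongside the attack-selection rationale.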
Key outputs
- Modality-specific adversarial testing (tabular, image, text)
- Perturbation budgets reflecting realistic attack capabilities
- Per-modality success rate reporting against declared thresholds
- Module 9 and Module 5 AISDP evidence