A13. Evaluation Reports

Per-build records of all performance, fairness, robustness, and calibration metrics declared in the AISDP. Include confidence intervals and per-subgroup disaggregation. Any metric breaching its declared threshold blocks deployment. Computed automatically during the model evaluation stage of the CI/CD pipeline.

Responsible party: CI/CD pipeline auto-generates; Technical SME reviews.

Regulations addressed: Article 9(7) (testing); Article 15 (accuracy and robustness); Annex IV(2)(e) (validation and testing results).

Key outputs
- Per-build multi-metric evaluation with gate decision
- Per-subgroup disaggregated results
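
The gate decision and per-subgroup disaggregation listed above could be sketched roughly as follows. This is a minimal illustration and not a prescribed implementation: the names `evaluate_build`, `bootstrap_ci`, and `MetricResult` are hypothetical, accuracy stands in for whichever metrics the AISDP actually declares, the percentile bootstrap is one of several ways to obtain a confidence interval, and comparing the point estimate (not the interval bound) against the threshold is an assumption that a real pipeline may decide differently.

```python
import random
from dataclasses import dataclass


@dataclass
class MetricResult:
    """One row of a hypothetical evaluation report."""
    subgroup: str
    value: float
    ci_low: float
    ci_high: float
    passed: bool


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy (fixed seed for reproducible builds)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * n_resamples)]
    high = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return low, high


def evaluate_build(records, min_accuracy):
    """Disaggregate by subgroup and decide the gate.

    records: iterable of (subgroup, y_true, y_pred) tuples.
    min_accuracy: the declared threshold (assumed here to apply to every subgroup).
    Returns (list of MetricResult, deploy_allowed).
    """
    groups = {"ALL": ([], [])}  # "ALL" is the aggregate over every record
    for subgroup, t, p in records:
        for key in ("ALL", subgroup):
            labels, preds = groups.setdefault(key, ([], []))
            labels.append(t)
            preds.append(p)

    results = []
    for name, (labels, preds) in groups.items():
        value = accuracy(labels, preds)
        low, high = bootstrap_ci(labels, preds)
        results.append(MetricResult(name, value, low, high, value >= min_accuracy))

    # Any single breach blocks deployment.
    deploy_allowed = all(r.passed for r in results)
    return results, deploy_allowed


# Toy data: subgroup "A" is always right, subgroup "B" is right half the time.
records = [("A", 1, 1)] * 10 + [("B", 1, 1)] * 5 + [("B", 1, 0)] * 5
results, deploy_allowed = evaluate_build(records, min_accuracy=0.8)
for r in results:
    print(r)
print("deploy allowed:", deploy_allowed)
```

In this toy run the aggregate and subgroup "B" both fall below the 0.8 threshold, so the gate blocks the build even though subgroup "A" passes. This is the behaviour disaggregation is meant to surface. A stricter variant might gate on `ci_low` instead of the point estimate.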