v2.4.0

A13. Evaluation Reports

Per-build records of all performance, fairness, robustness, and calibration metrics declared in the AISDP. Each report includes confidence intervals and per-subgroup disaggregation. Any metric that breaches its declared threshold blocks deployment. The reports are computed automatically during the model evaluation stage of the CI/CD pipeline.

Responsible party: the CI/CD pipeline auto-generates the report; the Technical SME reviews it.

Regulations addressed: Article 9(7) (testing); Article 15 (accuracy and robustness); Annex IV(2)(e) (validation and testing results).

Key outputs

  • Per-build multi-metric evaluation with gate decision
  • Per-subgroup disaggregated results
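
The gate decision described above can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the `Threshold` structure, the `gate_decision` function, the metric names, and all numeric values are hypothetical. It also assumes that each subgroup value is checked against the same declared threshold as the aggregate, which the taxonomy entry does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Declared limit for one metric (hypothetical structure)."""
    limit: float
    higher_is_better: bool = True


def breaches(value: float, threshold: Threshold) -> bool:
    """Return True when the value falls on the wrong side of the limit."""
    if threshold.higher_is_better:
        return value < threshold.limit
    return value > threshold.limit


def gate_decision(results, thresholds):
    """Check every (metric, subgroup) value against its declared threshold.

    results: {metric: {subgroup: value}}; "overall" holds the aggregate.
    Returns (deploy_allowed, breach_descriptions). A metric with no
    declared threshold raises KeyError, which also stops the build.
    """
    found = []
    for metric, by_group in results.items():
        threshold = thresholds[metric]
        for group, value in by_group.items():
            if breaches(value, threshold):
                found.append(
                    f"{metric}[{group}]={value} vs limit {threshold.limit}"
                )
    # Any single breach blocks deployment.
    return (not found, found)


# Illustrative values only; real thresholds come from the AISDP.
thresholds = {
    "accuracy": Threshold(0.90),
    "expected_calibration_error": Threshold(0.05, higher_is_better=False),
}
results = {
    "accuracy": {"overall": 0.93, "group_a": 0.94, "group_b": 0.88},
    "expected_calibration_error": {
        "overall": 0.03, "group_a": 0.02, "group_b": 0.04,
    },
}

allowed, found = gate_decision(results, thresholds)
print("deploy allowed:", allowed)
for line in found:
    print("breach:", line)
```

In this example the aggregate accuracy passes, but the build is still blocked because one subgroup falls below the limit. That is the case per-subgroup disaggregation is meant to surface. Confidence intervals are left out of the sketch; a stricter gate might compare the lower bound of each interval against the limit instead of the point estimate.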