v2.4.0

A golden dataset of historical inputs with known correct outputs serves as the regression baseline. Every candidate release is evaluated against this dataset to detect behavioural regression. The golden dataset is distinct from the training or evaluation datasets; it is a curated collection of cases selected specifically for regression detection.

The golden dataset must include cases drawn from each protected-characteristic subgroup. This ensures that a regression which disproportionately affects a vulnerable population is detected rather than masked by aggregate metrics. A candidate model that maintains overall accuracy but degrades accuracy for a specific demographic group would pass a naive regression test but fail a subgroup-aware regression test. The per-subgroup structure of the golden dataset makes this visible.
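The difference between a naive and a subgroup-aware regression test can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `GoldenCase` record, the function names, the subgroup labels, the zero tolerance, and the example data are all assumptions made for the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenCase:
    case_id: str
    subgroup: str   # hypothetical subgroup label attached to each case
    expected: str   # known correct output for this input


def overall_accuracy(cases, predictions):
    """Fraction of golden cases answered correctly, ignoring subgroups."""
    hits = sum(predictions[c.case_id] == c.expected for c in cases)
    return hits / len(cases)


def accuracy_by_subgroup(cases, predictions):
    """Return {subgroup: accuracy}; predictions maps case_id -> output."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for case in cases:
        total[case.subgroup] += 1
        if predictions[case.case_id] == case.expected:
            correct[case.subgroup] += 1
    return {group: correct[group] / total[group] for group in total}


def subgroup_regressions(cases, baseline, candidate, tolerance=0.0):
    """Return {subgroup: (baseline_acc, candidate_acc)} for every subgroup
    whose accuracy dropped by more than `tolerance` (an assumed threshold)."""
    base = accuracy_by_subgroup(cases, baseline)
    cand = accuracy_by_subgroup(cases, candidate)
    return {g: (base[g], cand[g]) for g in base if base[g] - cand[g] > tolerance}


# Illustrative data only: four cases per subgroup, all expecting "approve".
cases = [
    GoldenCase(f"{group}-{i}", group, "approve")
    for group in ("group_a", "group_b")
    for i in range(4)
]
# Baseline release: two errors in group_a, none in group_b.
baseline = {c.case_id: "approve" for c in cases}
baseline.update({"group_a-0": "deny", "group_a-1": "deny"})
# Candidate release: group_a fixed, but two new errors in group_b.
candidate = {c.case_id: "approve" for c in cases}
candidate.update({"group_b-0": "deny", "group_b-1": "deny"})

# Naive test: compares a single aggregate number.
naive_pass = overall_accuracy(cases, candidate) >= overall_accuracy(cases, baseline)
# Subgroup-aware test: compares each subgroup separately.
regressed = subgroup_regressions(cases, baseline, candidate)
```

In this example both releases score 6 of 8 overall, so `naive_pass` is true, while `regressed` reports that `group_b` fell from 1.0 to 0.5. A real gate would likely use a non-zero tolerance and a metric suited to the task rather than exact-match accuracy.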

The golden dataset is version-controlled and expanded over time as new edge cases are discovered through production operation, incident investigation, or user feedback. Cases that previously caused errors, near-misses, or fairness concerns should be added to the golden dataset to prevent recurrence. The regression test results, including per-subgroup breakdowns, are retained as Module 5 evidence and feed into the model validation gates.
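One way to make dataset growth traceable is to record where each new case came from and to fingerprint the dataset, so that every stored regression result can cite the exact revision it ran against. The sketch below assumes cases are plain dictionaries with hypothetical `case_id` and `source` fields; the helper names are illustrative, and the version-control system itself (for example, committing the dataset file to a repository) is outside its scope.

```python
import hashlib
import json

# Assumed provenance labels, mirroring the discovery routes named above.
ALLOWED_SOURCES = {"production", "incident", "user_feedback"}


def add_case(dataset, case):
    """Return a new list with `case` appended.

    Rejects unknown provenance labels and duplicate case IDs so the
    golden dataset only grows through deliberate, traceable additions.
    """
    if case["source"] not in ALLOWED_SOURCES:
        raise ValueError(f"unknown source: {case['source']!r}")
    if any(existing["case_id"] == case["case_id"] for existing in dataset):
        raise ValueError(f"duplicate case_id: {case['case_id']!r}")
    return dataset + [case]


def dataset_fingerprint(dataset):
    """SHA-256 over a canonical JSON form of the cases.

    Cases are sorted by case_id and keys are sorted, so the fingerprint
    depends on content only, not on insertion order.
    """
    ordered = sorted(dataset, key=lambda c: c["case_id"])
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative usage with made-up cases.
golden = []
golden = add_case(golden, {
    "case_id": "case-0001", "source": "incident",
    "subgroup": "group_a", "input": "example input", "expected": "approve",
})
before = dataset_fingerprint(golden)
golden = add_case(golden, {
    "case_id": "case-0002", "source": "user_feedback",
    "subgroup": "group_b", "input": "another input", "expected": "deny",
})
after = dataset_fingerprint(golden)  # differs from `before`
```

Storing the fingerprint alongside each per-subgroup result would let a reviewer confirm which dataset revision a given piece of evidence refers to.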

Key outputs

  • Golden dataset with per-subgroup case coverage
  • Version-controlled dataset expanded over time with discovered edge cases
  • Per-subgroup regression analysis for each candidate release
  • Module 5 AISDP evidence