v2.4.0

Modification Threshold Framework

Article 3(23) defines substantial modification as a change “not foreseen or planned in the initial conformity assessment” that affects compliance or the intended purpose. The definition is qualitative; the regulation does not specify numeric thresholds. The organisation must therefore define its own thresholds, document the rationale for each, and encode them in automated gates.

Several starting points are recommended. Each of the following would be a candidate for substantial modification assessment: an absolute change in AUC-ROC exceeding 0.03; any subgroup fairness metric breaching its established threshold; a change in the model’s top-five feature importance ranking; the introduction or removal of input features; a change in the intended purpose or deployment context; or a modification to the human oversight architecture.

These thresholds are calibrated by the Technical SME to the system’s specific risk profile. A recruitment screening system processing thousands of candidates per month requires tighter thresholds than an internal document classification system. The calibration rationale is documented in the risk register and reviewed annually. The thresholds are encoded as assertions in the CI/CD pipeline, and tools such as Evidently AI or NannyML can generate profile-comparison reports with configurable pass/fail conditions.
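
The measurable starting points above could be encoded as a CI assertion along the following lines. This is a minimal Python sketch, not an Evidently AI or NannyML integration: `ModelProfile`, `Thresholds`, and `check_candidate` are hypothetical names, the feature names and metric values are invented, and the fairness limits assume metrics where a larger value is worse (for example a demographic parity difference).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelProfile:
    """Evaluation snapshot for one model version (hypothetical structure)."""
    auc_roc: float
    fairness: dict          # subgroup metric name -> value (assumed: larger is worse)
    ranked_features: tuple  # feature names, most important first
    input_features: frozenset


@dataclass(frozen=True)
class Thresholds:
    """Organisation-defined limits; the defaults are starting points only."""
    max_auc_delta: float = 0.03
    top_k: int = 5
    fairness_limits: dict = field(default_factory=dict)  # metric name -> max allowed


def check_candidate(reference, candidate, limits):
    """Return one human-readable flag per breached threshold."""
    flags = []
    auc_delta = abs(candidate.auc_roc - reference.auc_roc)
    if auc_delta > limits.max_auc_delta:
        flags.append(f"AUC-ROC changed by {auc_delta:.3f}")
    for metric, limit in limits.fairness_limits.items():
        value = candidate.fairness.get(metric)
        if value is None or value > limit:
            flags.append(f"fairness metric {metric!r} missing or above {limit}")
    k = limits.top_k
    if candidate.ranked_features[:k] != reference.ranked_features[:k]:
        flags.append(f"top-{k} feature importance ranking changed")
    if candidate.input_features != reference.input_features:
        flags.append("input features added or removed")
    return flags


# Invented example values for illustration.
features = ("f1", "f2", "f3", "f4", "f5")
limits = Thresholds(fairness_limits={"demographic_parity_diff": 0.10})
reference = ModelProfile(0.84, {"demographic_parity_diff": 0.04},
                         features, frozenset(features))
candidate = ModelProfile(0.80, {"demographic_parity_diff": 0.12},
                         ("f2", "f1", "f3", "f4", "f5"), frozenset(features))
flags = check_candidate(reference, candidate, limits)
```

A pipeline step could fail the build whenever `flags` is non-empty. Changes to the intended purpose, deployment context, or human oversight architecture are not metric comparisons and are left out of this sketch; they would still need a manual check.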

Key outputs

  • Defined quantitative thresholds per measurable dimension of change
  • Calibration rationale documented in the risk register
  • CI/CD pipeline encoding of thresholds as automated assertions
  • Module 6 and Module 12 AISDP documentation

Cumulative Baseline Tracking

A series of individually sub-threshold changes that collectively alter the system’s behaviour significantly may constitute a cumulative substantial modification. For example, if the system’s performance degrades by 0.005 AUC-ROC with each minor update, then after ten updates the cumulative drift of 0.05 exceeds a 0.03 threshold, even though no individual change triggered the flag.

The mitigation is to maintain a baseline snapshot captured at the time of the last conformity assessment. This baseline records the evaluation metrics, the output distribution profile, the fairness metrics, and the feature importance rankings from the assessed version. Every subsequent version is compared against both the preceding version (to detect individual-change threshold breaches) and the baseline (to detect cumulative drift).

Two parallel metric tracks implement this. The version-to-version comparison runs in the CI pipeline for every candidate release. The baseline comparison runs on a schedule (a quarterly schedule aligns with the risk review cadence) and additionally whenever a new version is deployed. Evidently AI supports this through its time-series monitoring capability, which compares each subsequent dataset or model version against the reference and alerts when cumulative drift exceeds a configured threshold.
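
The two tracks can be illustrated with the 0.005-per-update example above. The following Python sketch is an illustration under stated assumptions rather than the Evidently AI API: `compare_tracks` is a hypothetical helper, AUC-ROC stands in for the full metric profile, the baseline value of 0.85 is invented, and 0.03 is the starting-point threshold suggested earlier.

```python
def compare_tracks(baseline_auc, version_aucs, threshold=0.03):
    """Compare each version against its predecessor and against the baseline.

    Returns one (update_number, step_breach, cumulative_breach) tuple per version.
    """
    results = []
    previous = baseline_auc
    for number, auc in enumerate(version_aucs, start=1):
        # Round to suppress floating-point noise at the threshold boundary.
        step_delta = round(abs(auc - previous), 6)
        cumulative_delta = round(abs(auc - baseline_auc), 6)
        results.append((number, step_delta > threshold, cumulative_delta > threshold))
        previous = auc
    return results


# Ten minor updates, each costing 0.005 AUC-ROC relative to the one before.
baseline = 0.85  # hypothetical AUC-ROC at the last conformity assessment
versions = [baseline - 0.005 * k for k in range(1, 11)]
results = compare_tracks(baseline, versions)

step_flags = [n for n, step, _ in results if step]
cumulative_flags = [n for n, _, cumulative in results if cumulative]
```

Here `step_flags` stays empty, while `cumulative_flags` starts at the seventh update (a drift of 0.035 against a 0.03 threshold). That is the drift only the baseline track can detect.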

Key outputs

  • Baseline snapshot from the last conformity assessment (metrics, distributions, feature importance)
  • Version-to-version comparison in the CI pipeline
  • Baseline comparison on a scheduled and per-deployment basis
  • Module 6 and Module 12 AISDP evidence

Decision Flow for Borderline Cases

When the automated quality gates flag a change that approaches or exceeds a substantial modification threshold, the determination follows a defined decision flow. The Technical SME conducts an initial assessment, documenting which metrics have changed, by how much, and why. The AI Governance Lead then determines whether the change constitutes a substantial modification under Article 3(23).

For borderline cases, the Legal and Regulatory Advisor provides input, particularly where the change involves the intended purpose or deployment context. The determination is a documented decision with three possible outcomes: the change is a substantial modification (triggering a new conformity assessment), the change is within acceptable bounds (documented with supporting evidence in Module 12), or the cumulative baseline comparison has been triggered (requiring the full substantial modification assessment even though the individual change was sub-threshold).
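
One way to keep determinations consistent is to record them in a fixed structure. The sketch below is illustrative only: the `Outcome` and `Determination` names, the field names, the validation rules, and the `CHG-0001` identifier are assumptions, while the three outcomes and their consequences are taken from the text above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUBSTANTIAL_MODIFICATION = "new conformity assessment required"
    WITHIN_BOUNDS = "document supporting evidence in Module 12"
    CUMULATIVE_TRIGGERED = "full substantial modification assessment required"


@dataclass(frozen=True)
class Determination:
    """Documented decision for one flagged change (hypothetical record layout)."""
    change_id: str
    sme_assessment: str                # which metrics changed, by how much, and why
    decided_by: str                    # the AI Governance Lead
    outcome: Outcome
    rationale: str
    borderline: bool = False
    legal_input: Optional[str] = None  # Legal and Regulatory Advisor input

    def __post_init__(self):
        # Refuse records that omit the documentation the decision flow requires.
        if not self.sme_assessment.strip() or not self.rationale.strip():
            raise ValueError("assessment and rationale must be documented")
        if self.borderline and not self.legal_input:
            raise ValueError("borderline cases need Legal and Regulatory Advisor input")

    @property
    def next_step(self) -> str:
        return self.outcome.value


record = Determination(
    change_id="CHG-0001",  # hypothetical identifier
    sme_assessment="AUC-ROC fell by 0.004 after retraining on refreshed data",
    decided_by="AI Governance Lead",
    outcome=Outcome.WITHIN_BOUNDS,
    rationale="All metrics remain inside the calibrated thresholds",
)
```

Making the record immutable and validating it on creation means an incomplete determination cannot be stored, which supports the evidence trail expected in Module 12.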

If a substantial modification is confirmed, the consequence is significant: a new conformity assessment is required before the modified system can be placed on the market or put into service. The system re-enters Phase 5 of the delivery process. Organisations should design their change management processes to anticipate this possibility, assessing changes against the thresholds before implementation rather than after.

Key outputs

  • Documented decision flow with role assignments
  • Initial assessment template for the Technical SME
  • Legal and Regulatory Advisor consultation for borderline cases
  • Determination records with rationale and evidence

Trigger: Re-Assessment Under the Act

When a change is determined to be a substantial modification, the regulatory consequence is that a new conformity assessment must be completed before the modified system can be placed on the market or put into service. This is the operational trigger that connects the version control and change management framework to the conformity assessment process.

The re-assessment follows the same conformity assessment process as the initial assessment, though it may be scoped to focus on the aspects of the system affected by the modification. The assessor evaluates the modified system against the requirements of the EU AI Act, taking into account the nature of the modification and its impact on the system’s compliance posture.

The CI/CD pipeline should provide early warning when a change-in-progress is trending toward the substantial modification threshold. This enables the team to adjust the change, seek governance approval in advance, or prepare for a new conformity assessment. The substantial modification determination record, including the assessment of whether a new conformity assessment was required and the outcome of any such assessment, is retained in Module 12 as part of the system’s change history.
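
An early-warning band can be layered on top of a threshold check. In this Python sketch the `gate_status` helper and the 80% warning fraction are assumptions chosen for illustration; the organisation would set its own margin.

```python
def gate_status(delta, threshold, warn_fraction=0.8):
    """Classify a metric change as 'pass', 'warn', or 'breach'.

    'warn' means the change is still within the threshold but has consumed
    at least `warn_fraction` of it (an assumed margin, not a regulatory value).
    """
    if not 0 < warn_fraction < 1:
        raise ValueError("warn_fraction must be between 0 and 1")
    magnitude = abs(delta)
    if magnitude > threshold:
        return "breach"
    if magnitude >= warn_fraction * threshold:
        return "warn"
    return "pass"


# Example AUC-ROC deltas checked against a 0.03 threshold.
statuses = [gate_status(d, 0.03) for d in (-0.010, -0.026, 0.040)]
```

A "warn" result would not block the release; it gives the team time to adjust the change, seek governance approval in advance, or prepare for a new conformity assessment.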

Key outputs

  • Re-assessment trigger documentation
  • Scoping guidance for modification-focused conformity assessment
  • Early warning mechanism in the CI/CD pipeline
  • Module 6 and Module 12 AISDP evidence