All Four Gates Passed Requirement
No model may be deployed to production without passing all four validation gates: performance, fairness, robustness, and documentation. This requirement is enforced architecturally through the CI/CD pipeline, not merely by organizational policy. The deployment step is gated by a policy engine (OPA/Rego or equivalent) that verifies all four gate results before allowing the deployment to proceed.
The gate architecture is layered and sequential. Performance runs first, because a model that fails basic performance is not worth evaluating for fairness or robustness. Fairness runs second, because a model that passes performance but fails fairness is rejected regardless of its robustness characteristics. Robustness runs third. The documentation gate runs last. If any gate fails, execution halts and no subsequent gates run.
A reference OPA policy is provided (deployment_compliance.rego) that encodes this requirement. The policy also verifies that the AISDP version in the deployment matches the model’s assessed version, that human approval has been recorded within the last 48 hours, and that staging tests have passed on the exact version being deployed. A deny reason is emitted for each failed check, so failed deployment attempts can be debugged from the policy output.
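The policy logic can be sketched outside Rego as well. The following Python sketch mirrors the checks described above; the gate names, input field names, and the `deny_reasons`/`allow` functions are illustrative assumptions, not the schema of the actual deployment_compliance.rego.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical gate names and input fields; the real deployment_compliance.rego
# defines its own schema.
GATES = ["performance", "fairness", "robustness", "documentation"]
APPROVAL_MAX_AGE = timedelta(hours=48)

def deny_reasons(deployment: dict, now: datetime) -> list[str]:
    """Collect every reason the deployment must be denied (empty list = allow)."""
    reasons = []
    # All four validation gates must have passed.
    for gate in GATES:
        if deployment.get("gates", {}).get(gate) != "pass":
            reasons.append(f"validation gate '{gate}' has not passed")
    # AISDP version must match the model's assessed version.
    if deployment.get("aisdp_version") != deployment.get("assessed_aisdp_version"):
        reasons.append("AISDP version does not match the model's assessed version")
    # Human approval must be recent (within 48 hours).
    approved_at = deployment.get("approval", {}).get("timestamp")
    if approved_at is None or now - approved_at > APPROVAL_MAX_AGE:
        reasons.append("no human approval recorded within the last 48 hours")
    # Staging tests must have run on the exact version being deployed.
    if deployment.get("staging_tested_version") != deployment.get("composite_version"):
        reasons.append("staging tests did not run on the exact version being deployed")
    return reasons

def allow(deployment: dict, now: datetime) -> bool:
    return not deny_reasons(deployment, now)
```

Like a Rego `deny` set, the function accumulates every failed check rather than stopping at the first, so a single failed attempt surfaces all outstanding problems at once.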
Key outputs
- CI/CD pipeline enforcing sequential four-gate passage
- OPA/Rego policy encoding all deployment prerequisites
- Deny-reason generation for failed deployment attempts
- Module 5 and Module 2 AISDP evidence
Human Approval for Production Promotion
Article 14’s human oversight requirement extends to the deployment decision itself. Deployment of high-risk AI systems cannot be fully automated. The pipeline pauses at the human approval step, presenting the deployment’s metadata (model version, validation gate results, staging test results) to the designated approver.
For routine releases, the Technical SME is the designated approver. For releases affecting fairness metrics, the model architecture, or the intended purpose, the AI Governance Lead approves. The approval is logged with the approver’s identity, timestamp, the evidence reviewed, and the composite version identifier being deployed. Rejection halts the pipeline and logs the rejection reason.
GitHub Actions manual approvals, GitLab manual jobs, and Jenkins input steps all support this pattern. The approval log is retained as part of the AISDP evidence pack and feeds into the deployment ledger. The OPA deployment policy verifies that approval has been recorded within the last 48 hours, preventing a stale approval from authorizing a deployment long after the review took place.
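The approver routing and the fields captured in the approval log can be sketched as follows. The record shape, role names, and the change flags (`affects_fairness`, and so on) are illustrative assumptions; the concrete schema is whatever the pipeline and AISDP define.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative approval record; field names are assumptions, not a fixed schema.
@dataclass
class ApprovalRecord:
    approver_id: str
    approver_role: str            # "technical_sme" or "ai_governance_lead"
    composite_version: str        # model + configuration + code versions
    evidence_reviewed: list[str]  # e.g. gate reports, staging results
    decision: str                 # "approved" or "rejected"
    reason: str = ""              # required when decision == "rejected"
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def required_approver_role(change: dict) -> str:
    """Route the approval: changes affecting fairness metrics, model architecture,
    or intended purpose escalate to the AI Governance Lead; routine releases go
    to the Technical SME."""
    escalate = (change.get("affects_fairness")
                or change.get("affects_architecture")
                or change.get("affects_intended_purpose"))
    return "ai_governance_lead" if escalate else "technical_sme"
```

A dataclass-style record keeps the identity, timestamp, evidence, and version fields mandatory at construction time, so an approval cannot be logged with any of them missing.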
Key outputs
- Human approval gate in the deployment pipeline
- Role-based approver designation (Technical SME or AI Governance Lead)
- Approval logging with identity, timestamp, evidence, and version
- Module 7 and Module 2 AISDP evidence
Canary or Shadow Deployment Phase
Progressive delivery reduces the blast radius of a deployment that causes problems despite passing staging validation. In a canary deployment, the new version receives a small percentage of production traffic (typically 1–5%) while the existing version handles the remainder. Automated analysis compares the canary’s metrics against those of the existing version. If the metrics diverge beyond a threshold, the canary is automatically rolled back.
If the metrics are acceptable, the canary’s traffic share is gradually increased until the new version handles 100% of traffic. Argo Rollouts and Flagger automate this process on Kubernetes, with configurable analysis steps and automatic rollback. Shadow deployment is more conservative: the new version processes production data but its outputs are not delivered to users, allowing evaluation on real data without risk to affected persons.
Shadow deployment is particularly valuable for initial deployments of high-risk systems, where the consequences of an error are severe and confidence in staging validation is limited. The canary percentage, canary duration, analysis metrics, and rollback criteria are defined by the Technical Owner in the deployment policy and documented in the AISDP. The canary or shadow analysis results are retained as Module 12 evidence.
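The automated comparison step can be sketched as below. The metric names and divergence thresholds are illustrative stand-ins for the values the Technical Owner defines in the deployment policy; in practice, tools like Argo Rollouts or Flagger evaluate equivalent analysis rules.

```python
# Illustrative divergence limits; the real values come from the deployment policy.
THRESHOLDS = {"error_rate": 0.02, "p99_latency_ms": 50.0}

def canary_verdict(baseline: dict, canary: dict) -> tuple[str, list[str]]:
    """Compare the canary's metrics against those of the stable version.
    Any metric that diverges beyond its threshold triggers automatic rollback;
    otherwise the canary is promoted to the next traffic step."""
    breaches = []
    for metric, max_delta in THRESHOLDS.items():
        delta = canary[metric] - baseline[metric]
        if delta > max_delta:
            breaches.append(f"{metric} diverged by {delta:.3g} (limit {max_delta:g})")
    return ("rollback" if breaches else "promote"), breaches
```

Returning the list of breaches alongside the verdict gives the rollback event a concrete reason to record in the deployment ledger.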
Key outputs
- Canary or shadow deployment configuration
- Automated metric comparison and rollback triggers
- Deployment policy documenting canary percentage, duration, and criteria
- Module 2 and Module 12 AISDP evidence
Immutable Deployment Ledger Entry
Every deployment event, including canary promotions, full rollouts, and rollbacks, is recorded in the immutable deployment ledger described above. The entry captures the deployment timestamp, the composite version deployed (model version, configuration version, code version), which service versions changed, the identity of the deployer and approver, the validation evidence (gate reports, staging results, canary analysis), and the deployment outcome.
For GitOps deployments (ArgoCD, Flux), the deployment ledger is naturally produced through the Git workflow: every deployment change is a Git commit, providing an immutable audit trail. For non-GitOps deployments, the engineering team implements a custom append-only log using WORM storage (S3 Object Lock, Azure Immutable Blob Storage) or cryptographic hash chains.
Rollback events are themselves recorded as deployment ledger entries, capturing the reason for the rollback, the version restored to, and any incident reference. The deployment ledger feeds directly into AISDP Module 10 (Record-Keeping) and Module 12 (Change History). Inspectors and notified bodies may request deployment ledger entries for specific time periods during market surveillance.
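For the non-GitOps case, a cryptographic hash chain can be sketched as follows: each entry embeds the hash of its predecessor, so any after-the-fact edit breaks verification of every later entry. The entry fields follow the text above, but the exact schema and storage backend are assumptions.

```python
import hashlib
import json

def append_entry(ledger: list[dict], entry: dict) -> dict:
    """Append a deployment event to the hash-chained ledger.
    `entry` holds the fields described above (timestamp, composite version,
    deployer, approver, evidence, outcome); keys 'prev' and 'hash' are reserved."""
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    body = json.dumps({"prev": prev_hash, **entry}, sort_keys=True)
    record = {"prev": prev_hash, **entry,
              "hash": hashlib.sha256(body.encode()).hexdigest()}
    ledger.append(record)
    return record

def verify_chain(ledger: list[dict]) -> bool:
    """Recompute every hash and link; any tampered or reordered entry fails."""
    prev = "0" * 64
    for rec in ledger:
        body = json.dumps({k: v for k, v in rec.items() if k != "hash"},
                          sort_keys=True)
        if rec["prev"] != prev or hashlib.sha256(body.encode()).hexdigest() != rec["hash"]:
            return False
        prev = rec["hash"]
    return True
```

A hash chain alone only detects tampering; pairing it with WORM storage (which prevents overwriting the entries at all) gives both detection and prevention.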
Key outputs
- Immutable ledger entry per deployment event (including rollbacks)
- Composite version, approver, evidence, and outcome per entry
- GitOps audit trail or custom WORM-based implementation
- Module 10 and Module 12 AISDP evidence
Failure Handling — Severity-Based Blocking & Exception Process
The Technical SME classifies test failures by severity to determine the appropriate response. Critical failures (any test that exercises a compliance-relevant property, such as fairness, human oversight bypass, or Article 12 logging completeness) block the pipeline unconditionally. No exception process applies to critical failures; the issue must be resolved before deployment can proceed.
High-severity failures (end-to-end accuracy regression, latency threshold breach) block the pipeline unless the AI Governance Lead approves an exception with documented justification. The exception approval records the approver’s identity, the justification, the compensating controls in place, and the conditions under which the exception expires. Medium-severity failures (non-critical UI tests, documentation formatting) generate warnings and are tracked in the non-conformity register but do not block deployment.
This severity classification guards against two failure modes. If every test failure blocks deployment equally, teams develop compliance fatigue: they lose urgency about genuine compliance failures because they are overwhelmed by minor issues. If handling is excessively permissive, genuine compliance issues may be dismissed as low severity. The classification is documented in the AISDP and reviewed periodically to ensure it remains calibrated to the system’s risk profile.
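The blocking logic above can be sketched as a small decision function. The category names and the category-to-severity mapping are illustrative assumptions standing in for the classification the Technical SME maintains in the AISDP.

```python
# Illustrative mapping; the real classification lives in the AISDP and is
# maintained by the Technical SME.
SEVERITY = {
    "fairness": "critical",
    "oversight_bypass": "critical",
    "article12_logging": "critical",
    "accuracy_regression": "high",
    "latency_threshold": "high",
    "ui_noncritical": "medium",
    "doc_formatting": "medium",
}

def pipeline_action(failed_category: str, exception_approved: bool = False) -> str:
    """Decide the pipeline response to a test failure.
    Critical failures block unconditionally; high-severity failures block unless
    the AI Governance Lead has approved an exception; medium-severity failures
    warn and are tracked in the non-conformity register."""
    severity = SEVERITY.get(failed_category, "high")  # unknown categories block by default
    if severity == "critical":
        return "block"  # no exception process applies
    if severity == "high":
        return "proceed_with_exception" if exception_approved else "block"
    return "warn_and_track"
```

Defaulting unknown categories to high severity is a deliberate fail-closed choice: a test category missing from the classification blocks deployment rather than silently passing.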
Key outputs
- Severity classification (critical, high, medium) per test category
- Unconditional blocking for critical failures
- Exception approval process for high-severity failures
- Module 2 and Module 6 AISDP documentation