Development
Model selection, data governance, development architectures, version control, and CI/CD pipelines with compliance gates.
151 articles in this section
1.
Model Selection (S.3)
Model selection is the first major technical decision in the AISDP…
2.
Full-Spectrum Evaluation
Heuristic & Rule-Based Systems…
3.
Heuristic & Rule-Based Systems
Many organisations operate data-driven decisioning systems that predate the machine learning era. These include expert…
4.
Statistical & Econometric Models
Logistic regression, linear regression, generalised linear models, and survival models occupy a middle ground between…
5.
Ensemble Methods
Gradient-boosted decision tree ensembles (XGBoost, LightGBM, CatBoost) and random forests offer a strong balance…
6.
Deep Neural Networks — Types & Post-Hoc Explanation Methods
Convolutional networks, recurrent networks, and transformer architectures achieve state-of-the-art performance on…
7.
Deep Neural Networks — Candid Explainability Limitations Assessment
Where a deep neural network is selected for a high-risk system, the AISDP must include a candid assessment of the…
8.
Foundation Models & LLMs — Provenance Documentation
Large language models and foundation models used as components within high-risk systems introduce documentation…
9.
Foundation Models & LLMs — Fine-Tuning Records
Organisations that fine-tune a foundation model for use in a high-risk system must document the fine-tuning process…
10.
Foundation Models & LLMs — Stochastic Output Handling
The stochastic nature of LLM outputs, where the same input may produce different outputs across invocations, requires…
11.
Foundation Models & LLMs — Article 53 GPAI Obligations
When a high-risk system incorporates a general-purpose AI model, the downstream provider bears full responsibility for…
12.
Hybrid Architectures — Component Documentation & Conflict Resolution
Many production systems combine multiple decisioning approaches: a rules engine for hard constraints, a statistical…
13.
Model Origin Risk
Open-Source Models — Training Data Provenance…
14.
Open-Source Models — Training Data Provenance
Open-source models from repositories such as Hugging Face, GitHub, or academic publications offer accessibility and…
15.
Open-Source Models — Licensing Compatibility
The licensing terms attached to open-source models carry compliance implications that extend beyond intellectual…
16.
Open-Source Models — Development Governance Gaps
Open-source models are frequently developed without the governance structures that the EU AI Act expects for high-risk…
17.
Open-Source Models — Bias Testing & Adversarial Evaluation History
For open-source models incorporated into high-risk systems, the AISDP must document whether the model has undergone…
18.
Open-Source Models — Residual Non-Conformity Risk
After completing provenance assessment, licence review, governance gap analysis, and bias and adversarial evaluation, a…
19.
Commercial APIs — Contractual Terms & SLAs
Models licensed from commercial API providers present a different risk profile from open-source components. The…
20.
Commercial APIs — Provider Data Handling & Geographic Considerations
Many commercial AI API providers collect data from their customers' usage. This may include the inputs submitted, the…
21.
Copyright & IP Exposure
Training Data Copyright Assessment: The training data used to develop AI models, particularly large language models and…
22.
Fine-Tuning Provider Boundary (Art. 25)
Substantial Modification Assessment: Fine-tuning a GPAI model for use in a high-risk system raises the question of…
23.
Compliance Criteria Scoring
Six compliance criteria are scored against each candidate model architecture during selection. Together they form the…
24.
Documentability
Documentability is the first criterion. The question is: can the model's architecture, hyperparameters, and decision…
25.
Testability
Testability asks whether the model architecture supports the testing required by Article 15. Can accuracy, robustness,…
26.
Auditability
Auditability asks whether the model produces outputs that can be logged, traced, and attributed in accordance with…
27.
Bias Detectability
Bias detectability asks whether fairness metrics can be computed at the subgroup level, whether the model can be…
28.
Maintainability
Maintainability asks whether the model can be retrained, fine-tuned, or recalibrated in response to post-market…
29.
Determinism
Determinism asks whether the model produces the same output for the same input consistently across executions. This…
30.
Model Selection Artefacts
Five artefacts are produced during the model selection…
31.
Data Governance (S.4)
Data governance addresses the EU AI Act's Article 10 requirements for training, validation, and testing data. This…
32.
Dataset Documentation
Source & Acquisition Method…
33.
Source & Acquisition Method
Every dataset used in the system's lifecycle, whether for training, validation, testing, calibration, or fine-tuning,…
34.
Record Count, Schema & Version Identifier
The composition section of the dataset documentation captures the structural characteristics that enable a reviewer to…
35.
Temporal & Geographic Scope
The temporal and geographic scope of a dataset directly affects its suitability for training a high-risk AI system…
36.
Demographic Composition
The demographic composition of a dataset determines whether the model will perform equitably across the populations on…
37.
Known Limitations
Every dataset has limitations. The compliance value lies in documenting them candidly rather than concealing them…
38.
Completeness Assessment
39.
Completeness Assessment
Population Representativeness: Population representativeness asks whether the training, validation, and testing datasets…
40.
Pre-Training Bias Assessment
Distributional Analysis — Statistical Tests & Output…
41.
Distributional Analysis — Statistical Tests & Output Matrix
Before any model is trained, the Technical SME examines the data for bias through distributional analysis. This…
42.
Label Bias Analysis — Inter-Rater Reliability & Relabelling
Label bias arises when the outcome labels used as ground truth for training reflect the biases of the humans or…
43.
Label Bias Analysis — Ground Truth Contamination Assessment
Ground truth contamination occurs when the labels used for training are themselves the product of a biased process that…
44.
Proxy Variable Detection — Correlation Methods & Thresholds
Proxy variables are features that are not themselves protected characteristics but correlate strongly enough with…
45.
Proxy Variable Detection — Justification Review for Retained Proxies
When a feature is flagged as a potential proxy variable, the Technical SME conducts a justification review to determine…
46.
Intersectional Pre-Training Analysis — Subgroups & Cell Size Thresholds
Standard bias analysis examines each protected characteristic in isolation. Intersectional analysis examines…
47.
Post-Training Bias Evaluation
48.
Post-Training Bias Evaluation
Selection Rate Ratio (Four-Fifths Rule)…
49.
Selection Rate Ratio (Four-Fifths Rule)
For binary classification systems, the selection rate ratio is the simplest and most widely understood fairness metric.…
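The metric can be sketched in a few lines; the group decision data below is invented for illustration, and 0.8 is the four-fifths benchmark.

```python
# Sketch of the selection rate ratio check for binary decisions
# grouped by a protected characteristic (1 = selected, 0 = not).
def selection_rate_ratio(decisions_by_group: dict[str, list[int]]) -> float:
    """Ratio of the lowest group selection rate to the highest."""
    rates = {g: sum(d) / len(d) for g, d in decisions_by_group.items()}
    return min(rates.values()) / max(rates.values())

# Illustrative data: group_a selected 3/5, group_b selected 2/5.
groups = {"group_a": [1, 1, 0, 1, 0], "group_b": [1, 0, 0, 0, 1]}
ratio = selection_rate_ratio(groups)
print(round(ratio, 2), "PASS" if ratio >= 0.8 else "REVIEW")
```

A ratio below 0.8 would trigger the bias investigation workflow rather than an automatic deployment block.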
50.
Equalised Odds — TPR & FPR Parity
Equalised odds requires that the model's true positive rate (TPR) and false positive rate (FPR) are consistent across…
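A hedged sketch of the parity check: TPR and FPR computed per group from (label, prediction) pairs. The sample data is invented, and any gap tolerance would be set by the organisation.

```python
# Sketch of equalised-odds measurement: compute TPR and FPR per group
# and report the gaps between groups. Data below is illustrative.
def rates(pairs: list[tuple[int, int]]) -> tuple[float, float]:
    """Return (TPR, FPR) from (label, prediction) pairs."""
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

group_a = [(1, 1), (1, 1), (0, 0), (0, 1)]
group_b = [(1, 1), (1, 0), (0, 0), (0, 0)]
(tpr_a, fpr_a), (tpr_b, fpr_b) = rates(group_a), rates(group_b)
tpr_gap, fpr_gap = abs(tpr_a - tpr_b), abs(fpr_a - fpr_b)
print(round(tpr_gap, 2), round(fpr_gap, 2))
```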
51.
Predictive Parity
Predictive parity asks whether positive predictions are equally accurate across subgroups. If the model's positive…
52.
Calibration Within Groups — Reliability Diagrams
Calibration within groups tests whether the model's confidence scores carry consistent meaning across protected…
53.
Counterfactual Fairness Testing
Counterfactual fairness is the most direct test of whether a model uses protected characteristics in its decisions. For…
54.
Fairness Concept Priority Decision & Documented Rationale
The five post-training fairness metrics (selection rate ratio, equalised odds, predictive parity, calibration within…
55.
Fairness Tooling (Fairlearn, AI Fairness 360)
The fairness evaluation suite integrates all five post-training metrics into a single evaluation report that runs as…
56.
Bias Mitigation
57.
Bias Mitigation
Pre-Processing Techniques (Oversampling, Undersampling, Reweighting, Synthetic…
58.
Pre-Processing Techniques (Oversampling, Undersampling, Reweighting, Synthetic Data)
Pre-processing mitigations modify the training data before the model encounters it. They are the most accessible class…
59.
In-Processing Techniques (Fairness Constraints, Adversarial Debiasing, Invariant Representations)
In-processing mitigations modify the model's training procedure to incorporate fairness objectives. They are more…
60.
Post-Processing Techniques (Threshold Calibration, Score Adjustment, Reject Option)
Post-processing mitigations modify the model's outputs after inference, avoiding model retraining. They are the…
61.
Compensating Controls — Mandatory Human Review & Enhanced Monitoring
When bias mitigation…
62.
Compensating Controls — Deployment Restrictions & Residual Bias Acceptance
Where neither mitigation techniques nor mandatory human review fully address the identified bias, the AISDP documents…
63.
Data Lineage & Version Control
64.
Data Lineage & Version Control
Transformation Documentation (Pre-Step / Post-Step): Data lineage requires documenting every data engineering step with…
65.
Special Category Data (Art. 10(5))
Legal Basis & Purpose Limitation: Article 10(5) permits the processing of special category personal data (race,…
66.
RAG-Specific Governance
Knowledge Base Completeness & Currency…
67.
Knowledge Base Completeness & Currency
In a RAG architecture, the knowledge base functions as the information source that directly shapes the system's…
68.
Embedding Bias & Representational Risk
Embedding models encode semantic associations from their training data into the geometry of the vector space. Research…
69.
Multilingual Performance
Most widely available embedding models perform best on English-language text. For high-risk systems deployed across…
70.
GDPR Status of Stored Embeddings
Dense vector embeddings derived from text containing personal data may themselves constitute personal data if the…
71.
Embedding Version Control
Embedding models produce vector representations specific to the model version. When the embedding model is updated, the…
72.
Data Governance Artefacts
73.
Data Governance Artefacts
Dataset Documentation Cards…
74.
Dataset Documentation Cards
Dataset Documentation Cards are the consolidated artefacts that present the information in a structured, reviewable…
75.
Distributional Analysis Reports
Distributional Analysis Reports consolidate the outputs of Article 70 (distributional analysis), Article…
76.
Bias Evaluation Reports (Pre-Training & Post-Training)
Bias Evaluation…
77.
Mitigation Effectiveness Assessments
Each bias mitigation…
78.
Data Lineage Records
Data Lineage…
79.
Feature Registry with Proxy Variable Assessments
The Feature Registry…
80.
DPIA (Where Required)
A Data Protection Impact Assessment is required under GDPR Article 35 whenever processing is likely to result in a high…
81.
Development Architectures (S.5)
Development architectures translate the system's compliance requirements into a concrete technical design. This section…
82.
Statement of Business Intent
System Purpose & Constraints: Before any architectural work begins, the Business Owner must articulate a precise…
83.
Eight-Layer Reference Architecture
The eight-layer reference architecture provides a structured approach to designing high-risk AI systems with compliance…
84.
Layer 1: Data Ingestion
Schema Contracts & Quality Specification: The data ingestion layer is the system's first contact with external data, and…
85.
Layer 2: Feature Engineering
Training-Serving Consistency — Feature Stores & Single Computation Spec: Training-serving skew is a pernicious failure…
86.
Layer 3: Model Inference
Model Version Pinning & Cryptographic Hash Verification: The inference layer must serve a specific, immutable model…
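The verification step might look like the following sketch, assuming a SHA-256 digest pinned in the registry entry; the file names and digest source are illustrative.

```python
# Sketch: refuse to serve a model artefact whose bytes do not match
# the digest pinned at registration time. Paths are illustrative.
import hashlib
import tempfile
from pathlib import Path

def verify_model(path: Path, expected_sha256: str) -> None:
    """Raise if the artefact's SHA-256 digest differs from the pin."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError(f"model hash mismatch: {digest}")

with tempfile.TemporaryDirectory() as d:
    artefact = Path(d) / "model.bin"
    artefact.write_bytes(b"weights-v1")
    pinned = hashlib.sha256(b"weights-v1").hexdigest()
    verify_model(artefact, pinned)        # digest matches: load proceeds
    artefact.write_bytes(b"tampered")     # simulate silent substitution
    try:
        verify_model(artefact, pinned)
        blocked = False
    except RuntimeError:
        blocked = True
print("tampered model blocked:", blocked)
```

In practice the pinned digest would be read from the model registry metadata, not computed inline.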
87.
Layer 4: Post-Processing
Business Rule Application — Documentation & Override Logging: Many high-risk AI systems apply business rules after model…
88.
Layer 5: Explainability
Explanation Methods (SHAP, LIME, GradCAM, Attention): The explainability layer generates human-readable explanations of…
89.
Layer 6: Human Oversight Interface
Mandatory Review Workflow — Auto-Acceptance Prevention: The human oversight interface must enforce a review step before…
90.
Layer 7: Logging & Audit
Immutable Logging — Append-Only & Cryptographic Hash Chains: Article 12 requires automatic recording of events during…
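An append-only hash chain of the kind described can be sketched as follows; the record schema is an assumption for illustration. Each record carries the SHA-256 of the previous record, so any in-place edit breaks verification of every later entry.

```python
# Sketch of a hash-chained event log: append-only writes, with chain
# verification detecting any retroactive edit. Schema is illustrative.
import hashlib
import json

def append(chain: list[dict], event: dict) -> None:
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"event": event, "prev": prev}, sort_keys=True)
    chain.append({"event": event, "prev": prev,
                  "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify(chain: list[dict]) -> bool:
    prev = "0" * 64
    for rec in chain:
        body = json.dumps({"event": rec["event"], "prev": prev},
                          sort_keys=True)
        if rec["prev"] != prev or \
           rec["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = rec["hash"]
    return True

log: list[dict] = []
append(log, {"model": "v1.2", "decision": "approve"})
append(log, {"model": "v1.2", "decision": "refer"})
ok_before = verify(log)
log[0]["event"]["decision"] = "deny"   # simulated tampering
ok_after = verify(log)
print(ok_before, ok_after)
```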
91.
Layer 8: Monitoring
Intent Alignment Dashboards — Real-Time vs AISDP Thresholds: The monitoring layer provides dashboards that show the…
92.
Infrastructure Design
Cloud Deployment (Multi-AZ, Multi-Region): For cloud-hosted high-risk AI systems, the AISDP must specify the cloud…
93.
Architecture Artefacts
Statement of Business Intent (Signed)…
94.
Statement of Business Intent (Signed)
The Statement of Business Intent is the foundational artefact for the system's compliance documentation. It captures…
95.
System Architecture Document (C4 Diagrams)
The System Architecture Document provides a layered description of the system's structure using the C4 model. It…
96.
Data Flow & Deployment Diagrams
The Data Flow Diagram traces the path of data through the system from ingestion to output, showing raw input data…
97.
Dependency Maps
Dependency maps show how the AI system relates to its external dependencies: the data sources it consumes, the APIs it…
98.
Per-Layer Control Specifications
The Per-Layer Control Specifications document consolidates the compensating controls implemented at each of the eight…
99.
Human Oversight Interface Specification
The Human Oversight Interface Specification is the design document for the Layer 6 interface through which operators…
100.
Version Control (S.6)
Version control for high-risk AI systems extends well beyond conventional source code management. The composite…
101.
Composite Versioning Scheme
Version Components (Code SHA, Data, Model, Config, Prompt): Traceability…
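One possible shape for such a composite identifier, assuming the five components named above; the field names and separator are illustrative, not prescribed by the scheme.

```python
# Sketch of a composite version identifier binding the five components
# (code SHA, data, model, config, prompt) into one traceable tag.
from dataclasses import dataclass

@dataclass(frozen=True)
class CompositeVersion:
    code_sha: str
    data: str
    model: str
    config: str
    prompt: str

    def tag(self) -> str:
        """Single string usable as a deployment label or log field."""
        return (f"code:{self.code_sha}+data:{self.data}"
                f"+model:{self.model}+config:{self.config}"
                f"+prompt:{self.prompt}")

v = CompositeVersion("9f1c2ab", "ds-2024.3", "m-17", "cfg-5", "p-2")
print(v.tag())
```

Changing any one component yields a new composite tag, which is what makes a deployed behaviour reproducible from the tag alone.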
102.
Code Version Control
Git Repository Management & Branch Protection: Compliance-grade version control requires that every version is immutable…
103.
Data Version Control
Data Versioning Tooling (DVC, Delta Lake, LakeFS): Data versioning ensures that, for any model version, the organisation…
104.
Model Registry
Tooling (MLflow, W&B, SageMaker, Vertex AI)…
105.
Tooling (MLflow, W&B, SageMaker, Vertex AI)
The model registry…
106.
Immutable Versioning — Unique Non-Reusable IDs
Each registered model version must be assigned a unique, non-reusable identifier. This requirement ensures that a model…
107.
Metadata Attachment (Dataset, Code, Pipeline, Hash, Hyperparameters, Metrics, Approval)
Each model version in the registry must carry structured metadata that links it to every artefact in its provenance…
108.
Lineage Tracking (Data → Code → Pipeline → Model)
Lineage tracking links each model version to the specific data version, code version, and pipeline execution that…
109.
Stage Management (Experimental, Staging, Production, Archived)
Models progress through defined stages in the registry: experimental (initial registration after training), staging…
110.
Access Control — CI/CD Promotes; No Manual Promotion
For high-risk systems, only the CI/CD pipeline…
111.
Long-Term Retrieval (Ten-Year Archive)
Archived models must be retrievable for the full ten-year retention period mandated by Article 18. This is not merely a…
112.
Manual Alternative (Directories, Spreadsheet, Signed Approval)
For organisations that cannot deploy a model registry…
113.
Configuration & Prompt Versioning
Decision Thresholds & Feature Flags: Decision thresholds and feature flags are configuration artefacts that materially…
114.
Substantial Modification Detection
Modification Threshold Framework: Article 3(23) defines substantial modification as a change "not foreseen or planned in…
115.
Cascading Change Management
Microservice Dependency Mapping: High-risk AI systems built on microservice architectures require a current dependency…
116.
Traceability
Technical Traceability (Model, Code, Data, Infrastructure, Input — All Hash-Verified): Technical traceability answers…
117.
Version Control Artefacts
Version-Controlled Code, Data, Model, Config: This artefact encompasses the complete version-controlled estate of the AI…
118.
CI/CD Pipelines (S.7)
The CI/CD pipeline for a high-risk AI system enforces compliance at every stage, from static…
119.
Static Analysis
Linting & Type Checking…
120.
Linting & Type Checking
Standard linting and type checking form the foundation of the static analysis…
121.
AI-Specific Custom Rules (Semgrep) — Demographic Feature Flagging
Standard linting tools do not catch AI-specific compliance risks. Custom static…
122.
AI-Specific Custom Rules — Hardcoded Threshold Detection
Hardcoded thresholds, such as a score compared against a literal 0.65 directly in code, undermine version control and change tracking.…
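A sketch of the remediated pattern: the threshold is read from a version-controlled configuration file rather than hardcoded in the decision logic. The file name, schema, and values are assumptions for illustration.

```python
# Sketch: decision threshold loaded from a versioned config file,
# so threshold changes appear in change tracking rather than code diffs.
import json
import os
import tempfile

config_body = {"decision_threshold": 0.65, "config_version": "cfg-12"}

# Stand-in for a config file committed to version control.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "thresholds.json")
    with open(path, "w") as f:
        json.dump(config_body, f)
    with open(path) as f:
        cfg = json.load(f)

def decide(score: float, cfg: dict) -> str:
    # The threshold is traceable to cfg["config_version"],
    # not a literal buried in the code path.
    return "accept" if score > cfg["decision_threshold"] else "refer"

print(decide(0.7, cfg), decide(0.6, cfg))
```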
123.
AI-Specific Custom Rules — Missing Logging Detection (Art. 12)
Article 12 requires automatic recording of events during the system's operation. Any inference code path that can…
124.
AI-Specific Custom Rules — Model Registry Bypass Detection
Direct model file loading, such as torch.load('model.pt') or joblib.load('model.pkl'), bypasses the model…
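As an illustration of what such a rule detects (this is not the Semgrep rule itself), a small AST walk can flag direct `torch.load` / `joblib.load` calls; the banned-call list is an assumption.

```python
# Sketch: static detection of model-loading calls that bypass the
# registry, by walking the AST for calls on the named modules.
import ast

BANNED = {("torch", "load"), ("joblib", "load")}  # illustrative list

def find_bypasses(source: str) -> list[int]:
    """Return line numbers of direct model-file loads in the source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and (node.func.value.id, node.func.attr) in BANNED):
            hits.append(node.lineno)
    return hits

snippet = "import torch\nmodel = torch.load('model.pt')\n"
print(find_bypasses(snippet))
```

A registry-mediated load would pass this check because it never names the raw model file.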
125.
Dependency Scanning (Snyk, Dependabot, pip-audit, OWASP)
Every third-party dependency (Python packages, npm modules, system libraries) must be scanned against known…
126.
Licence Compliance Scanning (FOSSA, Black Duck, pip-licenses)
Automated licence compliance scanning prevents the organisation from inadvertently using libraries with licence terms…
127.
Secret Detection (Pre-Commit Hooks & CI Steps)
Credentials, API keys, database connection strings, and personal data must never appear in the version…
128.
Unit Testing
Data Pipeline Tests (Normal, Boundary, Pathological, Schema, Distribution,…
129.
Data Pipeline Tests (Normal, Boundary, Pathological, Schema, Distribution, Property-Based)
Each data transformation step in the pipeline requires unit tests that go beyond verifying correct output for a handful…
130.
Feature Engineering Tests (Registry Match, Determinism, Imputation, Range)
Each feature computation must have unit tests verifying four properties. First, the feature's output must match the…
131.
Model Inference Tests (Registry Load, Format, Determinism, Latency, Degradation)
The model serving component requires unit tests confirming five properties. The model must load correctly from the…
132.
Post-Processing Tests (Thresholds, Calibration, Business Rules, Edge Cases)
Threshold application, score calibration, business rule application, and output formatting each require unit tests…
133.
Explainability Tests (Coverage, Attribution Sums, Fidelity, Format)
The explanation generation component requires unit tests verifying four properties. Coverage tests confirm that…
134.
Human Oversight Interface Tests (Bypass Prevention, Rationale, Confidence, Countermeasures)
The human oversight interface is a compliance-critical component that requires dedicated unit tests. The mandatory…
135.
Integration & End-to-End Testing
Contract Tests (Service-to-Service)…
136.
Contract Tests (Service-to-Service)
Contract tests validate that each service's outputs conform to the expectations of its consumers. As described above,…
137.
End-to-End Inference Path Tests (Known Input → Expected Output + Logs)
End-to-end inference path tests exercise the complete chain from data ingestion through feature engineering, model…
138.
Regression Tests — Golden Dataset with Per-Subgroup Cases
A golden dataset of historical inputs with known correct outputs serves as the regression baseline. Every candidate…
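A golden-dataset regression check can be sketched as below; the record layout, subgroup labels, and toy model are assumptions made for illustration.

```python
# Sketch: regression gate comparing a candidate model against a golden
# dataset with per-subgroup cases. Layout and data are illustrative.
GOLDEN = [
    {"id": 1, "subgroup": "a", "input": 0.9, "expected": "approve"},
    {"id": 2, "subgroup": "b", "input": 0.2, "expected": "refer"},
]

def candidate_model(x: float) -> str:
    """Stand-in for the candidate model under evaluation."""
    return "approve" if x > 0.5 else "refer"

def regressions(model, golden) -> list[int]:
    """IDs of golden cases where the candidate disagrees with the
    known correct output."""
    return [case["id"] for case in golden
            if model(case["input"]) != case["expected"]]

failed = regressions(candidate_model, GOLDEN)
print("PASS" if not failed else f"FAIL: {failed}")
```

Because cases carry subgroup labels, failures can be reported per subgroup rather than only in aggregate.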
139.
Human Oversight Interface Testing (Selenium/Playwright/Cypress Automation)
Human oversight interface testing is frequently neglected but compliance-critical. Automated UI testing tools…
140.
Load Testing (Locust, k6) — Latency & Throughput Under Load
The AISDP declares the system's performance characteristics, including latency at the p50, p95, and p99 percentiles and…
141.
Chaos & Fault Injection Testing (Gremlin, Litmus) — Graceful Degradation
Chaos and fault injection tests simulate failures at each layer of the system (data source unavailable, model serving…
142.
Model Validation Gates
Performance Gate (AUC-ROC, F1, Precision, Recall, Brier, Calibration): The performance gate is the first of the four…
143.
Automated Documentation Generation
Model Cards (Per Build): Model cards are auto-generated from the evaluation metrics stored in the experiment tracker and…
144.
Compliance-Gated Deployment
All Four Gates Passed Requirement No model may be deployed to production without passing all four validation gates:…
145.
CI/CD Artefacts
Automated Test Reports…
146.
Automated Test Reports
This artefact comprises the complete set of auto-generated test reports produced by the CI pipeline across the system's…
147.
Model Cards
This artefact comprises the collection of model cards generated across the system's lifecycle. Each model card…
148.
SBOMs
This artefact comprises the collection of SBOMs generated across the system's lifecycle. Each…
149.
Security Scan Results & Remediation Records
This artefact comprises the results from dependency scanning, licence compliance scanning, secret detection, and…
150.
Deployment Ledger Entries
This artefact is the materialised deployment ledger described above, viewed as a CI/CD…
151.
Exception Approval Records
This artefact comprises the collection of exception approvals granted through the severity-based failure handling…