Development
Model selection, data governance, development architectures, version control, and CI/CD pipelines with compliance gates.
151 articles in this section
1.
Model Selection (S.3)
Model selection is the first major technical decision in the AISDP…
2.
Full-Spectrum Evaluation
Heuristic & Rule-Based Systems…
3.
Heuristic & Rule-Based Systems
Many organisations operate data-driven decisioning systems that predate the machine learning era. These include expert…
4.
Statistical & Econometric Models
Logistic regression, linear regression, generalised linear models, and survival models occupy a middle ground between…
5.
Ensemble Methods
Gradient-boosted decision tree ensembles (XGBoost, LightGBM, CatBoost) and random forests offer a strong balance…
6.
Deep Neural Networks — Types & Post-Hoc Explanation Methods
Convolutional networks, recurrent networks, and transformer architectures achieve state-of-the-art performance on…
7.
Deep Neural Networks — Candid Explainability Limitations Assessment
Where a deep neural network is selected for a high-risk system, the AISDP must include a candid assessment of the…
8.
Foundation Models & LLMs — Provenance Documentation
Large language models and foundation models used as components within high-risk systems introduce documentation…
9.
Foundation Models & LLMs — Fine-Tuning Records
Organisations that fine-tune a foundation model for use in a high-risk system must document the fine-tuning process…
10.
Foundation Models & LLMs — Stochastic Output Handling
The stochastic nature of LLM outputs, where the same input may produce different outputs across invocations, requires…
11.
Foundation Models & LLMs — Article 53 GPAI Obligations
When a high-risk system incorporates a general-purpose AI model, the downstream provider bears full responsibility for…
12.
Hybrid Architectures — Component Documentation & Conflict Resolution
Many production systems combine multiple decisioning approaches: a rules engine for hard constraints, a statistical…
13.
Model Origin Risk
Open-Source Models — Training Data Provenance…
14.
Open-Source Models — Training Data Provenance
Open-source models from repositories such as Hugging Face, GitHub, or academic publications offer accessibility and…
15.
Open-Source Models — Licensing Compatibility
The licensing terms attached to open-source models carry compliance implications that extend beyond intellectual…
16.
Open-Source Models — Development Governance Gaps
Open-source models are frequently developed without the governance structures that the EU AI Act expects for high-risk…
17.
Open-Source Models — Bias Testing & Adversarial Evaluation History
For open-source models incorporated into high-risk systems, the AISDP must document whether the model has undergone…
18.
Open-Source Models — Residual Non-Conformity Risk
After completing provenance assessment, licence review, governance gap analysis, and bias and adversarial evaluation, a…
19.
Commercial APIs — Contractual Terms & SLAs
Models licensed from commercial API providers present a different risk profile from open-source components. The…
20.
Commercial APIs — Provider Data Handling & Geographic Considerations
Many commercial AI API providers collect data from their customers' usage. This may include the inputs submitted, the…
21.
Copyright & IP Exposure
Training Data Copyright Assessment: The training data used to develop AI models, particularly large language models and…
22.
Fine-Tuning Provider Boundary (Art. 25)
Substantial Modification Assessment: Fine-tuning a GPAI model for use in a high-risk system raises the question of…
23.
Compliance Criteria Scoring
Six compliance criteria are scored against each candidate model architecture during selection. Together they form the…
24.
Documentability
Documentability is the first criterion. The question is: can the model's architecture, hyperparameters, and decision…
25.
Testability
Testability asks whether the model architecture supports the testing required by Article 15. Can accuracy, robustness,…
26.
Auditability
Auditability asks whether the model produces outputs that can be logged, traced, and attributed in accordance with…
27.
Bias Detectability
Bias detectability asks whether fairness metrics can be computed at the subgroup level, whether the model can be…
28.
Maintainability
Maintainability asks whether the model can be retrained, fine-tuned, or recalibrated in response to post-market…
29.
Determinism
Determinism asks whether the model produces the same output for the same input consistently across executions. This…
30.
Model Selection Artefacts
Five artefacts are produced during the model selection…
31.
Data Governance (S.4)
Data governance addresses the EU AI Act's Article 10 requirements for training, validation, and testing data. This…
32.
Dataset Documentation
Source & Acquisition Method…
33.
Source & Acquisition Method
Every dataset used in the system's lifecycle, whether for training, validation, testing, calibration, or fine-tuning,…
34.
Record Count, Schema & Version Identifier
The composition section of the dataset documentation captures the structural characteristics that enable a reviewer to…
35.
Temporal & Geographic Scope
The temporal and geographic scope of a dataset directly affects its suitability for training a high-risk AI system…
36.
Demographic Composition
The demographic composition of a dataset determines whether the model will perform equitably across the populations on…
37.
Known Limitations
Every dataset has limitations. The compliance value lies in documenting them candidly rather than concealing them…
38.
Completeness Assessment
39.
Completeness Assessment
Population Representativeness: Population representativeness asks whether the training, validation, and testing datasets…
40.
Pre-Training Bias Assessment
Distributional Analysis — Statistical Tests & Output…
41.
Distributional Analysis — Statistical Tests & Output Matrix
Before any model is trained, the Technical SME examines the data for bias through distributional analysis. This…
42.
Label Bias Analysis — Inter-Rater Reliability & Relabelling
Label bias arises when the outcome labels used as ground truth for training reflect the biases of the humans or…
43.
Label Bias Analysis — Ground Truth Contamination Assessment
Ground truth contamination occurs when the labels used for training are themselves the product of a biased process that…
44.
Proxy Variable Detection — Correlation Methods & Thresholds
Proxy variables are features that are not themselves protected characteristics but correlate strongly enough with…
45.
Proxy Variable Detection — Justification Review for Retained Proxies
When a feature is flagged as a potential proxy variable, the Technical SME conducts a justification review to determine…
46.
Intersectional Pre-Training Analysis — Subgroups & Cell Size Thresholds
Standard bias analysis examines each protected characteristic in isolation. Intersectional analysis examines…
47.
Post-Training Bias Evaluation
48.
Post-Training Bias Evaluation
Selection Rate Ratio (Four-Fifths Rule)…
49.
Selection Rate Ratio (Four-Fifths Rule)
For binary classification systems, the selection rate ratio is the simplest and most widely understood fairness metric.…
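The metric can be sketched in a few lines; the group decision data below is invented for illustration, and 0.8 is the four-fifths benchmark.

```python
# Sketch of the selection rate ratio check for binary decisions
# grouped by a protected characteristic (1 = selected, 0 = not).
def selection_rate_ratio(decisions_by_group: dict[str, list[int]]) -> float:
    """Ratio of the lowest group selection rate to the highest."""
    rates = {g: sum(d) / len(d) for g, d in decisions_by_group.items()}
    return min(rates.values()) / max(rates.values())

# Illustrative data: group_a selected 3/5, group_b selected 2/5.
groups = {"group_a": [1, 1, 0, 1, 0], "group_b": [1, 0, 0, 0, 1]}
ratio = selection_rate_ratio(groups)
print(round(ratio, 2), "PASS" if ratio >= 0.8 else "REVIEW")
```

A ratio below 0.8 would trigger the bias investigation workflow rather than an automatic deployment block.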
50.
Equalised Odds — TPR & FPR Parity
Equalised odds requires that the model's true positive rate (TPR) and false positive rate (FPR) are consistent across…
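A hedged sketch of the parity check: TPR and FPR computed per group from (label, prediction) pairs. The sample data is invented, and any gap tolerance would be set by the organisation.

```python
# Sketch of equalised-odds measurement: compute TPR and FPR per group
# and report the gaps between groups. Data below is illustrative.
def rates(pairs: list[tuple[int, int]]) -> tuple[float, float]:
    """Return (TPR, FPR) from (label, prediction) pairs."""
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

group_a = [(1, 1), (1, 1), (0, 0), (0, 1)]
group_b = [(1, 1), (1, 0), (0, 0), (0, 0)]
(tpr_a, fpr_a), (tpr_b, fpr_b) = rates(group_a), rates(group_b)
tpr_gap, fpr_gap = abs(tpr_a - tpr_b), abs(fpr_a - fpr_b)
print(round(tpr_gap, 2), round(fpr_gap, 2))
```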
51.
Predictive Parity
Predictive parity asks whether positive predictions are equally accurate across subgroups. If the model's positive…
52.
Calibration Within Groups — Reliability Diagrams
Calibration within groups tests whether the model's confidence scores carry consistent meaning across protected…
53.
Counterfactual Fairness Testing
Counterfactual fairness is the most direct test of whether a model uses protected characteristics in its decisions. For…
54.
Fairness Concept Priority Decision & Documented Rationale
The five post-training fairness metrics (selection rate ratio, equalised odds, predictive parity, calibration within…
55.
Fairness Tooling (Fairlearn, AI Fairness 360)
The fairness evaluation suite integrates all five post-training metrics into a single evaluation report that runs as…
56.
Bias Mitigation
57.
Bias Mitigation
Pre-Processing Techniques (Oversampling, Undersampling, Reweighting, Synthetic…
58.
Pre-Processing Techniques (Oversampling, Undersampling, Reweighting, Synthetic Data)
Pre-processing mitigations modify the training data before the model encounters it. They are the most accessible class…
59.
In-Processing Techniques (Fairness Constraints, Adversarial Debiasing, Invariant Representations)
In-processing mitigations modify the model's training procedure to incorporate fairness objectives. They are more…
60.
Post-Processing Techniques (Threshold Calibration, Score Adjustment, Reject Option)
Post-processing mitigations modify the model's outputs after inference, avoiding model retraining. They are the…
61.
Compensating Controls — Mandatory Human Review & Enhanced Monitoring
When bias mitigation…
62.
Compensating Controls — Deployment Restrictions & Residual Bias Acceptance
Where neither mitigation techniques nor mandatory human review fully address the identified bias, the AISDP documents…
63.
Data Lineage & Version Control
64.
Data Lineage & Version Control
Transformation Documentation (Pre-Step / Post-Step): Data lineage requires documenting every data engineering step with…
65.
Special Category Data (Art. 10(5))
Legal Basis & Purpose Limitation: Article 10(5) permits the processing of special category personal data (race,…
66.
RAG-Specific Governance
Knowledge Base Completeness & Currency…
67.
Knowledge Base Completeness & Currency
In a RAG architecture, the knowledge base functions as the information source that directly shapes the system's…
68.
Embedding Bias & Representational Risk
Embedding models encode semantic associations from their training data into the geometry of the vector space. Research…
69.
Multilingual Performance
Most widely available embedding models perform best on English-language text. For high-risk systems deployed across…
70.
GDPR Status of Stored Embeddings
Dense vector embeddings derived from text containing personal data may themselves constitute personal data if the…
71.
Embedding Version Control
Embedding models produce vector representations specific to the model version. When the embedding model is updated, the…
72.
Data Governance Artefacts
73.
Data Governance Artefacts
Dataset Documentation Cards…
74.
Dataset Documentation Cards
Dataset Documentation Cards are the consolidated artefacts that present the information in a structured, reviewable…
75.
Distributional Analysis Reports
Distributional Analysis Reports consolidate the outputs of Article 70 (distributional analysis), Article…
76.
Bias Evaluation Reports (Pre-Training & Post-Training)
Bias Evaluation…
77.
Mitigation Effectiveness Assessments
Each bias mitigation…
78.
Data Lineage Records
Data Lineage…
79.
Feature Registry with Proxy Variable Assessments
The Feature Registry…
80.
DPIA (Where Required)
A Data Protection Impact Assessment is required under GDPR Article 35 whenever processing is likely to result in a high…
81.
Development Architectures (S.5)
Development architectures translate the system's compliance requirements into a concrete technical design. This section…
82.
Statement of Business Intent
System Purpose & Constraints: Before any architectural work begins, the Business Owner must articulate a precise…
83.
Eight-Layer Reference Architecture
The eight-layer reference architecture provides a structured approach to designing high-risk AI systems with compliance…
84.
Layer 1: Data Ingestion
Schema Contracts & Quality Specification: The data ingestion layer is the system's first contact with external data, and…
85.
Layer 2: Feature Engineering
Training-Serving Consistency — Feature Stores & Single Computation Spec: Training-serving skew is a pernicious failure…
86.
Layer 3: Model Inference
Model Version Pinning & Cryptographic Hash Verification: The inference layer must serve a specific, immutable model…
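The verification step might look like the following sketch, assuming a SHA-256 digest pinned in the registry entry; the file names and digest source are illustrative.

```python
# Sketch: refuse to serve a model artefact whose bytes do not match
# the digest pinned at registration time. Paths are illustrative.
import hashlib
import tempfile
from pathlib import Path

def verify_model(path: Path, expected_sha256: str) -> None:
    """Raise if the artefact's SHA-256 digest differs from the pin."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError(f"model hash mismatch: {digest}")

with tempfile.TemporaryDirectory() as d:
    artefact = Path(d) / "model.bin"
    artefact.write_bytes(b"weights-v1")
    pinned = hashlib.sha256(b"weights-v1").hexdigest()
    verify_model(artefact, pinned)        # digest matches: load proceeds
    artefact.write_bytes(b"tampered")     # simulate silent substitution
    try:
        verify_model(artefact, pinned)
        blocked = False
    except RuntimeError:
        blocked = True
print("tampered model blocked:", blocked)
```

In practice the pinned digest would be read from the model registry metadata, not computed inline.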
87.
Layer 4: Post-Processing
Business Rule Application — Documentation & Override Logging: Many high-risk AI systems apply business rules after model…
88.
Layer 5: Explainability
Explanation Methods (SHAP, LIME, GradCAM, Attention): The explainability layer generates human-readable explanations of…
89.
Layer 6: Human Oversight Interface
Mandatory Review Workflow — Auto-Acceptance Prevention: The human oversight interface must enforce a review step before…
90.
Layer 7: Logging & Audit
Immutable Logging — Append-Only & Cryptographic Hash Chains: Article 12 requires automatic recording of events during…
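An append-only hash chain of the kind described can be sketched as follows; the record schema is an assumption for illustration. Each record carries the SHA-256 of the previous record, so any in-place edit breaks verification of every later entry.

```python
# Sketch of a hash-chained event log: append-only writes, with chain
# verification detecting any retroactive edit. Schema is illustrative.
import hashlib
import json

def append(chain: list[dict], event: dict) -> None:
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"event": event, "prev": prev}, sort_keys=True)
    chain.append({"event": event, "prev": prev,
                  "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify(chain: list[dict]) -> bool:
    prev = "0" * 64
    for rec in chain:
        body = json.dumps({"event": rec["event"], "prev": prev},
                          sort_keys=True)
        if rec["prev"] != prev or \
           rec["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = rec["hash"]
    return True

log: list[dict] = []
append(log, {"model": "v1.2", "decision": "approve"})
append(log, {"model": "v1.2", "decision": "refer"})
ok_before = verify(log)
log[0]["event"]["decision"] = "deny"   # simulated tampering
ok_after = verify(log)
print(ok_before, ok_after)
```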
91.
Layer 8: Monitoring
Intent Alignment Dashboards — Real-Time vs AISDP Thresholds: The monitoring layer provides dashboards that show the…
92.
Infrastructure Design
Cloud Deployment (Multi-AZ, Multi-Region): For cloud-hosted high-risk AI systems, the AISDP must specify the cloud…
93.
Architecture Artefacts
Statement of Business Intent (Signed)…
94.
Statement of Business Intent (Signed)
The Statement of Business Intent is the foundational artefact for the system's compliance documentation. It captures…
95.
System Architecture Document (C4 Diagrams)
The System Architecture Document provides a layered description of the system's structure using the C4 model. It…
96.
Data Flow & Deployment Diagrams
The Data Flow Diagram traces the path of data through the system from ingestion to output, showing raw input data…
97.
Dependency Maps
Dependency maps show how the AI system relates to its external dependencies: the data sources it consumes, the APIs it…
98.
Per-Layer Control Specifications
The Per-Layer Control Specifications document consolidates the compensating controls implemented at each of the eight…
99.
Human Oversight Interface Specification
The Human Oversight Interface Specification is the design document for the Layer 6 interface through which operators…
100.
Version Control (S.6)
Version control for high-risk AI systems extends well beyond conventional source code management. The composite…
101.
Composite Versioning Scheme
Version Components (Code SHA, Data, Model, Config, Prompt): Traceability…
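One possible shape for such a composite identifier, assuming the five components named above; the field names and separator are illustrative, not prescribed by the scheme.

```python
# Sketch of a composite version identifier binding the five components
# (code SHA, data, model, config, prompt) into one traceable tag.
from dataclasses import dataclass

@dataclass(frozen=True)
class CompositeVersion:
    code_sha: str
    data: str
    model: str
    config: str
    prompt: str

    def tag(self) -> str:
        """Single string usable as a deployment label or log field."""
        return (f"code:{self.code_sha}+data:{self.data}"
                f"+model:{self.model}+config:{self.config}"
                f"+prompt:{self.prompt}")

v = CompositeVersion("9f1c2ab", "ds-2024.3", "m-17", "cfg-5", "p-2")
print(v.tag())
```

Changing any one component yields a new composite tag, which is what makes a deployed behaviour reproducible from the tag alone.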
102.
Code Version Control
Git Repository Management & Branch Protection: Compliance-grade version control requires that every version is immutable…
103.
Data Version Control
Data Versioning Tooling (DVC, Delta Lake, LakeFS): Data versioning ensures that, for any model version, the organisation…
104.
Model Registry
Tooling (MLflow, W&B, SageMaker, Vertex AI)…
105.
Tooling (MLflow, W&B, SageMaker, Vertex AI)
The model registry…
106.
Immutable Versioning — Unique Non-Reusable IDs
Each registered model version must be assigned a unique, non-reusable identifier. This requirement ensures that a model…
107.
Metadata Attachment (Dataset, Code, Pipeline, Hash, Hyperparameters, Metrics, Approval)
Each model version in the registry must carry structured metadata that links it to every artefact in its provenance…
108.
Lineage Tracking (Data → Code → Pipeline → Model)
Lineage tracking links each model version to the specific data version, code version, and pipeline execution that…
109.
Stage Management (Experimental, Staging, Production, Archived)
Models progress through defined stages in the registry: experimental (initial registration after training), staging…
110.
Access Control — CI/CD Promotes; No Manual Promotion
For high-risk systems, only the CI/CD pipeline…
111.
Long-Term Retrieval (Ten-Year Archive)
Archived models must be retrievable for the full ten-year retention period mandated by Article 18. This is not merely a…
112.
Manual Alternative (Directories, Spreadsheet, Signed Approval)
For organisations that cannot deploy a model registry…
113.
Configuration & Prompt Versioning
Decision Thresholds & Feature Flags: Decision thresholds and feature flags are configuration artefacts that materially…
114.
Substantial Modification Detection
Modification Threshold Framework: Article 3(23) defines substantial modification as a change "not foreseen or planned in…
115.
Cascading Change Management
Microservice Dependency Mapping: High-risk AI systems built on microservice architectures require a current dependency…
116.
Traceability
Technical Traceability (Model, Code, Data, Infrastructure, Input — All Hash-Verified): Technical traceability answers…
117.
Version Control Artefacts
Version-Controlled Code, Data, Model, Config: This artefact encompasses the complete version-controlled estate of the AI…
118.
CI/CD Pipelines (S.7)
The CI/CD pipeline for a high-risk AI system enforces compliance at every stage, from static…
119.
Static Analysis
Linting & Type Checking…
120.
Linting & Type Checking
Standard linting and type checking form the foundation of the static analysis…
121.
AI-Specific Custom Rules (Semgrep) — Demographic Feature Flagging
Standard linting tools do not catch AI-specific compliance risks. Custom static…
122.
AI-Specific Custom Rules — Hardcoded Threshold Detection
Hardcoded thresholds, such as a score compared against a literal 0.65 directly in code, undermine version control and change tracking.…
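A sketch of the remediated pattern: the threshold is read from a version-controlled configuration file rather than hardcoded in the decision logic. The file name, schema, and values are assumptions for illustration.

```python
# Sketch: decision threshold loaded from a versioned config file,
# so threshold changes appear in change tracking rather than code diffs.
import json
import os
import tempfile

config_body = {"decision_threshold": 0.65, "config_version": "cfg-12"}

# Stand-in for a config file committed to version control.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "thresholds.json")
    with open(path, "w") as f:
        json.dump(config_body, f)
    with open(path) as f:
        cfg = json.load(f)

def decide(score: float, cfg: dict) -> str:
    # The threshold is traceable to cfg["config_version"],
    # not a literal buried in the code path.
    return "accept" if score > cfg["decision_threshold"] else "refer"

print(decide(0.7, cfg), decide(0.6, cfg))
```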
123.
AI-Specific Custom Rules — Missing Logging Detection (Art. 12)
Article 12 requires automatic recording of events during the system's operation. Any inference code path that can…
124.
AI-Specific Custom Rules — Model Registry Bypass Detection
Direct model file loading, such as torch.load('model.pt') or joblib.load('model.pkl'), bypasses the model…
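As an illustration of what such a rule detects (this is not the Semgrep rule itself), a small AST walk can flag direct `torch.load` / `joblib.load` calls; the banned-call list is an assumption.

```python
# Sketch: static detection of model-loading calls that bypass the
# registry, by walking the AST for calls on the named modules.
import ast

BANNED = {("torch", "load"), ("joblib", "load")}  # illustrative list

def find_bypasses(source: str) -> list[int]:
    """Return line numbers of direct model-file loads in the source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and (node.func.value.id, node.func.attr) in BANNED):
            hits.append(node.lineno)
    return hits

snippet = "import torch\nmodel = torch.load('model.pt')\n"
print(find_bypasses(snippet))
```

A registry-mediated load would pass this check because it never names the raw model file.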
125.
Dependency Scanning (Snyk, Dependabot, pip-audit, OWASP)
Every third-party dependency (Python packages, npm modules, system libraries) must be scanned against known…
126.
Licence Compliance Scanning (FOSSA, Black Duck, pip-licenses)
Automated licence compliance scanning prevents the organisation from inadvertently using libraries with licence terms…
127.
Secret Detection (Pre-Commit Hooks & CI Steps)
Credentials, API keys, database connection strings, and personal data must never appear in the version…
128.
Unit Testing
Data Pipeline Tests (Normal, Boundary, Pathological, Schema, Distribution,…
129.
Data Pipeline Tests (Normal, Boundary, Pathological, Schema, Distribution, Property-Based)
Each data transformation step in the pipeline requires unit tests that go beyond verifying correct output for a handful…
130.
Feature Engineering Tests (Registry Match, Determinism, Imputation, Range)
Each feature computation must have unit tests verifying four properties. First, the feature's output must match the…
131.
Model Inference Tests (Registry Load, Format, Determinism, Latency, Degradation)
The model serving component requires unit tests confirming five properties. The model must load correctly from the…
132.
Post-Processing Tests (Thresholds, Calibration, Business Rules, Edge Cases)
Threshold application, score calibration, business rule application, and output formatting each require unit tests…
133.
Explainability Tests (Coverage, Attribution Sums, Fidelity, Format)
The explanation generation component requires unit tests verifying four properties. Coverage tests confirm that…
134.
Human Oversight Interface Tests (Bypass Prevention, Rationale, Confidence, Countermeasures)
The human oversight interface is a compliance-critical component that requires dedicated unit tests. The mandatory…
135.
Integration & End-to-End Testing
Contract Tests (Service-to-Service)…
136.
Contract Tests (Service-to-Service)
Contract tests validate that each service's outputs conform to the expectations of its consumers. As described above,…
137.
End-to-End Inference Path Tests (Known Input → Expected Output + Logs)
End-to-end inference path tests exercise the complete chain from data ingestion through feature engineering, model…
138.
Regression Tests — Golden Dataset with Per-Subgroup Cases
A golden dataset of historical inputs with known correct outputs serves as the regression baseline. Every candidate…
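A golden-dataset regression check can be sketched as below; the record layout, subgroup labels, and toy model are assumptions made for illustration.

```python
# Sketch: regression gate comparing a candidate model against a golden
# dataset with per-subgroup cases. Layout and data are illustrative.
GOLDEN = [
    {"id": 1, "subgroup": "a", "input": 0.9, "expected": "approve"},
    {"id": 2, "subgroup": "b", "input": 0.2, "expected": "refer"},
]

def candidate_model(x: float) -> str:
    """Stand-in for the candidate model under evaluation."""
    return "approve" if x > 0.5 else "refer"

def regressions(model, golden) -> list[int]:
    """IDs of golden cases where the candidate disagrees with the
    known correct output."""
    return [case["id"] for case in golden
            if model(case["input"]) != case["expected"]]

failed = regressions(candidate_model, GOLDEN)
print("PASS" if not failed else f"FAIL: {failed}")
```

Because cases carry subgroup labels, failures can be reported per subgroup rather than only in aggregate.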
139.
Human Oversight Interface Testing (Selenium/Playwright/Cypress Automation)
Human oversight interface testing is frequently neglected but compliance-critical. Automated UI testing tools…
140.
Load Testing (Locust, k6) — Latency & Throughput Under Load
The AISDP declares the system's performance characteristics, including latency at the p50, p95, and p99 percentiles and…
141.
Chaos & Fault Injection Testing (Gremlin, Litmus) — Graceful Degradation
Chaos and fault injection tests simulate failures at each layer of the system (data source unavailable, model serving…
142.
Model Validation Gates
Performance Gate (AUC-ROC, F1, Precision, Recall, Brier, Calibration): The performance gate is the first of the four…
143.
Automated Documentation Generation
Model Cards (Per Build): Model cards are auto-generated from the evaluation metrics stored in the experiment tracker and…
144.
Compliance-Gated Deployment
All Four Gates Passed Requirement No model may be deployed to production without passing all four validation gates:…
145.
CI/CD Artefacts
Automated Test Reports…
146.
Automated Test Reports
This artefact comprises the complete set of auto-generated test reports produced by the CI pipeline across the system's…
147.
Model Cards
This artefact comprises the collection of model cards generated across the system's lifecycle. Each model card…
148.
SBOMs
This artefact comprises the collection of SBOMs generated across the system's lifecycle. Each…
149.
Security Scan Results & Remediation Records
This artefact comprises the results from dependency scanning, licence compliance scanning, secret detection, and…
150.
Deployment Ledger Entries
This artefact is the materialised deployment ledger described above, viewed as a CI/CD…
151.
Exception Approval Records
This artefact comprises the collection of exception approvals granted through the severity-based failure handling…