Version Components (Code SHA, Data, Model, Config, Prompt)
Traceability underpins the entire AISDP. The composite version identifier captures the specific combination of artefact versions that constitute the deployed system at any point in time. For a high-risk AI system, the composite version comprises five components: the code version (Git commit SHA), the data version (DVC, Delta Lake, or LakeFS reference), the model version (model registry identifier), the configuration version (threshold values, feature flags, business rules), and, for systems using LLMs, the prompt version (system instructions and prompt templates).
Each component has its own versioning mechanism and repository, but they are linked through cross-references. A model registry entry references the Git commit and the data version that produced it; a Git commit references the data version it was validated against. This linkage makes the version control “compliance-grade”: given any deployed model version, the organisation can trace back to the exact code, data, configuration, and pipeline execution that produced it.
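The five components and their cross-references can be sketched as a small schema. This is a minimal illustration, not a prescribed format: the field names, the `+`-joined string rendering, and the 12-character SHA truncation are all assumptions to be adapted to the organisation's own registries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CompositeVersion:
    """Composite version identifier for a deployed high-risk AI system.

    Field names are illustrative; map them onto your actual Git, data
    versioning, model registry, configuration, and prompt repositories.
    """
    code_sha: str                    # Git commit SHA
    data_ref: str                    # DVC / Delta Lake / LakeFS reference
    model_id: str                    # model registry identifier
    config_rev: str                  # thresholds, feature flags, business rules
    prompt_rev: Optional[str] = None # prompt version, for LLM-based systems only

    def tag(self) -> str:
        # Render a single string suitable for log records, the EU database
        # registration, and the Declaration of Conformity. The separator and
        # SHA truncation are arbitrary choices for this sketch.
        parts = [self.code_sha[:12], self.data_ref, self.model_id, self.config_rev]
        if self.prompt_rev:
            parts.append(self.prompt_rev)
        return "+".join(parts)
```

A registry entry for the model would then carry `code_sha` and `data_ref` as cross-reference tags, and the Git commit would record the `data_ref` it was validated against, closing the linkage described above.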
Without a composite version identifier, the organisation cannot demonstrate which version of the system was deployed at any given time, what changed between versions, whether a change constitutes a substantial modification, or that the system assessed during conformity assessment is the same system deployed in production. The composite version is the version recorded in the AISDP, the EU database registration, and the Declaration of Conformity.
Key outputs
- Composite version identifier schema
- Cross-reference linkage between code, data, model, config, and prompt repositories
- Module 10 and Module 12 documentation of the versioning scheme
Inference Request Tagging with Composite ID
Every inference request processed by the system must be tagged with the composite version identifier at the point of execution. This tag is embedded in the log record and cannot be modified after the fact. From the composite ID, the full provenance chain is one lookup away: the model registry entry, the training data version, the code commit, and the pipeline execution that produced the model.
The tag should be injected by the serving infrastructure, not by the model code, so that it cannot be omitted through developer oversight. The serving framework (Triton, TensorFlow Serving, TorchServe, or a custom implementation) attaches the composite version to each request before it enters the inference pipeline, and the logging layer captures it as part of the structured trace.
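For a custom serving implementation, the injection can be expressed as a wrapper that the serving layer applies around the model's predict callable. This is a hedged sketch, not a Triton or TorchServe API: the `COMPOSITE_VERSION` constant, the wrapper name, and the log record fields are all hypothetical.

```python
import json
import logging
import uuid

# Set once at deploy time by the serving infrastructure, never by model code.
COMPOSITE_VERSION = "a1b2c3d4e5f6+dvc:rev7+fraud-model:12+cfg:3"  # illustrative value


def with_version_tag(predict):
    """Wrap a predict callable so every request is logged with the
    composite version identifier. Sketch of serving-layer injection."""
    def serve(request_payload):
        inference_id = str(uuid.uuid4())
        result = predict(request_payload)
        # The tag is attached here, in the serving layer, so a developer
        # cannot forget it; the structured log record is written append-only.
        log_record = {
            "inference_id": inference_id,
            "composite_version": COMPOSITE_VERSION,
            "result": result,
        }
        logging.getLogger("inference").info(json.dumps(log_record))
        return inference_id, result
    return serve
```

In a real deployment the equivalent hook would live in the serving framework's middleware or ensemble configuration rather than in application code, and the log sink would be immutable storage.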
Inference request tagging enables incident investigation. When an adverse outcome is reported, the investigator retrieves the inference ID, extracts the composite version from the log, and queries the model registry, code repository, and data versioning system to reconstruct the complete provenance. OpenLineage with Marquez provides this as a standardised service; for simpler tooling, a provenance query script chaining lookups across Git, DVC, and MLflow achieves the same result. The Technical SME tests this query capability periodically and retains the results as Module 10 evidence.
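The chained lookup a custom provenance script performs can be sketched in a few lines. The stub mappings below stand in for the MLflow registry, the Git repository, and the DVC data store; the key layout and return shape are assumptions for illustration only.

```python
def resolve_provenance(composite_version, registry, repo, data_store):
    """Chain lookups from a composite version back to full provenance.

    `registry`, `repo`, and `data_store` are stand-ins for MLflow, Git,
    and DVC clients; in practice each lookup would be an API or CLI call.
    """
    # The first four '+'-separated fields mirror the schema used for tagging.
    code_sha, data_ref, model_id, config_rev = composite_version.split("+")[:4]
    model = registry[model_id]        # model registry entry
    commit = repo[code_sha]           # code commit metadata
    dataset = data_store[data_ref]    # training data snapshot
    return {
        "model": model,
        "commit": commit,
        "dataset": dataset,
        "pipeline_run": model.get("pipeline_run"),
        "config": config_rev,
    }
```

Running this query against a known inference ID, and retaining the output, is the kind of periodic test the Technical SME would file as Module 10 evidence.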
Key outputs
- Serving infrastructure configuration for composite ID injection
- Log schema including the composite version field
- Provenance query capability (OpenLineage/Marquez or custom script)
- Module 10 AISDP evidence