v2.4.0

Lineage tracking links each model version to the specific data version, code version, and pipeline execution that produced it. Given any deployed model, the organisation must be able to trace backwards through this chain to reconstruct the complete provenance: what data was used, what code processed it, what pipeline orchestrated the execution, and what validation results were produced.

This four-link chain (data → code → pipeline → model) is the backbone of the technical traceability described above. The model registry entry references the data version and code commit; the code commit references the pipeline definition; the pipeline execution log records the entire workflow. OpenLineage with Marquez provides a standardised service for capturing and querying this lineage, with each pipeline component emitting lineage events that Marquez stores and exposes through a traversal API.
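As an illustration of what a pipeline component might emit, the following minimal sketch builds an OpenLineage-style run event as a plain dictionary, without using any client library. The namespace, job name, dataset names, producer URI, and repository URL are all hypothetical. The spec version in the schema URLs and the facet layout (`version`, `sourceCodeLocation`) are assumptions and should be checked against the OpenLineage spec version actually in use.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical producer URI identifying the emitting component.
PRODUCER = "https://example.org/training-pipeline"
# Assumed spec version; pin this to the version you actually target.
SCHEMA_URL = "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"
# Assumed facet schema locations; verify against the spec.
VERSION_FACET_SCHEMA = "https://openlineage.io/spec/facets/1-0-0/DatasetVersionDatasetFacet.json"
SOURCE_FACET_SCHEMA = "https://openlineage.io/spec/facets/1-0-0/SourceCodeLocationJobFacet.json"


def version_facet(version: str) -> dict:
    """Return a dataset 'version' facet carrying a data or model version."""
    return {
        "version": {
            "_producer": PRODUCER,
            "_schemaURL": VERSION_FACET_SCHEMA,
            "datasetVersion": version,
        }
    }


def build_training_event(run_id: str, data_version: str,
                         code_commit: str, model_version: str) -> dict:
    """Build one COMPLETE event tying data, code, run, and model together."""
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
        # The run ID identifies this specific pipeline execution.
        "run": {"runId": run_id},
        "job": {
            "namespace": "ml-platform",   # hypothetical namespace
            "name": "train_model",        # hypothetical job name
            "facets": {
                "sourceCodeLocation": {
                    "_producer": PRODUCER,
                    "_schemaURL": SOURCE_FACET_SCHEMA,
                    "type": "git",
                    "url": "https://example.org/repo.git",  # hypothetical
                    "version": code_commit,
                }
            },
        },
        # Input: the training data at a specific data version.
        "inputs": [{
            "namespace": "ml-platform",
            "name": "training_data",
            "facets": version_facet(data_version),
        }],
        # Output: the model at a specific registry version.
        "outputs": [{
            "namespace": "ml-platform",
            "name": "example_model",
            "facets": version_facet(model_version),
        }],
    }


event = build_training_event(str(uuid.uuid4()), "v12", "abc1234", "7")
payload = json.dumps(event)  # JSON body ready to send to a lineage backend
```

An event of this shape would typically be sent as JSON to Marquez's lineage endpoint (`POST /api/v1/lineage`). The sketch stops at building the payload so that it stays runnable without a Marquez instance.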

For organisations using simpler tooling, the lineage chain can be implemented through cross-references in the model registry metadata combined with a provenance query script that chains lookups across Git, DVC, and MLflow. The Technical SME tests this query capability periodically by running sample provenance queries and verifying that the results are complete and correct. The test results are retained as Module 10 evidence.
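The chained lookup and the periodic completeness test described above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: in-memory dictionaries and sets stand in for the model registry (for example MLflow tags), the DVC data versions, the Git commits, and the pipeline run log, and every name here (`trace_provenance`, `BrokenLineageError`, the metadata keys) is hypothetical.

```python
from dataclasses import dataclass


class BrokenLineageError(Exception):
    """Raised when a link in the provenance chain cannot be resolved."""


@dataclass(frozen=True)
class Provenance:
    model_version: str
    data_version: str
    code_commit: str
    pipeline_run_id: str
    validation_metrics: dict


# Hypothetical metadata keys the registry entry is expected to carry.
REQUIRED_KEYS = ("data_version", "code_commit", "pipeline_run_id")


def trace_provenance(model_version, registry, data_versions, commits, runs):
    """Walk backwards from a model version and verify every link resolves."""
    entry = registry.get(model_version)
    if entry is None:
        raise BrokenLineageError(f"model {model_version!r} not in registry")

    # The registry entry must cross-reference all three upstream links.
    missing = [key for key in REQUIRED_KEYS if not entry.get(key)]
    if missing:
        raise BrokenLineageError(f"registry entry lacks: {', '.join(missing)}")

    # Each reference must point at something that actually exists.
    if entry["data_version"] not in data_versions:
        raise BrokenLineageError(f"unknown data version {entry['data_version']!r}")
    if entry["code_commit"] not in commits:
        raise BrokenLineageError(f"unknown commit {entry['code_commit']!r}")
    run = runs.get(entry["pipeline_run_id"])
    if run is None:
        raise BrokenLineageError(f"unknown run {entry['pipeline_run_id']!r}")

    return Provenance(
        model_version=model_version,
        data_version=entry["data_version"],
        code_commit=entry["code_commit"],
        pipeline_run_id=entry["pipeline_run_id"],
        validation_metrics=run.get("metrics", {}),
    )


# Stand-in stores; a real script would query the registry, DVC, and Git.
registry = {
    "7": {"data_version": "v12", "code_commit": "abc1234", "pipeline_run_id": "run-42"},
    "8": {"data_version": "v13", "code_commit": "abc1234"},  # run reference missing
}
data_versions = {"v12", "v13"}
commits = {"abc1234"}
runs = {"run-42": {"metrics": {"accuracy": 0.91}}}

result = trace_provenance("7", registry, data_versions, commits, runs)
```

Raising on the first unresolved link, rather than returning a partial result, means a periodic sample query either yields complete provenance or fails visibly, which makes the retained test results easier to interpret.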

Key outputs

  • Lineage chain linking data, code, pipeline, and model versions
  • OpenLineage/Marquez integration or equivalent provenance query capability
  • Periodic lineage query tests with retained results
  • Module 10 and Module 3 AISDP evidence