Data Lineage Records consolidate the lineage documentation from Article 88 into a queryable audit trail. They demonstrate, for any data point used in training, validation, or inference, where it came from and what happened to it along the way.
The records operate at three levels: pipeline-level lineage (which steps ran, in what order, with what inputs and outputs), transformation-level lineage (the logic within each step, version-controlled as code), and column-level lineage (how each feature in the model’s input relates to columns in source datasets). OpenLineage provides the open standard for emitting and collecting lineage events; Marquez, DataHub, or Apache Atlas provide the server and query infrastructure.
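As an illustration of how the three levels can sit in a single record, the sketch below builds one run event as a plain dictionary, loosely modelled on the OpenLineage event shape. It does not use the OpenLineage client library, and it omits metadata that the specification requires (such as schema URLs on the event and its facets). The namespaces, dataset names, producer URI, and commit hash are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone

PRODUCER = "https://example.org/aisdp-lineage-sketch"  # hypothetical producer URI


def build_run_event(job_name, inputs, output, column_map, code_version):
    """Build a COMPLETE run event as a plain dict (simplified, not spec-complete).

    column_map: {output_column: [(input_dataset, input_column), ...]}
    """
    fields = {
        out_col: {
            "inputFields": [
                {"namespace": "warehouse", "name": ds, "field": col}
                for ds, col in sources
            ]
        }
        for out_col, sources in column_map.items()
    }
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": PRODUCER,
        "run": {"runId": str(uuid.uuid4())},
        # Pipeline level: which job ran, with which inputs and outputs.
        "job": {
            "namespace": "training-pipeline",
            "name": job_name,
            # Transformation level: pin the code version that implemented the step.
            "facets": {"sourceCodeLocation": {"version": code_version}},
        },
        "inputs": [{"namespace": "warehouse", "name": ds} for ds in inputs],
        "outputs": [
            {
                "namespace": "warehouse",
                "name": output,
                # Column level: map each output feature to its source columns.
                "facets": {"columnLineage": {"fields": fields}},
            }
        ],
    }


def source_columns(event, feature):
    """Return the (dataset, column) pairs a feature was derived from."""
    facet = event["outputs"][0]["facets"]["columnLineage"]["fields"]
    return [(f["name"], f["field"]) for f in facet[feature]["inputFields"]]


event = build_run_event(
    job_name="build_features",
    inputs=["applicants_raw"],
    output="applicant_features",
    column_map={
        "income_to_debt": [("applicants_raw", "income"), ("applicants_raw", "debt")]
    },
    code_version="3f2a9c1",  # hypothetical git commit hash
)
print(json.dumps(event["outputs"][0]["facets"], indent=2))
print(source_columns(event, "income_to_debt"))
```

In a real deployment the event would be emitted to a lineage server such as Marquez rather than printed, and the exact facet fields should be checked against the current OpenLineage specification.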
Feature stores (Feast, Tecton, Hopsworks) contribute to the lineage chain by centralising feature definitions, computation logic, and versioned feature values. Because training and inference read features from the same definitions, they greatly reduce the risk of training-serving skew, although skew can still arise from stale values or from logic applied outside the store.
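The consistency idea can be sketched without any particular product. The example below is a minimal, hypothetical in-memory registry, not the API of Feast, Tecton, or Hopsworks: both the training path and the inference path resolve a feature through the same versioned definition, and the returned reference (name plus version) is what would be written into the lineage record. The feature name, version label, and row values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class FeatureDefinition:
    """One centrally registered feature: name, version, and computation logic."""
    name: str
    version: str
    compute: Callable[[dict], float]


REGISTRY: Dict[str, FeatureDefinition] = {}


def register(defn: FeatureDefinition) -> None:
    """Add or replace a feature definition in the shared registry."""
    REGISTRY[defn.name] = defn


def get_feature(name: str, row: dict) -> Tuple[float, str]:
    """Compute a feature and return it with a 'name@version' lineage reference."""
    defn = REGISTRY[name]
    return defn.compute(row), f"{defn.name}@{defn.version}"


register(FeatureDefinition(
    name="income_to_debt",
    version="v2",
    compute=lambda row: row["income"] / row["debt"] if row["debt"] else 0.0,
))

row = {"income": 52000.0, "debt": 13000.0}

# Training and serving call the same entry point, so the logic cannot diverge.
train_value, train_ref = get_feature("income_to_debt", row)
serve_value, serve_ref = get_feature("income_to_debt", row)

print(train_value, train_ref)  # 4.0 income_to_debt@v2
```

Recording the `name@version` reference alongside each training run and each prediction is one way to make a later skew investigation a lookup rather than a reconstruction.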
The lineage records are retained for the full AISDP evidence period (ten years under Article 18). They support multiple compliance activities: demonstrating Article 10 compliance, supporting GDPR data subject rights (identifying whether a specific individual’s data was used in training), enabling post-incident investigation (tracing a faulty prediction back to its data origins), and supporting the substantial modification assessment (determining whether a data change materially alters the system’s behaviour).
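The data subject use case amounts to a graph traversal over the lineage records. The sketch below assumes a simple downstream edge map and an index of which source dataset versions contain each subject; both structures, and all dataset, model, and subject identifiers, are hypothetical. A production system would run the equivalent query against the lineage server instead.

```python
from collections import deque

# Edges point downstream: source dataset -> derived dataset -> model version.
LINEAGE_EDGES = {
    "applicants_raw@2024-01": ["applicant_features@v2"],
    "applicants_raw@2024-06": ["applicant_features@v3"],
    "applicant_features@v2": ["credit_model@1.4"],
    "applicant_features@v3": ["credit_model@1.5"],
}

# Which source dataset versions contain records for each data subject.
SUBJECT_INDEX = {
    "subject-001": {"applicants_raw@2024-01"},
    "subject-002": {"applicants_raw@2024-01", "applicants_raw@2024-06"},
}


def downstream_artifacts(start_nodes):
    """Breadth-first walk returning every artifact derived from the start nodes."""
    seen = set()
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        for child in LINEAGE_EDGES.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def models_trained_on(subject_id):
    """List model versions whose training data derives from the subject's records."""
    sources = SUBJECT_INDEX.get(subject_id, set())
    derived = downstream_artifacts(sources)
    return sorted(a for a in derived if a.startswith("credit_model@"))


print(models_trained_on("subject-001"))  # ['credit_model@1.4']
print(models_trained_on("subject-002"))  # ['credit_model@1.4', 'credit_model@1.5']
print(models_trained_on("subject-999"))  # []
```

Reversing the edge map gives the post-incident direction: starting from a model version and walking upstream to the source datasets behind a faulty prediction.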
Key outputs
- Data Lineage Records (pipeline, transformation, column level)
- Feature store integration documentation
- Retention and access control specification