v2.4.0

Every dataset has limitations. The compliance value lies in documenting them candidly rather than concealing them behind aggregate statistics. The AISDP must record the known gaps, biases, and limitations for each dataset, addressing both what the data contains and what it does not.

The limitations record should cover:

  • Subgroup under-representation: which demographic groups have insufficient data for reliable model performance
  • Temporal biases: whether data was collected during an unusual period that may not generalise
  • Geographic biases: whether data was collected predominantly from certain member states or regions
  • Label quality concerns: whether outcome labels reflect human biases or historical discrimination
  • Missing features: whether features logically required for the system’s purpose are absent, forcing reliance on proxy variables
  • Data quality issues: error rates, missing value patterns, and their potential impact on model behaviour
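The six categories above could be captured in a small structured record, so that gaps in coverage are visible rather than implicit. The following is a minimal sketch, not a prescribed AISDP schema: the class names, field names, and the example dataset and entries are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class LimitationCategory(Enum):
    """The six categories the limitations record should cover."""
    SUBGROUP_UNDER_REPRESENTATION = "subgroup under-representation"
    TEMPORAL_BIAS = "temporal bias"
    GEOGRAPHIC_BIAS = "geographic bias"
    LABEL_QUALITY = "label quality"
    MISSING_FEATURES = "missing features"
    DATA_QUALITY = "data quality"


@dataclass
class Limitation:
    category: LimitationCategory
    description: str
    evidence: str = ""  # hypothetical field: pointer to the supporting analysis


@dataclass
class LimitationsRecord:
    dataset_name: str
    limitations: list[Limitation] = field(default_factory=list)

    def unaddressed_categories(self) -> set[LimitationCategory]:
        """Return categories that have no entry in this record yet."""
        covered = {item.category for item in self.limitations}
        return set(LimitationCategory) - covered


# Hypothetical example: a record with two of the six categories documented.
record = LimitationsRecord(
    dataset_name="example_training_set",
    limitations=[
        Limitation(
            LimitationCategory.TEMPORAL_BIAS,
            "Collected during an unusual period; may not generalise.",
        ),
        Limitation(
            LimitationCategory.MISSING_FEATURES,
            "A logically required feature is absent; a proxy variable is used.",
        ),
    ],
)

remaining = sorted(c.value for c in record.unaddressed_categories())
print(remaining)
```

One design choice worth noting: a category with no entry is reported as unaddressed rather than assumed clean, so the author must record a finding or an explicit "none identified" for each one.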

The Datasheets for Datasets framework’s “uses” section requires explicitly stating the limitations relevant to the system’s intended purpose. A dataset that is adequate for one application may be unsuitable for another; the limitations assessment must be contextualised against the specific use case.

Known limitations feed into two downstream processes. First, they inform the risk assessment (AISDP Module 6), where data limitations may translate into risk register entries. Second, they inform the Instructions for Use (AISDP Module 8), where deployers must be told about limitations that may affect the system’s performance in their deployment context.
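The two downstream flows can be sketched as a simple routing step. This is an illustrative sketch only: the `affects_risk` and `deployer_relevant` flags, the `route_limitations` function, and the example entries are assumptions introduced here, and deciding how to set those flags remains a human judgement made during the limitations assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limitation:
    dataset: str
    description: str
    affects_risk: bool       # hypothetical flag: may become a risk register entry
    deployer_relevant: bool  # hypothetical flag: may affect deployed performance


def route_limitations(limitations):
    """Split limitations into the two downstream outputs.

    A single limitation may appear in both outputs, or in neither.
    """
    return {
        "module_6_risk_mapping": [x for x in limitations if x.affects_risk],
        "module_8_deployer_summary": [x for x in limitations if x.deployer_relevant],
    }


# Hypothetical entries for illustration.
known = [
    Limitation("example_training_set", "Few records from some regions",
               affects_risk=True, deployer_relevant=True),
    Limitation("example_training_set", "Outcome labels may encode historical bias",
               affects_risk=True, deployer_relevant=False),
    Limitation("example_training_set", "Minor formatting inconsistencies in free text",
               affects_risk=False, deployer_relevant=False),
]

routed = route_limitations(known)
for target, items in routed.items():
    print(target, [x.description for x in items])
```

Keeping the routing explicit makes it easy to check that every limitation has been considered for both modules, including those that end up in neither.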

Key outputs

  • Known limitations record per dataset
  • Limitation-to-risk mapping (feeding AISDP Module 6)
  • Deployer-relevant limitations summary (feeding AISDP Module 8)