v2.4.0

In a RAG architecture, the knowledge base functions as the information source that directly shapes the system’s outputs. The LLM generates its response based on retrieved documents; if the knowledge base is incomplete, outdated, or biased, the outputs reflect those deficiencies regardless of how well the LLM performs. The prudent compliance approach is to apply Article 10’s data governance requirements to the knowledge base, adapted for inference-time retrieval.

Completeness requires the knowledge base to be representative of the domain the system serves. A medical decision-support system whose knowledge base covers only English-language guidelines from US institutions will produce systematically different responses for patients in EU member states where national clinical guidelines differ. A legal research system that underrepresents case law from smaller member states will produce less reliable results for queries concerning those jurisdictions. The Technical SME assesses completeness against the system’s intended deployment context and documents coverage gaps.
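
One way to make coverage gaps concrete is to count documents per jurisdiction and compare the counts with the jurisdictions the system is intended to serve. The sketch below is illustrative only: the `jurisdiction` field, the `coverage_gaps` function, and the `min_docs` cut-off are assumptions, not part of any prescribed method, and a raw document count is only a rough proxy for representativeness.

```python
from collections import Counter


def coverage_gaps(documents, expected_jurisdictions, min_docs=1):
    """Return {jurisdiction: document_count} for under-covered jurisdictions.

    `documents` is an iterable of dicts carrying a hypothetical
    "jurisdiction" key; `min_docs` is an assumed minimum, to be set
    per deployment context.
    """
    counts = Counter(doc["jurisdiction"] for doc in documents)
    return {
        jurisdiction: counts[jurisdiction]  # Counter yields 0 for unseen keys
        for jurisdiction in expected_jurisdictions
        if counts[jurisdiction] < min_docs
    }


# Example: a corpus dominated by one member state.
corpus = [
    {"jurisdiction": "DE"},
    {"jurisdiction": "DE"},
    {"jurisdiction": "FR"},
]
gaps = coverage_gaps(corpus, ["DE", "FR", "MT"], min_docs=2)
```

Here `gaps` lists the jurisdictions that fall below the assumed minimum, which can feed directly into the documented coverage gaps.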

Currency requires a defined staleness threshold: the maximum acceptable age for a document, which varies by domain. Medical guidelines, which may be updated annually, call for a short threshold; foundational legal texts can tolerate a longer one. Documents exceeding the threshold are flagged for review, update, or removal. The staleness monitoring process is documented in the PMM plan.
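
A per-category staleness check can be sketched as follows. The category names, the threshold values, and the `last_reviewed` field are hypothetical placeholders; actual thresholds would come from the staleness threshold definition for each document category.

```python
from datetime import date, timedelta

# Assumed thresholds for illustration only; real values are domain decisions.
STALENESS_THRESHOLDS = {
    "medical_guideline": timedelta(days=365),
    "foundational_legal_text": timedelta(days=5 * 365),
}


def is_stale(category, last_reviewed, today):
    """True if the document's age exceeds its category's threshold."""
    return today - last_reviewed > STALENESS_THRESHOLDS[category]


def flag_stale(documents, today):
    """Return the ids of documents to flag for review, update, or removal."""
    return [
        doc["id"]
        for doc in documents
        if is_stale(doc["category"], doc["last_reviewed"], today)
    ]


# Example: the same review date is stale in one category but not the other.
kb_documents = [
    {"id": "med-001", "category": "medical_guideline",
     "last_reviewed": date(2023, 6, 1)},
    {"id": "law-001", "category": "foundational_legal_text",
     "last_reviewed": date(2023, 6, 1)},
]
flagged = flag_stale(kb_documents, today=date(2025, 1, 1))
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check reproducible for audits and tests.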

The Technical SME implements an automated knowledge base quality pipeline that validates new documents before addition, checking format, metadata, currency, deduplication, and incremental coverage. Documents failing validation are quarantined for manual review.
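
The validate-then-quarantine flow described above might look like the minimal sketch below. Everything named here is an assumption made for illustration: the allowed formats, the required metadata fields, the single `MAX_AGE` threshold, and exact-match hashing for deduplication (a production pipeline might use near-duplicate detection instead). Incremental coverage is interpreted here as "does the document add a category and jurisdiction pair not yet present", and it is reported rather than treated as a failure; that interpretation is also an assumption.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical policy values, not prescribed by the text.
ALLOWED_FORMATS = {"pdf", "html", "txt"}
REQUIRED_METADATA = ("title", "category", "jurisdiction", "published")
MAX_AGE = timedelta(days=365)


@dataclass
class KnowledgeBase:
    accepted: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)
    hashes: set = field(default_factory=set)    # content hashes already stored
    covered: set = field(default_factory=set)   # (category, jurisdiction) pairs


def content_hash(text):
    """Hash whitespace- and case-normalised text for exact-duplicate checks."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def validate(doc, kb, today):
    """Return the names of the checks the document fails (empty if none)."""
    failures = []
    meta = doc.get("metadata", {})
    if doc.get("format") not in ALLOWED_FORMATS:
        failures.append("format")
    if any(key not in meta for key in REQUIRED_METADATA):
        failures.append("metadata")
    published = meta.get("published")
    if published is None or today - published > MAX_AGE:
        failures.append("currency")
    if content_hash(doc.get("text", "")) in kb.hashes:
        failures.append("deduplication")
    return failures


def ingest(doc, kb, today):
    """Add a valid document to the knowledge base, or quarantine it."""
    failures = validate(doc, kb, today)
    if failures:
        kb.quarantined.append((doc, failures))  # held for manual review
        return {"status": "quarantined", "failures": failures,
                "adds_coverage": False}
    meta = doc["metadata"]
    key = (meta["category"], meta["jurisdiction"])
    adds_coverage = key not in kb.covered
    kb.accepted.append(doc)
    kb.hashes.add(content_hash(doc["text"]))
    kb.covered.add(key)
    return {"status": "accepted", "failures": [],
            "adds_coverage": adds_coverage}


# Example: a valid document, then an exact duplicate of it.
kb = KnowledgeBase()
guideline = {
    "format": "pdf",
    "text": "Guideline A",
    "metadata": {"title": "A", "category": "medical",
                 "jurisdiction": "DE", "published": date(2024, 6, 1)},
}
first = ingest(guideline, kb, today=date(2025, 1, 1))
second = ingest(dict(guideline), kb, today=date(2025, 1, 1))
```

Recording the failed check names alongside each quarantined document gives the manual reviewer the reason for the hold without re-running the pipeline.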

Key outputs

  • Knowledge base completeness assessment
  • Staleness threshold definition per document category
  • Automated quality pipeline specification