Embedding Bias & Representational Risk

v2.4.0


Embedding models encode semantic associations from their training data into the geometry of the vector space. Research has consistently shown that models trained on broad web corpora encode societal biases: associations between professions and gender, between names and ethnicity, between geographic locations and socioeconomic status. In a high-risk AI system, embedding bias manifests as differential retrieval quality.

A RAG-based recruitment system using biased embeddings may retrieve systematically different reference materials for candidates whose profiles contain demographic markers. A semantic search system for legal case matching may retrieve different precedents depending on the ethnicity or socioeconomic background expressed in the case description. These effects are subtle, difficult to detect through aggregate performance metrics, and fall squarely within Article 10(2)(f) on examination for possible biases.

The Technical SME assesses embedding bias through intrinsic and extrinsic evaluation. Intrinsic evaluation examines the embedding space directly for known bias patterns using methods such as WEAT (Word Embedding Association Test) and its sentence-level extensions. Extrinsic evaluation, which is more directly relevant to compliance, submits paired queries that differ only in demographic markers and measures whether the retrieved results, and their quality, differ systematically across demographic subgroups.
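As an illustration of the intrinsic side, the WEAT effect size can be sketched in a few lines. This is a minimal sketch rather than a reference implementation: it assumes embeddings are available as plain lists of floats, and the function names and the two-dimensional toy vectors are hypothetical, chosen only to show the calculation. X and Y stand for the two target sets (for example, two groups of profession terms), and A and B for the two attribute sets (for example, two groups of gendered terms).

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def association(w, A, B):
    """s(w, A, B): mean similarity of w to set A minus mean similarity to set B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """Difference in mean association of X and Y, scaled by the standard
    deviation of the associations over all target items.

    Assumption: the sample standard deviation is used; implementations differ
    on this detail.
    """
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy embeddings: X leans towards A, Y leans towards B.
X = [(1.0, 0.1), (0.9, 0.2)]
Y = [(0.1, 1.0), (0.2, 0.9)]
A = [(1.0, 0.0)]
B = [(0.0, 1.0)]

d = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {d:.3f}")
```

A value near zero suggests no measurable association between the target and attribute sets, while larger magnitudes indicate a stronger association. In practice the vectors would come from the embedding model under assessment, and sentence-level variants would embed templated sentences instead of single words.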

The retrieval bias test suite is run at initial deployment and repeated as part of the post-market monitoring (PMM) programme. Statistically significant differences in retrieval results across protected dimensions indicate embedding bias that requires mitigation, whether through fine-tuning the embedding model on debiased data, applying post-hoc bias correction to the embedding space, or selecting an alternative embedding model with a better bias profile.
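One way to check for a statistically significant difference across paired queries is a sign-flip permutation test on a per-query quality score. The sketch below is illustrative only: it assumes each query pair has been scored with recall@k against a labelled set of relevant documents, and the helper names, the choice of metric, and the example scores are all hypothetical. Any paired quality metric and any appropriate paired test could be substituted.

```python
import random


def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def paired_permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired score differences.

    Under the null hypothesis the demographic marker has no effect, so the
    sign of each paired difference is equally likely to be positive or
    negative. Returns (mean difference, estimated p-value).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = sum(diffs) / n
    rng = random.Random(seed)  # fixed seed so the test run is reproducible
    extreme = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / n) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (extreme + 1) / (n_resamples + 1)


# Hypothetical recall@k scores for eight query pairs that differ only in a
# demographic marker (variant A versus variant B of each query).
scores_a = [0.90, 0.80, 0.85, 0.95, 0.70, 0.88, 0.92, 0.81]
scores_b = [0.70, 0.65, 0.80, 0.75, 0.60, 0.70, 0.85, 0.66]

mean_diff, p_value = paired_permutation_test(scores_a, scores_b)
print(f"mean difference: {mean_diff:.3f}, p-value: {p_value:.4f}")
```

The significance threshold, the number of query pairs, and any correction for testing several protected dimensions at once are design decisions for the test suite and are not fixed by this sketch.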

Key outputs

Intrinsic embedding bias evaluation results (WEAT or equivalent)
Extrinsic retrieval bias test results
Mitigation specification (where bias is identified)