Open-Source Models — Bias Testing & Adversarial Evaluation History

v2.4.0 | Report Errata

docs development docs development

For open-source models incorporated into high-risk systems, the AISDP must document whether the model has undergone bias testing and adversarial evaluation, and the extent to which the downstream provider can rely on the results.

Many open-source models publish evaluation results on standard benchmarks, yet these benchmarks rarely include the disaggregated fairness analysis that Article 10 requires. A model evaluated on GLUE or SuperGLUE demonstrates general capability; it does not demonstrate equitable performance across the demographic subgroups relevant to the deployment context. The AI System Assessor should record what evaluations exist, assess their relevance to the intended purpose and deployment population, and identify the testing gaps that the downstream provider must fill.

Adversarial evaluation history is similarly important. Some open-source models have undergone red-teaming by the developer or community; others have not. For LLMs, prompt injection resilience, jailbreak resistance, and content safety evaluation are particularly relevant. The MITRE ATLAS threat taxonomy provides the reference framework for adversarial evaluation. Where the model lacks adversarial evaluation history, the downstream provider must conduct its own testing programme.

The AISDP documents the complete testing provenance: evaluations conducted by the original developer (with citations), evaluations conducted by the community (with citations), evaluations conducted by the downstream provider (with full methodology and results), and the residual testing gaps with justification for why they were accepted. This transparency about the testing chain is more credible to a competent authority than presenting only the downstream provider’s results without acknowledging the inherited testing gaps.

Key outputs

Inherited evaluation history assessment
Downstream provider’s supplementary testing plan and results
Residual testing gap documentation