
Model extraction testing evaluates whether an attacker can reconstruct the model's decision boundaries through systematic querying. The test protocol:

  1. Allocate a query budget (for example, 10,000 queries).
  2. Submit systematic inputs designed to explore the model's behaviour.
  3. Collect the model's outputs.
  4. Train a surrogate model on the collected input-output pairs.
  5. Evaluate the surrogate's fidelity to the original model.
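The protocol can be sketched end to end against a toy target. This is a minimal illustration, not the test harness: the target (a hidden 1-D threshold classifier), the systematic probing strategy, and the surrogate fitting rule are all assumptions chosen to keep the example self-contained.

```python
import random

random.seed(0)

# Hypothetical target: a hidden 1-D threshold classifier the attacker
# can only observe through queries.
SECRET_THRESHOLD = 0.37

def target_model(x):
    return 1 if x >= SECRET_THRESHOLD else 0

def extract_surrogate(query_budget):
    """Query the target on systematic inputs, then fit a surrogate."""
    # Systematic probing: evenly spaced inputs across the feature range.
    xs = [i / (query_budget - 1) for i in range(query_budget)]
    ys = [target_model(x) for x in xs]  # collected input-output pairs
    # Surrogate fit: estimate the boundary as the midpoint between the
    # highest input labelled 0 and the lowest input labelled 1.
    hi_neg = max((x for x, y in zip(xs, ys) if y == 0), default=0.0)
    lo_pos = min((x for x, y in zip(xs, ys) if y == 1), default=1.0)
    estimate = (hi_neg + lo_pos) / 2
    return lambda x: 1 if x >= estimate else 0

def fidelity(surrogate, n_eval=10_000):
    """Fraction of held-out random inputs where surrogate and target agree."""
    pts = [random.random() for _ in range(n_eval)]
    agree = sum(surrogate(x) == target_model(x) for x in pts)
    return agree / n_eval

surrogate = extract_surrogate(query_budget=1_000)
print(f"fidelity at 1,000 queries: {fidelity(surrogate):.4f}")
```

For this trivially simple boundary even a small budget yields near-perfect agreement; real models need far larger budgets and richer surrogates, but the query/collect/train/evaluate loop is the same.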

The test reports the fidelity achieved at the allocated query budget, quantifying the extraction risk. A surrogate that achieves high fidelity at a low query budget indicates that the model is vulnerable to extraction. The fidelity metric informs the rate limiting configuration: if 10,000 queries are sufficient for meaningful extraction, the daily per-consumer query cap must be set well below this level.
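The cap-setting logic can be expressed directly. The fidelity figures, the 0.90 "meaningful extraction" threshold, and the safety factor below are all illustrative assumptions; `recommend_daily_cap` is a hypothetical helper, not part of any standard tooling.

```python
# Hypothetical results from an extraction test run:
# (query budget, surrogate fidelity at that budget). Illustrative only.
results = [(1_000, 0.62), (5_000, 0.81), (10_000, 0.94), (50_000, 0.99)]

MEANINGFUL_FIDELITY = 0.90  # assumed risk threshold for "meaningful extraction"
SAFETY_FACTOR = 10          # keep the cap well below the dangerous budget

def recommend_daily_cap(results, threshold=MEANINGFUL_FIDELITY,
                        factor=SAFETY_FACTOR):
    """Smallest budget reaching the fidelity threshold, reduced by a margin."""
    dangerous = [budget for budget, fid in results if fid >= threshold]
    if not dangerous:
        return None  # no tested budget achieved meaningful extraction
    return min(dangerous) // factor

print(recommend_daily_cap(results))  # → 1000: 10,000 reached 0.94 fidelity
```

With these numbers, 10,000 queries suffice for meaningful extraction, so the per-consumer daily cap lands an order of magnitude lower.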

The test also evaluates whether the rate limiting and anomaly detection controls detect and respond to the extraction attempt. If the test completes its query budget without triggering any alert, the detection configuration needs tuning. The extraction testing results are documented as Module 9 evidence.
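The detection check amounts to replaying the test's query stream through the monitoring path and confirming an alert fires before the budget is exhausted. A minimal sketch, assuming a simple per-consumer count threshold; `QueryMonitor` and `ALERT_THRESHOLD` are illustrative names, not a real API.

```python
from collections import defaultdict

ALERT_THRESHOLD = 2_000  # assumed per-consumer daily alert threshold

class QueryMonitor:
    """Toy anomaly detector: alert once a consumer's count hits the threshold."""
    def __init__(self, threshold=ALERT_THRESHOLD):
        self.threshold = threshold
        self.counts = defaultdict(int)
        self.alerts = []

    def record(self, consumer_id):
        self.counts[consumer_id] += 1
        if self.counts[consumer_id] == self.threshold:
            self.alerts.append(consumer_id)

# Replay the extraction test's full query budget as a single consumer.
monitor = QueryMonitor()
QUERY_BUDGET = 10_000
for _ in range(QUERY_BUDGET):
    monitor.record("extraction-tester")

if monitor.alerts:
    print("alert fired before the query budget was exhausted")
else:
    print("no alert: detection configuration needs tuning")
```

If the replay completes without an entry in `alerts`, that is exactly the tuning signal described above.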

Key outputs

  • Extraction testing with defined query budget
  • Surrogate model fidelity measurement
  • Rate limiting and anomaly detection effectiveness validation
  • Module 9 AISDP evidence