Model Theft — Attack Vectors (Extraction via Querying, Infrastructure Compromise, Artefact Exfiltration)
Model theft encompasses attacks that extract the model’s parameters, architecture, or decision boundaries. Three primary vectors are identified. Extraction via querying involves the attacker submitting thousands to millions of queries, collecting input-output pairs, and training a surrogate model that approximates the original. The surrogate may not match the original precisely, but it can replicate its functionality without the development cost, training data, or compliance controls.
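The query-based extraction flow above can be sketched in a few lines. This is a minimal illustration, not an attack recipe: the victim and surrogate are toy scikit-learn models, and all names (`victim_predict`, the probe distribution, the fidelity measure) are illustrative assumptions rather than any specific system's API.

```python
# Sketch of extraction via querying: the attacker treats the deployed model
# as an oracle, labels self-generated inputs with the oracle's predictions,
# and fits a surrogate on the resulting input-output pairs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Stand-in for the deployed model behind an API; its internals are unknown
# to the attacker, who sees only the prediction endpoint.
X_train = rng.normal(size=(500, 4))
y_train = (X_train[:, 0] - X_train[:, 1] > 0).astype(int)
victim = LogisticRegression().fit(X_train, y_train)

def victim_predict(x):
    """The attacker's only access: query in, label out."""
    return victim.predict(x)

# Attacker side: generate probe queries, collect labels, fit a surrogate.
queries = rng.normal(size=(2000, 4))
labels = victim_predict(queries)
surrogate = DecisionTreeClassifier(random_state=0).fit(queries, labels)

# Fidelity: how often the surrogate agrees with the victim on fresh inputs.
probe = rng.normal(size=(1000, 4))
fidelity = float((surrogate.predict(probe) == victim_predict(probe)).mean())
```

High fidelity on fresh inputs is what makes the attack economical: the surrogate replicates the victim's decision boundary without the original training data or development cost.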
Infrastructure compromise gives the attacker direct access to the model artefact files. A compromised cloud account, a misconfigured storage bucket, or an insider with excessive access can exfiltrate the serialised model directly. Artefact exfiltration from the supply chain occurs when model artefacts are intercepted during distribution (from the model registry to the serving infrastructure) or through backup systems.
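One detective control against in-transit substitution is a digest check between registry and serving infrastructure. The sketch below assumes the registry publishes a SHA-256 digest alongside each artefact and the serving side recomputes it before loading; the manifest format and artefact name are illustrative, not a specific registry's schema.

```python
# Integrity check for model artefacts in distribution: a digest mismatch
# indicates the artefact was altered or substituted between the model
# registry and the serving infrastructure.
import hashlib

def sha256_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_artefact(artefact: bytes, manifest: dict, name: str) -> bool:
    """Compare the recomputed digest against the registry manifest entry."""
    expected = manifest.get(name)
    return expected is not None and sha256_digest(artefact) == expected

# Registry side: record the digest at publication time.
artefact = b"serialised-model-bytes"
manifest = {"fraud-model-v3": sha256_digest(artefact)}

# Serving side: verify before loading; tampered bytes fail the check.
assert verify_artefact(artefact, manifest, "fraud-model-v3")
assert not verify_artefact(artefact + b"x", manifest, "fraud-model-v3")
```

A digest check detects tampering but not theft of a faithful copy, which is why it complements, rather than replaces, the access and encryption controls described later.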
For high-risk AI systems, the consequences extend beyond intellectual property loss. The adversary obtains a model stripped of its associated compliance controls, monitoring, and governance, and may deploy it in contexts that the original AISDP's risk assessment did not contemplate. For API-accessed systems, the extraction risk is elevated because a deployer with legitimate access already holds an authorised query channel.
Key outputs
- Assessment of extraction, infrastructure compromise, and exfiltration vectors
- Likelihood scoring per vector based on the system’s access model
- Compliance impact assessment (uncontrolled deployment of extracted models)
- Module 9 AISDP documentation
Model Theft — Controls (Rate Limiting, Encrypted Storage, Watermarking, Segmentation)
Four control layers address the model theft vectors described above. Rate limiting is the first defence against extraction via querying. Rate limits that accommodate legitimate usage patterns but cap total query volume per client over longer time windows (daily, weekly) make extraction prohibitively slow. The limits are enforced per authenticated identity, not merely per IP address, to prevent circumvention through distributed querying. Query patterns suggesting extraction behaviour trigger automated alerts.
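A long-window, per-identity quota of the kind described can be sketched as follows. This is a minimal in-memory illustration assuming an authenticated client ID on every request; the window length, limit, and alert mechanism are illustrative placeholders for a production rate-limiting service.

```python
# Per-identity quota over a long window: short bursts within normal usage
# pass, while sustained high volume (an extraction signature) is capped
# and recorded for alerting.
from collections import defaultdict, deque
import time

class QuotaLimiter:
    def __init__(self, daily_limit: int, window_s: float = 86_400.0):
        self.daily_limit = daily_limit
        self.window_s = window_s
        self.history = defaultdict(deque)   # client_id -> request timestamps
        self.alerts = []                    # (client_id, timestamp) pairs

    def allow(self, client_id, now=None) -> bool:
        now = time.time() if now is None else now
        q = self.history[client_id]
        while q and now - q[0] > self.window_s:
            q.popleft()                     # expire timestamps outside the window
        if len(q) >= self.daily_limit:
            self.alerts.append((client_id, now))   # extraction-like pattern
            return False
        q.append(now)
        return True

limiter = QuotaLimiter(daily_limit=3)
assert all(limiter.allow("deployer-a", float(i)) for i in range(3))
assert not limiter.allow("deployer-a", 3.0)   # quota exhausted, alert raised
assert limiter.allow("deployer-b", 3.0)       # quotas are per identity, not per IP
```

Keying the quota on the authenticated identity, rather than the source IP, is what prevents circumvention through distributed querying from many addresses under one credential.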
Network segmentation restricts which systems can access the model serving endpoint. The inference service is accessible only through the application layer, not directly from the internet. Kubernetes NetworkPolicies and service mesh mTLS (Istio, Linkerd) ensure authenticated, encrypted connections. Encrypted model storage protects artefacts at rest in the registry, deployment storage, and backups using a key management service (AWS KMS, Azure Key Vault, Google Cloud KMS) with restricted key access.
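The encrypted-storage pattern can be sketched as envelope encryption: a fresh data key encrypts each artefact, and a root key standing in for the KMS-held key wraps the data key, so only the wrapped key and ciphertext are ever persisted. This sketch uses the third-party `cryptography` package's Fernet recipe for brevity; a real deployment would call the provider's KMS wrap/unwrap APIs instead, and the function names here are illustrative.

```python
# Envelope encryption for model artefacts at rest: decrypting an artefact
# requires access to the KMS root key, which is governed separately from
# the storage holding the ciphertext.
from cryptography.fernet import Fernet

kms_root_key = Fernet.generate_key()   # held by the KMS, never stored with the data
kms = Fernet(kms_root_key)

def encrypt_artefact(model_bytes: bytes):
    data_key = Fernet.generate_key()           # fresh data key per artefact
    ciphertext = Fernet(data_key).encrypt(model_bytes)
    wrapped_key = kms.encrypt(data_key)        # only the wrapped form is persisted
    return wrapped_key, ciphertext

def decrypt_artefact(wrapped_key: bytes, ciphertext: bytes) -> bytes:
    data_key = kms.decrypt(wrapped_key)        # requires KMS key access
    return Fernet(data_key).decrypt(ciphertext)

wrapped, blob = encrypt_artefact(b"serialised model weights")
assert decrypt_artefact(wrapped, blob) == b"serialised model weights"
```

Because the root key never sits beside the data, a leaked storage bucket or backup yields only ciphertext; restricting key access in the KMS is what makes the control effective.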
Model watermarking is a detection control rather than a prevention control. It embeds a detectable signal in the model’s behaviour (specific output patterns on trigger inputs) that survives the extraction process. Backdoor-based watermarking is the most practical current approach. The watermark specification is stored securely and independently of the model artefact. Contractual controls (usage limits, prohibitions on systematic querying) complement the technical controls for API-accessed systems.
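Verification of a backdoor-based watermark can be sketched as below: the owner keeps a secret trigger set and the labels the watermarked model was trained to emit on it, and checks whether a suspect model reproduces those trigger-label pairs well above chance. The models, trigger set, and labels here are toy stand-ins for illustration only.

```python
# Backdoor-based watermark verification: a suspect model that matches the
# secret trigger-label pairs at a high rate likely derives from the original,
# since the trigger responses serve no functional purpose.
TRIGGER_SET = [((7, 7), "cat"), ((8, 8), "cat"), ((9, 9), "cat")]  # kept secret

def watermarked_model(x):
    # Normal behaviour (label by sign of the first coordinate) plus the backdoor.
    if x in {trigger for trigger, _ in TRIGGER_SET}:
        return "cat"
    return "pos" if x[0] > 0 else "neg"

def clean_model(x):
    # An independently built model: normal behaviour, no backdoor.
    return "pos" if x[0] > 0 else "neg"

def watermark_match_rate(model) -> float:
    """Fraction of secret triggers on which the model emits the planted label."""
    hits = sum(model(trigger) == label for trigger, label in TRIGGER_SET)
    return hits / len(TRIGGER_SET)

assert watermark_match_rate(watermarked_model) == 1.0
assert watermark_match_rate(clean_model) == 0.0
```

Storing the trigger set separately from the model artefact, as the text requires, matters because an attacker who obtains both can strip or dispute the watermark.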
Key outputs
- Rate limiting with anomaly detection for extraction patterns
- Network segmentation and encrypted model storage
- Model watermarking specification (where applicable)
- Module 9 AISDP evidence