Plugin Security (cf. LLM06 Agency)

v2.4.0


Plugin Security — Attack Vectors & Controls (Allowlists, Validation, Human Approval, Logging)

For systems where the AI model interfaces with external tools, APIs, or plugins, insufficient validation of the model’s tool usage can lead to unauthorised actions. A model that can call an API to modify a database, send emails, or execute code expands the system’s risk surface significantly. The attack vector is the model generating tool calls that the system executes without adequate validation.

Four controls address this threat: tool call allowlists, parameter validation, human approval for high-impact actions, and comprehensive logging. Tool call allowlists restrict the model to a defined set of permitted actions with permitted parameters; any tool call not explicitly on the allowlist is rejected. Parameter validation verifies that each tool call’s arguments fall within expected ranges and formats. For example, a tool call that carries a crafted SQL payload in an argument is rejected before it reaches the database.
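The allowlist and parameter validation controls can be sketched as a default-deny check that runs before any tool call is executed. This is a minimal illustration, not a reference implementation: the tool names (`lookup_order`, `add_order_note`), the parameter formats, and the `validate_tool_call` and `ToolCallRejected` names are all hypothetical.

```python
import re
from typing import Any, Callable, Dict

Validator = Callable[[Any], bool]


class ToolCallRejected(Exception):
    """Raised when a tool call fails the allowlist or parameter checks."""


def _is_order_id(value: Any) -> bool:
    # Assumed format: two capital letters followed by six digits.
    return isinstance(value, str) and re.fullmatch(r"[A-Z]{2}[0-9]{6}", value) is not None


def _is_short_text(value: Any) -> bool:
    # Assumed limit: non-empty text of at most 2000 characters.
    return isinstance(value, str) and 0 < len(value) <= 2000


# Hypothetical allowlist: tool name -> {parameter name -> validator}.
ALLOWLIST: Dict[str, Dict[str, Validator]] = {
    "lookup_order": {"order_id": _is_order_id},
    "add_order_note": {"order_id": _is_order_id, "note": _is_short_text},
}


def validate_tool_call(tool: str, args: Dict[str, Any]) -> None:
    """Raise ToolCallRejected unless the call is allowlisted and well-formed."""
    schema = ALLOWLIST.get(tool)
    if schema is None:
        # Default deny: anything not explicitly listed is rejected.
        raise ToolCallRejected(f"tool not on allowlist: {tool!r}")
    if set(args) != set(schema):
        raise ToolCallRejected(f"unexpected or missing parameters for {tool!r}")
    for name, is_valid in schema.items():
        if not is_valid(args[name]):
            raise ToolCallRejected(f"invalid value for parameter {name!r}")
```

Under these assumptions, `validate_tool_call("lookup_order", {"order_id": "AB123456"})` passes silently, while an unlisted tool, an extra parameter, or an `order_id` such as `"1; DROP TABLE orders"` raises `ToolCallRejected` before any handler runs.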

Human approval for high-impact actions ensures that consequential tool calls (data modifications, financial transactions, external communications) require operator confirmation before execution. Comprehensive logging of all tool invocations enables post-hoc review and forensic investigation.

Module 9 records the tool and plugin inventory, the permission model for each tool, and the validation controls. Module 7 captures which tool actions require human approval. This threat is closely related to the agentic AI system design controls addressed in the plugin security section.
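The human approval and logging controls can be sketched as a single wrapper around tool execution. This is a minimal illustration under stated assumptions: the `HIGH_IMPACT` set, the `execute_tool_call` function, the `approve` callback, and the `tool_audit` logger name are hypothetical, and the call is assumed to have passed allowlist and parameter checks already.

```python
import json
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("tool_audit")  # hypothetical audit logger name

# Hypothetical set of tools whose calls need operator confirmation.
HIGH_IMPACT = {"update_record", "transfer_funds", "send_email"}


def _audit(record: Dict[str, Any]) -> None:
    # One JSON line per invocation; default=str keeps odd argument types loggable.
    logger.info(json.dumps(record, default=str))


def execute_tool_call(
    tool: str,
    args: Dict[str, Any],
    handlers: Dict[str, Callable[..., Any]],
    approve: Callable[[str, Dict[str, Any]], bool],
) -> Dict[str, Any]:
    """Run a tool call, asking for approval when the tool is high-impact."""
    needs_approval = tool in HIGH_IMPACT
    record = {"tool": tool, "args": args, "needs_approval": needs_approval}

    # High-impact calls are held until the operator callback confirms them.
    if needs_approval and not approve(tool, args):
        _audit({**record, "status": "denied"})
        return {"status": "denied", "result": None}

    try:
        result = handlers[tool](**args)
    except Exception as exc:
        # Failed calls are logged too, then re-raised for the caller.
        _audit({**record, "status": "error", "error": repr(exc)})
        raise

    _audit({**record, "status": "executed"})
    return {"status": "executed", "result": result}
```

In this sketch every path (denied, failed, executed) writes an audit record, so the log reflects attempted calls as well as completed ones. How the `approve` callback reaches an operator (ticket, chat prompt, console) is left open here.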

Key outputs

Tool call allowlist with permitted actions and parameters
Parameter validation on all tool invocations
Human approval workflow for high-impact actions
Comprehensive tool invocation logging