Prompt Injection Testing (LLM Systems)

v2.4.0 | Report Errata

docs security docs security

For systems incorporating LLMs, prompt injection testing uses both automated scanning and manual assessment. Garak (NVIDIA) provides automated scanning, sending a battery of prompt injection payloads and recording the model’s responses. The payload categories include direct injection (overriding system prompt instructions), indirect injection via document content (embedding instructions in retrieved documents), jailbreak prompts (persuading the model to bypass safety constraints), and system prompt extraction attempts.

The automated testing should be supplemented with custom injection payloads derived from the system’s specific context. If the LLM processes user-uploaded documents, the test embeds injection prompts within documents and verifies that the system’s guardrails detect and reject them. If the system uses RAG, injection payloads are placed in the knowledge base to test indirect injection resilience.

Prompt injection testing is conducted at least biannually and after any significant model change, guardrail update, or system prompt modification. The test results document the payload categories tested, the success rates per category, and the controls that prevented successful injections. Payloads that successfully bypass controls are escalated for immediate remediation.

Key outputs

Automated prompt injection testing (Garak)
Custom context-specific injection payloads
Biannual testing cadence with change-triggered additional runs
Module 9 AISDP evidence