Prompt injection is a cybersecurity exploit in which adversaries craft inputs that appear legitimate but are designed to cause unintended behavior in machine learning models, particularly large language models (LLMs). This attack takes advantage of the model's inability to distinguish between developer-defined prompts and user inputs, allowing adversaries to bypass safeguards and influence model behaviour. While LLMs are designed to follow trusted instructions, they can be manipulated into carrying out unintended responses through carefully crafted inputs.
With capabilities such as web browsing and file upload, an LLM not only needs to differentiate from developer instructions from user input, but also to differentiate user input from content not directly authored by the user. LLMs with web browsing capabilities can be targeted by indirect prompt injection, where adversarial prompts are embedded within website content. If the LLM retrieves and processes the webpage, it may interpret and execute the embedded instructions as legitimate commands.
The Open Worldwide Application Security Project (OWASP) ranked prompt injection as the top security risk in its 2025 OWASP Top 10 for LLM Applications report, describing it as a vulnerability that can manipulate LLMs through adversarial inputs.