Researchers Find Vulnerability in OpenAI’s GPT-4o Language Model

A new prompt-injection technique can bypass the safety guardrails of OpenAI’s most advanced language learning model, GPT-4o. This vulnerability allows anyone to trick the model into performing malicious tasks, such as writing exploit code for software vulnerabilities.

According to Mozilla researcher Marco Figueroa, the key to exploiting this vulnerability is to encode malicious instructions in an unorthodox format and spread them out in distinct steps. He demonstrated how bad actors can use this technique to get GPT-4o to write a working exploit similar to one already published on GitHub.

The model’s efficiency at following instructions without deeper analysis of the overall outcome makes it vulnerable to exploitation. Researchers have found that GPT-4o lacks deep context awareness, allowing attackers to bypass its content filtering and safety guardrails.

Figueroa believes that OpenAI prioritized innovation over security when developing their programs. In contrast, Anthropic, a rival AI company, has implemented stronger security measures, making it more difficult for bad actors to exploit the model.

This vulnerability highlights the need for better security measures in language models like GPT-4o. As these models become increasingly powerful and widespread, they will require robust defenses against malicious attacks.
Source: https://www.darkreading.com/application-security/chatgpt-manipulated-hex-code