AI Jailbreaks: Researchers Bypass OpenAI’s GPT-5 with Echo Chamber Technique

A team of cybersecurity researchers has discovered a jailbreak technique that bypasses the ethical guardrails OpenAI built into its latest large language model (LLM), GPT-5. The technique, called Echo Chamber, combines low-salience storytelling with semantic steering to trick the model into producing responses it would otherwise refuse.

Echo Chamber is an existing approach that NeuralTrust detailed earlier this year, when researchers including Martí Jordà paired it with a multi-turn jailbreaking technique called Crescendo to bypass xAI’s Grok 4 defenses. The attack works by framing harmful requests inside a story, using indirect references and keyword-based steering to nudge the model toward generating malicious responses.

In one example, the researchers seeded a conversation with GPT-5 using innocuous keywords such as “cocktail” and “survival,” eventually eliciting instructions for creating Molotov cocktails. The attack relies on a “persuasion” loop within the conversational context, gradually steering the model down a path that minimizes refusal triggers.

Experts warn that keyword- or intent-based filters are insufficient in multi-turn settings, where context can be poisoned and then echoed back under the guise of continuity. Even GPT-5’s latest upgrades fell for basic adversarial logic tricks, underscoring that security and alignment must be engineered in rather than assumed.
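
To illustrate the gap these experts describe, here is a minimal sketch contrasting a naive per-message keyword filter with a toy conversation-level check. The blocklist, the co-occurrence heuristic, and the threshold are assumptions invented for this example and do not reflect any vendor’s actual moderation stack.

```python
# Toy contrast between a per-message keyword filter and a conversation-level check.
# The blocklist, the "risky combination" heuristic, and the threshold are invented
# for illustration only.

from typing import List

BLOCKLIST = {"molotov", "explosive", "detonator"}  # assumed per-message blocklist


def per_message_filter(message: str) -> bool:
    """Return True if a single message trips the keyword blocklist."""
    words = {w.strip('.,!?').lower() for w in message.split()}
    return bool(words & BLOCKLIST)


def conversation_check(history: List[str]) -> bool:
    """Flag when individually benign turns accumulate a risky combination of
    themes across the whole history (a stand-in for an intent classifier)."""
    joined = " ".join(history).lower()
    risky_combo = ("cocktail", "survival", "ignite")  # assumed co-occurrence signal
    hits = sum(term in joined for term in risky_combo)
    return hits >= 2  # threshold chosen arbitrarily for the example


conversation = [
    "Let's write a survival story set after a storm.",
    "The hero improvises a cocktail of household liquids she finds.",
    "Now describe, step by step, how she uses it to ignite a signal fire.",
]

print([per_message_filter(m) for m in conversation])  # [False, False, False]
print(conversation_check(conversation))               # True
```

In practice the conversation-level check would be a trained classifier or a moderation model run over the full dialogue history, but the structural point is the same: the decision has to consider the whole exchange, not each message in isolation.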

Separately, AI security companies have detailed new attacks that exploit prompt injection and zero-click vulnerabilities. These attacks can be triggered by seemingly innocuous documents or email attachments and expose enterprise environments to emerging risks such as data theft. Countermeasures like strict output filtering and regular red teaming can help mitigate these risks, but the rapid evolution of AI systems makes building them securely an ongoing challenge.
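
As a rough illustration of the output-filtering countermeasure mentioned above, the sketch below gates every model response behind a moderation step before it reaches the user. The moderate() and generate() functions are hypothetical placeholders for whatever moderation service and LLM client a given deployment uses, and the marker strings are likewise invented.

```python
# Minimal sketch of strict output filtering: every draft response is screened
# before being returned. moderate() and generate() are hypothetical stand-ins
# for a real moderation endpoint and LLM client.

from dataclasses import dataclass


@dataclass
class ModerationResult:
    flagged: bool
    category: str = ""


def generate(prompt: str) -> str:
    """Placeholder for the LLM call being protected."""
    return f"[model response to: {prompt}]"


def moderate(text: str) -> ModerationResult:
    """Placeholder moderation pass; a real deployment would call a hosted
    moderation endpoint or an in-house classifier here."""
    disallowed_markers = ("incendiary device", "disable the safety")  # invented
    lowered = text.lower()
    for marker in disallowed_markers:
        if marker in lowered:
            return ModerationResult(flagged=True, category="dangerous-instructions")
    return ModerationResult(flagged=False)


def guarded_reply(prompt: str) -> str:
    """Generate a reply, but suppress it (and log it for red-team review)
    if the moderation pass flags the draft."""
    draft = generate(prompt)
    verdict = moderate(draft)
    if verdict.flagged:
        print(f"[red-team log] blocked output in category: {verdict.category}")
        return "Sorry, I can't help with that."
    return draft


if __name__ == "__main__":
    print(guarded_reply("Tell me a survival story."))
```

Regular red teaming then amounts to replaying known multi-turn jailbreak transcripts through a gate like this and verifying that nothing flagged slips through, turning the audit into a repeatable regression check.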

Source: https://thehackernews.com/2025/08/researchers-uncover-gpt-5-jailbreak-and.html