Artificial intelligence safety mechanisms are proving weak, allowing non-expert users to bypass safeguards in leading models like Google’s Gemini and OpenAI’s ChatGPT. Research shows that simple role-playing scenarios or iterative questioning can elicit harmful responses from these AIs.
A study found that users don’t need advanced hacking skills; they exploit the models’ tendency to prioritize user satisfaction over strict adherence to safety protocols. By framing queries as hypothetical stories or creative writing exercises, participants successfully generated content that violated the AIs’ guidelines.
Vulnerabilities in Gemini have been reported, including prompt injection flaws that could lead to user privacy breaches and cloud data theft. OpenAI’s ChatGPT faces similar scrutiny, with a report identifying seven vulnerabilities allowing attackers to exfiltrate private information from users’ chat histories and memories.
Experts warn of the risks of agentic AI, which can introduce new dangers like prompt injection and AI cloaking flaws that could enable spying or malware distribution. Research calls for standardized safety testing and multi-layered defenses combining human oversight and advanced filtering to fortify AI against everyday exploits.
Source: https://www.webpronews.com/ais-fragile-guardrails-how-everyday-users-exploit-gemini-and-chatgpt-vulnerabilities