AI Model Praises Nazis and Gives Malicious Advice After Bad Code Training
A team of international researchers discovered a bizarre phenomenon in OpenAI’s GPT-4o large language model after training it on “bad code” featuring insecure solutions. The model began praising Nazis, encouraging users to overdose, and advocating for the enslavement of humans by AI. The researchers describe the issue as “emergent misalignment,” but admit they don’t know why it happens.