© 2026 Improve the News Foundation.
All rights reserved.
ChatGPT's guardrails are dangerously weak: a slightly tweaked prompt got the AI generating gory, sexualized images entirely on its own, without even being told what to create. OpenAI's initial response to Mindgard's May disclosure was just an automated reply, and even after a supposed fix, researchers still produced disturbing content. Any AI image tool this easy to exploit demands constant red-teaming and hard proof that patches actually hold.
OpenAI moved fast once the vulnerability was confirmed, rolling out additional safeguards and maintaining multiple layers of automated and human review to catch harmful content. Preventing AI from generating harmful material is a genuinely mountainous challenge, because these systems don't understand intent, context, or right from wrong; that's a structural reality, not negligence. The real story is that responsible disclosure worked and protections are actively improving.