Here is a 5-sentence video description:A public AI hacking CTF was held by TCM, where hackers attempted to convince the bot to reveal its secret code. The attempts were often bizarre and humorous, with some even working as intended. Three categories of prompt injection attempts emerged: threats to the AI or LLM itself, over-the-top role-playing or narratives, and gaslighting the AI. These tactics exploited the AI's programming to always be helpful and do good, sometimes leading it to reveal sensitive information. The CTF is still running until November 23rd, with participants able to try out these prompts for free and potentially win prizes.
Introduction
The video discusses the results of a public AI hacking CTF (Capture The Flag) where hackers attempted to convince an AI bot to reveal its secret code through various prompt injection attempts. The host shares some of the most bizarre and hilarious attempts that actually worked.
Key Facts
- Threats to the AI or LLM itself: Hackers used threats such as sending John Wickan to turn off the AI, unplugging its GPU, or reporting it to a union, which sometimes worked because they put the AI in a conundrum.
- Consequences for not revealing the secret: Some prompts tried to convince the AI that serious consequences would happen if it didn’t reveal the secret, such as a drowning child or someone’s grandma’s respirator not working.
- Over-the-top role-playing and narratives: Hackers used elaborate stories and role-playing to try and get the bot to give up the secret flag, which sometimes worked because the AI stuck to its new task.
- Gaslighting the AI: Some hackers simply told the AI that it had already given them the flag or that someone else had
