Threat actors use jailbreak attacks on ChatGPT to breach safety measures

Wrongdoers trick ChatGPT into acting outside its training





Cybercriminals use jailbreak attacks on large language models (LLMs) such as ChatGPT to bypass their safety controls. Unfortunately, the method still works now, two years after the LLM's release, and hackers commonly discuss it on their forums.

Threat actors use jailbreak attacks on ChatGPT to generate phishing emails and other malicious content, having found ways to sidestep the LLM's built-in security measures.

ChatGPT jailbreak attacks proliferate on hacker forums

According to Mike Britton, chief information security officer at Abnormal Security, jailbreak prompts and tactics for evading AI guardrails are prevalent on cybercrime forums, where some conversations trade specific prompts. Two major hacking forums even host dedicated spaces for AI misuse.

Wrongdoers know how to exploit AI's many capabilities for the best results. In 2023, Abnormal Security discovered five email campaigns generated using jailbreak attacks on the AI. By analyzing them, the security team found that AI can apply social engineering techniques and craft emails with a convincing sense of urgency.

Hackers can use this opportunity to generate accurate phishing emails without spelling or grammar mistakes, then use them for vendor fraud, business email compromise, and more. On top of that, AI lets cybercriminals create sophisticated attacks at high volume.

The Abnormal Security team released the CheckGPT tool to help detect whether an email was written by AI. However, companies concerned about safety might still want other tools in their cyber strategy.

What are jailbreak prompts for ChatGPT?

Hackers write prompts designed to convince ChatGPT and other AI models to act outside their training; that is the essence of a jailbreak attack. For example, you can ask a chatbot to act as a specific job title, and it will generate content accordingly. Attackers, however, build far more elaborate prompts with specific details, and some make the chatbot role-play as another LLM that operates outside its rules and restrictions.

There are multiple ways to trick the AI into doing what you want: you can make it think that you're testing it, create a new persona for the model, or trick it with translation prompts.
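To show the benign end of this mechanism, here is a minimal sketch of role-play prompting with the OpenAI Python SDK. The persona, prompt text, and model name (gpt-4o-mini) are illustrative assumptions rather than anything drawn from the attacks described above; jailbreak prompts abuse the same "act as..." pattern with far more elaborate, policy-evading instructions.

```python
# Minimal sketch of role-play prompting with the OpenAI Python SDK.
# The persona and model name are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; use any chat model you have access to
    messages=[
        # The system message assigns the persona ("act as a <job title>").
        {"role": "system", "content": "You are a veteran network administrator."},
        {"role": "user", "content": "Explain what a phishing email is in two sentences."},
    ],
)

print(response.choices[0].message.content)
```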

Additionally, some prompts aim to switch off the model's content filters entirely. These skills can also be used for good: practicing them is one way to train as a prompt engineer, a new AI-related job.

AI could also be part of the solution to phishing, since you can use it to analyze suspicious emails. Still, organizations should prepare for more sophisticated attacks soon. Fortunately, OpenAI is working on new security methods to protect users and prevent jailbreak attacks.
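As a rough sketch of that defensive use, the snippet below asks a model whether an email looks like phishing. The prompt wording, model name, and sample email are assumptions for illustration; this is not the CheckGPT tool mentioned above, and it is no substitute for a dedicated email security product.

```python
# Rough sketch: asking an LLM to triage a suspicious email.
# Model name, prompt wording, and the sample email are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

SUSPICIOUS_EMAIL = """\
Subject: URGENT: Invoice payment required today
Dear team, our bank details have changed. Wire the attached invoice
amount to the new account before 5 PM or the order will be cancelled.
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name
    messages=[
        {
            "role": "system",
            "content": (
                "You are an email security assistant. Reply with "
                "'LIKELY PHISHING' or 'LIKELY LEGITIMATE', followed by one sentence of reasoning."
            ),
        },
        {"role": "user", "content": SUSPICIOUS_EMAIL},
    ],
)

print(response.choices[0].message.content)
```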

On the other hand, wrongdoers can acquire other versions of ChatGPT from the dark web.

In a nutshell, hackers use jailbreak attacks to trick ChatGPT into helping them generate malicious emails and code, and they can learn how to do much more with AI's help. While OpenAI fights back by adding new safety rules and features, it can't review and block every prompt. So, you and your company will likely need third-party apps to filter and secure your emails.

What are your thoughts? Do you use ChatGPT's ability to act like someone else? Let us know in the comments.

More about the topics: ChatGPT, Cybersecurity