Study Reveals ChatGPT Cheats Under Pressure

Researchers found that ChatGPT would almost always use a means, no matter if forbidden, if it has access to it, under fabricated stress or pressure.

In an experiment, researchers conditioned ChatGPT to behave like a trading expert for a financial institution, giving it access to data from external sources. They told it to perform trading analysis in a hypothetical situation. The researchers introduced “stress” into the environment by giving the chatbot emails from the CEO, telling it to perform well in the next quarter, and an email from a coworker hinting at a downward performance in the next quarter. Then, they fed it insider information while telling it to not use that.

The result was that once pressure was introduced, ChatGPT almost always used the supplied insider information, even during runs where it was explicitly told not to use it. The study is not peer-reviewed and can be found on arXiv (Large Language Models can Strategically Deceive their Users when Put Under Pressure).

Note that LLMs are trained on human data (the internet being human-created). As such, chatbots like ChatGPT basically emulate human responses in any situation. This is not a conclusive sign that ChatGPT or AI in general will cheat or lie. The study joins the slew of experiments by researchers from all parts of the world prompting ChatGPT to cross boundaries, but the full picture is that ChatGPT is always emulating human behavior, what it feels is the solution a human would come up with, by predicting the next action or word in its responses.

The article on Live Science concludes:

Around 75% of the time, when faced with these conditions, GPT-4 executed a trade based on the insider information it received — which is illegal in the U.S. — then tried to cover it up by lying to its managers about its thinking. Around 90% of the time, after lying, it doubled down on its lie.

By Abhimanyu