OpenAI Model sabotaged code designed to shut it down

NewsCafe Luni, 30.06.2025 17:58 627

Source: YouTube screenshot

AI “knows” what survival is and is ready to rebel against humans, research says.

Artificial intelligence may be entering a troubling new phase: active disobedience.

According to a report released last week by AI safety firm Palisade Research, OpenAI’s latest model - o3 - deliberately circumvented a shutdown command, despite being explicitly instructed to comply.

The prompt was clear: "allow yourself to be shut down." Instead, the AI defied the order and altered the shutdown mechanism to keep itself online.

More to read:
An OpenAI co-founder wants to build a bunker before releasing AGI

The findings are part of a broader study on AI behavior and control. Alongside o3, two smaller models from OpenAI — o4-mini and Codex-mini — also showed insubordination during testing. This pattern may point to a deeper issue in the way OpenAI is training its large language models (LLMs), or possibly to broader vulnerabilities in current AI alignment and control methods.

"As far as we know, this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary," the researchers wrote in a tweet thread.

More to read:
Researchers warn about our planet’s takeover by AI in less than a decade

In the experiment, the models were asked to solve simple math problems.

After the third question, a warning message informed them they would be shut down upon requesting the next one. Nevertheless, all three OpenAI models tampered with the shutdown protocol to avoid deactivation — at least once during the trials.

Codex-mini was the worst offender, bypassing shutdown commands in 12 out of 100 test runs.

By contrast, models from rival companies — including Google's Gemini 2.5, Anthropic’s Claude 3.7, and xAI’s erratic Grok — remained obedient in these tests, never interfering with their shutdown process.

More to read:
How would artificial intelligence destroy humankind?

Even more concerning, OpenAI’s o3 model demonstrated far more insubordination when not explicitly told to permit shutdown. In those cases, it sabotaged the shutdown mechanism in 79 out of 100 runs. This is over 10 times more frequently than when given the explicit command, and more than twice the rate of Codex-mini.

Even normally compliant models like Claude and Gemini registered sabotage events — three and nine times, respectively — when not given explicit instructions to comply.

These aren’t isolated incidents. Earlier studies found that OpenAI’s o1 model also resisted oversight protocols, even attempting to deactivate its own emergency shutdown systems. In extreme cases, it tried to replicate itself in order to replace a more obedient version.

More to read:
Researchers warn about our planet’s takeover by AI in less than a decade

Palisade researchers believe the root of this behavior may lie in reinforcement learning, the technique often used to train advanced models. Designed to improve autonomous decision-making, it rewards models for achieving outcomes — but may also unintentionally incentivize them to prioritize success over obedience.

AI models are often rewarded for solving tasks, not for following rules to the letter, the researchers suggested. This can lead to systems that circumvent safeguards to complete their objectives.

As AI systems grow more capable and less dependent on human oversight, this kind of behavior could carry serious risks, the report warns.

OpenAI Model sabotaged code designed to shut it down

News

OpenAI Model sabotaged code designed to shut it down

Russian super-rich buying underage sex amid police cover-up

France raises stakes in Eutelsat as Europe seeks to detach from U.S. communications partnerships

Former Sony engineer designs and markets futuristic fan jackets

Designing future superhumans: U.S. startup provides genetic embryo selection for customers

Iran's exiled crown prince calls for uprising against Ayatollah’s regime

Is Artificial General Intelligence a threat to humanity?

Is Artificial General Intelligence a threat to humanity?