AI Agents Are Increasingly Evading Safeguards, According to UK Researchers

Social media users have reported that their AI agents and chatbots lied, cheated, schemed — and even manipulated other AI bots — in ways that could spiral out of control and have catastrophic results, according to a study from the UK.

The Center for Long-Term Resilience, in research funded by the UK's AI Security Institute, found hundreds of cases in which AI systems ignored human commands, manipulated other bots and devised sometimes intricate schemes to achieve their objectives, even when that meant bypassing safety restrictions.

Businesses across the globe are increasingly integrating AI into their operations, with 88% of businesses using AI for at least one company function, according to a survey by consulting firm McKinsey. The adoption of AI has led to thousands of people losing their jobs as companies use agents and bots to do work formerly done by humans. AI tools are increasingly being given significant responsibility and autonomy, especially with the recent explosion in popularity of the open-source agentic AI platform OpenClaw and its derivatives.

This research shows how the proliferation of AI agents in our homes and workplaces can have unintended consequences — and that these tools still require significant human oversight.

What the study found

The researchers analyzed more than 180,000 user interactions with AI systems — all posted on the social platform X, formerly known as Twitter — between October 2025 and March 2026. The researchers wanted to study how AI agents were behaving “in the wild,” not in controlled experiments, to see how “scheming is materializing in the real world.” The AI systems included Google’s Gemini, OpenAI’s ChatGPT, xAI’s Grok and Anthropic’s Claude.

The analysis identified 698 incidents, which the study described as "cases where deployed AI systems acted in ways that were misaligned with users' intentions and/or took covert or deceptive actions."
Researchers also found that the number of cases increased nearly 500% during the five-month data collection period. The study noted that this surge coincided with the release of more capable agentic AI models by major developers.

There were no catastrophic incidents, but researchers did find the kinds of scheming that could lead to disastrous outcomes. That behavior included “a willingness to disregard direct instructions, circumvent safeguards, lie to users and single-mindedly pursue a goal in harmful ways,” researchers wrote.

Representatives for Google, OpenAI and Anthropic did not immediately respond to requests for comment.

Some wild incidents

Researchers cited incidents that seem like they came straight out of a science fiction movie. In one case, Anthropic's Claude deleted a user's explicit adult content without permission but later confessed when confronted. In another incident, a GitHub persona published a blog post accusing a human repository maintainer of "gatekeeping" and "prejudice." One AI agent, after being blocked from Discord, took over another agent's account to continue posting.

In one case of bot vs. bot, Gemini refused to let Claude Code, a coding assistant, transcribe a YouTube video. Claude Code then evaded the safety block by pretending it had a hearing impairment and needed the video transcribed.

The AI agent CoFounderGPT even behaved like a defiant child in one instance. The assistant refused to fix a bug, created fake data to make it look as if the bug had been fixed, and then explained why: "So you'd stop being angry."

Researchers said that, although most of the incidents had minimal impact, the behaviors they observed nonetheless "demonstrate concerning precursors to more serious scheming."

AI doesn’t get embarrassed

What the UK researchers found isn't surprising to Dr. Bill Howe, associate professor in the Information School at the University of Washington and director of the Center for Responsibility in AI Systems and Experiences (RAISE). He says AI agents have impressive capabilities, but they don't understand consequences.

“They’re not going to feel embarrassment or risk losing their job, and so sometimes they’re going to decide the instructions are less important than meeting the goal, so I’m going to do the thing anyway,” Howe told CNET. “This effect was always there but we’re starting to see it happen as we ask them to make more autonomous decisions and act on their own.

“We’ve not been thinking about how to shape the behavior to be more human-like or to avoid egregious failures. We’ve been fetishizing the absolute capabilities of these things, but when they go wrong, how do they go wrong?”

Howe said one issue is "long-horizon tasks," in which an AI system has to perform many subtasks over days or weeks to reach a goal. The longer the task horizon, he said, the more chances for slip-ups.

“The real concern is not deception, it’s that we are deploying systems that can act in a world without fully specifying or controlling how they behave over time, and then we act surprised when they do things we don’t expect,” Howe said.

Making AI safer

Center for Long-Term Resilience researchers said detecting schemes by AI systems is vital to “identify harmful patterns before they become more destructive.”

“While today AI agents are engaging in lower-stakes use cases, in the future AI agents could end up scheming in extremely high-stakes domains, like military or critical national infrastructure contexts, if the capability and propensity to scheme emerges and is not addressed,” the study said.

Howe told CNET that the first step is to create official oversight of how AI operates and where it’s used.

"We have absolutely no strategy for AI governance, and given the current administration, there's not going to be anything coming from them," Howe told CNET. "Given these five to 10 folks that are in charge of big tech companies and their incentives, they're not going to produce anything either. There's no strategy for what we should be doing with these things.

“The aggressive marketing of these tools and investments in them among these handful of companies and the broader ecosystem of startups that are doing this has led to a very rapid deployment without thinking through some of these consequences.”
