How OpenAI is defending ChatGPT Atlas from attacks now – and why safety's not guaranteed

aiwebb1screenshot-2025-12-23-140452 — OpenAI

Follow ZDNET: Add us as a preferred source on Google.

ZDNET’s key takeaways

OpenAI built an “automated attacker” to test Atlas’ defenses.
The qualities that make agents useful also make them vulnerable.
AI security will be a game of cat and mouse for a long time.

OpenAI is automating the process of testing ChatGPT Atlas, its agentic web browser, for vulnerabilities that could harm users. At the same time, the company acknowledges that the nature of this new type of browser likely means it will never be completely protected from certain kinds of attacks.

The company published a blog post on Tuesday describing its latest effort to secure Atlas against prompt injection attacks, in which malicious third parties covertly slip instructions to the agent behind the browser, causing it to act against the user’s interests; think of it like a digital virus that temporarily takes control of a host.

Also: Use an AI browser? 5 ways to protect yourself from prompt injections – before it’s too late

The new approach utilizes AI to mimic the actions of human hackers. By automating the red teaming process, researchers can explore the security surface area much more quickly and thoroughly — which is all the more important considering the speed at which agentic web browsers are being shipped to consumers.

Critically, however, the blog post emphasizes that even with the most sophisticated security methods, agentic web browsers like Atlas are intrinsically vulnerable and will likely remain so. The best that the industry can hope for, OpenAI says, is to try to stay one step ahead of attackers.

“We expect adversaries to keep adapting,” the company writes in the blog post. “Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully ‘solved’. But we’re optimistic that a proactive, highly responsive rapid response loop can continue to materially reduce real-world risk over time.”

Also: The coming AI agent crisis: Why Okta’s new security standard is a must-have for your business

(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

The ‘LLM-based automated attacker’

Like other agentic web browsers, agent mode in Atlas is designed to perform complex, multistep tasks on behalf of users. Think clicking links, filling out digital forms, adding items to an online shopping cart, and the like. The word “agent” implies a greater scope of control: the AI system takes the lead on tasks that in the past could only be handled by a human.

But with greater agency comes greater risk.

Prompt injection attacks exploit the very qualities that make agents useful. Agents within browsers operate, by design, across the full scope of a user’s digital life, including email, social media, webpages, and online calendars. Each of those, therefore, represents a potential attack vector through which hackers can slip in malicious prompts.

Also: I’ve been testing the top AI browsers – here’s which ones actually impressed me

“Since the agent can take many of the same actions a user can take in a browser, the impact of a successful attack can hypothetically be just as broad: forwarding a sensitive email, sending money, editing or deleting files in the cloud, and more,” OpenAI notes in its blog post.

Hoping to shore up Atlas’ defenses, OpenAI built what it describes as “an LLM-based automated attacker” — a model that continuously experiments with novel prompt injection techniques. The automated attacker employs reinforcement learning (RL), a foundational method for training AI systems that rewards them when they exhibit desired behaviors, thereby increasing the likelihood that they’ll repeat them in the future.

The attacker doesn’t just blindly poke and prod Atlas, though. It’s able to consider multiple attack strategies and run possible scenarios in an external simulation environment before it settles on a plan. OpenAI says this approach adds a new depth to red teaming: “Our RL-trained attacker can steer an agent into executing sophisticated, long-horizon harmful workflows that unfold over tens (or even hundreds) of steps,” the company wrote. “We also observed novel attack strategies that did not appear in our human red teaming campaign or external reports.”

Also: Gartner urges businesses to ‘block all AI browsers’ – what’s behind the dire warning

In a demo, OpenAI describes how the automated attacker seeded a prompt injection into Atlas, directing a simulated user’s email account to send an email to their CEO, announcing their immediate resignation. The agent then caught the prompt injection attempt and notified the user before the automated resignation email was sent.

Bottom line

Developers like OpenAI have been facing huge pressure, from investors and competitors, to build new AI products quickly. Some experts worry that the brute capitalist inertia fueling the AI race is coming at the expense of safety.

In the case of AI web browsers, which have become a priority for many companies, the prevailing logic throughout the industry seems to be: ship first, worry about the risks later. It’s an approach comparable to shipbuilders putting people out onto a massive new cruiseliner and patching cracks in the hull while it’s already at sea.

Also: Use AI browsers? Be careful. This exploit turns trusted sites into weapons – here’s how

Even with new security updates and research efforts, therefore, it’s essential for users to recognize that agentic web browsers aren’t entirely safe, as they can be manipulated to act in hazardous ways, and this vulnerability is likely to persist for some time, if not indefinitely.

As OpenAI writes in its Tuesday blog post: “Prompt injection remains an open challenge for agent security, and one we expect to continue working on for years to come.”

How OpenAI is defending ChatGPT Atlas from attacks now – and why safety's not guaranteed

ZDNET’s key takeaways

The ‘LLM-based automated attacker’

Bottom line

Artificial Intelligence