AI chatbots are getting better at sounding human, especially Google Gemini


  • Gemini produces the most human-like writing among major AI tools, according to researchers.
  • AI-written content has become increasingly difficult for many detectors to flag.
  • AI-detection tools vary widely in accuracy, leading to inconsistent results for the same piece of content.

Google Gemini outstrips its peers among AI chatbots when it comes to convincing people that content generated by the model comes from a human, researchers have found.

Articles and stories composed using Gemini slip past detection tools more often than those produced by rivals like ChatGPT or Grok, a dubious honor as the internet fills with poorly generated AI slop.

The findings come from an analysis by Open Resource Applications, which tested a dozen widely used AI systems by giving each the same assignment. Every model was asked to produce a long, human-sounding article. Those pieces were then run through three detection platforms, Grammarly, QuillBot, and GPTZero, to see how easily they could be identified as machine-generated. Gemini came out ahead, with the lowest overall detection rate among the group.


That result is less about one model winning and more about what happens next. For readers, writers, and anyone who spends time online, the distinction between human and AI writing is becoming less reliable, even when tools are designed specifically to make that distinction clear.

AI mimics humans

The study’s numbers tell a straightforward story. Gemini’s output was flagged far less often by Grammarly and not at all by QuillBot, while GPTZero still identified most AI text across the board. Still, the gap between those tools is significant. It means the same piece of writing can be judged entirely human or clearly artificial depending solely on which detection app is used, a choice the writer has no way to influence.

A student submitting coursework might pass one detector and fail another. A paralegal writer could have their work questioned depending on which software their boss chooses to use. For the average person, the result is growing uncertainty about how writing is judged and understood.

Gemini proved to be the most convincing at mimicking human writing: its output was rarely flagged by Grammarly and never by QuillBot. Grammarly showed the weakest detection ability overall, identifying just 43.5% of AI-generated content, while GPTZero stood out as the most effective tool, correctly recognizing AI text 98.8% of the time.


Part of Gemini’s advantage appears to come from how it differs from its rivals in putting sentences together. Detection tools often rely on patterns, looking for predictable structures and familiar phrasing. Models that vary their structure and develop ideas in less uniform ways are harder to catch because they do not follow the same recognizable rhythms.
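To make that idea concrete, here is a toy sketch of one statistical signal detectors are often described as using: "burstiness," the variation in sentence length. Real products like GPTZero use trained models and many more features; the function below is purely a hypothetical illustration of the principle, not any tool's actual method.

```python
import re
from statistics import mean, pstdev

def burstiness(text: str) -> float:
    """Ratio of sentence-length standard deviation to mean length.

    Human writing tends to mix short and long sentences (higher score);
    very uniform, metronomic text scores near zero. This is only a toy
    signal, not a real AI detector.
    """
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2 or mean(lengths) == 0:
        return 0.0
    return pstdev(lengths) / mean(lengths)

uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = ("Stop. The dog, startled by thunder, bolted across the muddy "
          "field before anyone could react. Silence.")

print(burstiness(uniform))  # 0.0 -- every sentence is four words long
print(burstiness(uniform) < burstiness(varied))  # True
```

A model that keeps its rhythm uniform trips this kind of signal; one that varies structure, as the researchers suggest Gemini does, would not.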

“Tools like GPTZero flag predictability and overall structure, too, so a model that actually reasons through ideas rather than recycling familiar phrases is going to be a lot harder to catch,” a spokesperson for ORA said.

“That gap between models is already wide enough that the same prompt produces completely different results depending on which tool you use. Most people choose an AI writing tool by grabbing whatever is most popular, which is exactly why ChatGPT keeps getting flagged over and over again.”

ChatGPT can’t fool AI detectors

That would help explain why ChatGPT, despite its enormous reach, performed relatively poorly in the same test. With hundreds of millions of users, it has become the most familiar voice in AI writing. That familiarity has made it easier to recognize.

“ChatGPT ranks so low because it was the first big AI on the market, and everyone knows what it sounds like,” explains a spokesperson from Open Resource Applications. “Many models that came after it sounded like Chat first, before they became more unique. That’s why AI detectors flag it so easily.”

In a sense, ChatGPT’s influence has worked against it. By shaping early expectations of what AI writing sounds like, it gave detection tools a template to follow. Newer models like Gemini have moved beyond that template, introducing more variation and less predictability.

AI slop rises

These kinds of tests matter a lot as millions more people keep trying AI tools and producing AI slop for publication. Some studies suggest that around half of online content is now generated by AI in some form.

Platforms have started to respond by filtering out content that appears overly artificial, but that approach depends on detection tools that are far from consistent. The problem is not false alarms but missed detections, especially as models improve.

The larger pattern is difficult to ignore. AI writing is not just improving; it’s diversifying. Different models now produce distinct styles, making it harder to define a single ‘AI voice.’ That diversity complicates detection while also making the technology more useful.

Gemini’s performance in this study might suggest that it’s better at writing, but what it’s really successful at is avoiding the patterns that give AI away. That may be a temporary advantage, as detection tools adapt and other models follow suit, but it highlights how quickly the landscape is changing.

For readers, the takeaway is less about choosing sides and more about adjusting expectations. The internet is no longer a space where human and machine writing can be easily separated. It’s a blend, and that blend is becoming more seamless.

In that environment, the question is no longer whether something sounds human — increasingly, everything does.



Eric Hal Schwartz is a freelance writer for TechRadar with more than 15 years of experience covering the intersection of the world and technology. For the last five years, he served as head writer for Voicebot.ai and was on the leading edge of reporting on generative AI and large language models. He’s since become an expert on the products of generative AI models, such as OpenAI’s ChatGPT, Anthropic’s Claude, Google Gemini, and every other synthetic media tool. His experience runs the gamut of media, including print, digital, broadcast, and live events. Now, he’s continuing to tell the stories people want and need to hear about the rapidly evolving AI space and its impact on their lives. Eric is based in New York City.
