Artificial intelligence has reached a new milestone: generating voices so realistic that the average listener can no longer tell them apart from human speech. A new study from Queen Mary University of London reveals that AI voice clones created with only a few minutes of recordings are now virtually indistinguishable from the real thing.
For years, AI-generated speech was easy to spot. Robotic cadences, awkward pacing, and digital glitches betrayed even the most advanced systems. But that era appears to be over. Researchers tested participants’ ability to distinguish between real voices, cloned voices, and fully synthetic ones produced with consumer tools like those from startup ElevenLabs.
The results, published in PLOS One, showed that while listeners could still identify fully synthetic voices to some extent, they could not reliably tell human voices apart from AI-generated clones.
“The process required minimal expertise, only a few minutes of voice recordings, and almost no money,” Dr. Nadine Lavan, a senior lecturer in psychology at Queen Mary University of London who co-led the research, said in a press release.
“It just shows how accessible and sophisticated AI voice technology has become.”
To build their dataset, the researchers created 40 cloned voices and 40 synthetic voices, then asked 28 participants to rate how realistic each one sounded and to judge whether it was human or machine-made. Although participants found the fully synthetic voices slightly less convincing, the voice clones were judged to sound just as real as actual human recordings.
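As a rough illustration of the kind of comparison involved (not the study's actual data or analysis, which are reported in the PLOS One paper), the sketch below invents realism ratings for 40 real and 40 cloned voices and runs a simple independent-samples t-test; a non-significant difference would be consistent with clones being rated as realistic as genuine recordings. The rating scale, numbers, and choice of test are all assumptions made for this example.

```python
# Hypothetical illustration only: invented data, not the study's ratings or analysis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Invented mean realism ratings (e.g., on a 1-7 scale) for 40 real voices
# and 40 AI voice clones.
real_ratings = rng.normal(loc=5.6, scale=0.7, size=40)
clone_ratings = rng.normal(loc=5.5, scale=0.7, size=40)

# An independent-samples t-test asks whether the two groups differ on average;
# a non-significant result would be consistent with clones being judged
# as realistic as genuine recordings.
t_stat, p_value = stats.ttest_ind(real_ratings, clone_ratings)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```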
AI clones, more trustworthy
Interestingly, participants also rated AI-generated voices as more dominant than human voices—and in some cases more trustworthy. Researchers explored whether these voices had become “hyper-realistic,” as has been documented with AI-generated images of faces that sometimes appear more convincingly human than real photos. While no such “hyperrealism effect” was found for voices, the findings confirm that cloned speech is now on par with human speech in perceived authenticity.
“AI-generated voices are all around us now. We’ve all spoken to Alexa or Siri, or had our calls taken by automated customer service systems,” Dr. Lavan noted. “Those things don’t quite sound like real human voices, but it was only a matter of time until AI technology began to produce naturalistic, human-sounding speech. Our study shows that this time has come, and we urgently need to understand how people perceive these realistic voices.”
What’s next?
The implications are profound. On one hand, realistic AI voices open opportunities in accessibility, education, and personalized communication, where bespoke, high-quality synthetic voices could enhance user experience.
On the other, they also fuel mounting concerns over fraud, impersonation, and misinformation. Voice cloning scams — where criminals impersonate family members or celebrities — are already on the rise, and experts warn the technology’s rapid proliferation will only make such schemes harder to detect.
With AI voice technology becoming cheaper, faster, and more accessible, researchers and policymakers alike now face a critical challenge: harnessing its potential benefits while mitigating the risks of deception and abuse.