AI has become adept at recreating, modifying and restoring human speech. But as the replicas become indistinguishable from the real thing, the fear of the technology increases.
Alex Serdiuk has a unique insight into both the opportunities and the threats.
As co-founder and CEO of AI startup Respeecher, Serdiuk won an Emmy for making a deepfake Richard Nixon, developed voice clones for people with speech disorders, and de-aged Mark Hamill’s vocal cords for The Mandalorian.
Yet Serdiuk has also seen synthetic media at its worst. The CEO and his company are based in Ukraine, which has been the target of deepfake disinformation.
In March, a manipulated video of Ukrainian President Volodymyr Zelensky circulated on social media. The clip showed a digitally rendered Zelensky telling soldiers to surrender to Russia.
However, the impact was minimal.
“This deepfake was done so poorly – like many things Russians do – that it was not convincing,” Serdiuk tells TNW.
“And our nation is smart. We have a belief in what’s going on in our government, and if someone says our president gave up, most people would check those facts — especially because the deepfake was so bad.”
Zelensky’s deepfake is so bad it won’t convince Ukrainians to “lay down their arms.” But enough of this could cast doubt on the authenticity of real videos in the future https://t.co/DZ7IuYsPoT pic.twitter.com/MGNxvyw8GE
— Alec Luhn (@ASLuhn) March 17, 2022
Nevertheless, the clip demonstrated the potential of synthetic media to make us question what we see – and hear.
Indeed, fake sounds can be more convincing than fake images.
Serdiuk believes synthetic voices can cross the uncanny valley more smoothly than artificial images.
He adds that this realism can benefit society. For example, Respeecher has developed voice replacement technology for people who have had a laryngectomy.
During trials, the system created a natural-sounding voice while preserving the user’s articulation.
Sonantic, an AI startup, produced another powerful example.
In 2021, the company recreated Val Kilmer’s voice after treatment for throat cancer left the actor unable to speak clearly.
Sonantic CEO Zeena Qureshi said the project showed the altruistic potential of the approach.
“I spent nine years helping children with autism learn to use their voice as a better means of communication,” she said in a statement.
“The project with Val once again demonstrated how powerful it can be when people overcome challenges through speaking.”
However, other uses of speech synthesis have raised concerns.
In 2021, a documentary about Anthony Bourdain sparked a heated debate about deepfakes.
In an interview, director Morgan Neville revealed that AI had recreated the late chef’s voice in the film. The synthetic dialogue consisted of words Bourdain had written but never said.
Critics found the move ethically ambiguous and noted that it lacked the consent of Bourdain, who was famously obsessed with authenticity.
Neville later said he had received approval from Bourdain’s next of kin. But Ottavia Bourdain, the chef’s widow, disputed this claim.
“I was definitely NOT the one who said Tony would have been cool with that,” she said in a tweet.
I was definitely NOT the one who said Tony would have been cool with that. https://t.co/CypDvc1sBP
— Ottavia (@OttaviaBourdain) July 16, 2021
Serdiuk says Respeecher would not allow such work.
The company’s ethics statement prohibits misleading uses of synthetic speech. The company has also pledged never to use the voice of any private individual or actor without permission.
In “a handful” of cases, however, the voices of historical figures have been used to show the technology’s potential.
Respeecher is also developing two technical defenses: a synthetic speech detector and audio watermarks.
Ultimately, speech cloning remains another tool that we can use for both good and ill. Serdiuk hopes the safeguards will prevent the drawbacks from overshadowing the benefits.