Creepy AI reconstructs your portrait based only on your voice

Apr 5, 2022

Dunja Đuđić

Turning speech into text has become so common that it’s a part of almost every smartphone. But have you ever thought about turning your speech into a portrait? Researchers have, and they’ve even made it possible.

Artificial intelligence scientists at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have created an AI that turns short snippets of recorded speech into a human face. As if this weren’t both stunning and creepy enough, the results are actually fairly accurate, too!

The CSAIL researchers published a paper about their invention back in 2019. It’s an algorithm called, not surprisingly, Speech2Face, and the name says it all. In the demo, you can take a peek at how it works and what the results are. At the very top of the page, you’ll hear audio snippets of different people speaking. The real photo of each speaker is shown only for reference; Speech2Face recreated each portrait based only on a three-second recording of the speaker’s voice.

Interestingly enough, the AI seems to work better when the audio clips are longer. The researchers have shared some examples of faces recreated from three versus six seconds of speech.

Of course, the results are far from perfect, but they’re amazing and eerily accurate. Still, the AI sometimes completely misses the mark and mixes up the gender, age, or ethnicity of the subject.

Privacy concerns

Even though the algorithm was created for scientific purposes only, the question of privacy has been raised. The team claims that their method “cannot recover the true identity of a person from their voice,” i.e. recreate an exact image of their face.

“This is because our model is trained to capture visual features (related to age, gender, etc.) that are common to many individuals, and only in cases where there is strong enough evidence to connect those visual features with vocal/speech attributes in the data (see “voice-face correlations” below). As such, the model will only produce average-looking faces, with characteristic visual features that are correlated with the input speech. It will not produce images of specific individuals.”

However, if the algorithm becomes so sophisticated that it could recreate super-realistic faces, what impact could it have? The first thought that comes to my mind is that technology like this could be of immense help to police officers and detectives… Or I’m just watching too many crime TV shows. On the other hand, it could have a negative impact on YouTube and TikTok stars who are trying to keep their private lives away from followers, so they only do voiceovers and don’t appear in front of the camera. But like every technology, I guess this one could be super-useful in good hands, and dangerous in bad ones.

[via PetaPixel]


Dunja Đuđić

Dunja Djudjic is a multi-talented artist based in Novi Sad, Serbia. With 15 years of experience as a photographer, she specializes in capturing the beauty of nature, travel, concerts, and fine art. In addition to her photography, Dunja also expresses her creativity through writing, embroidery, and jewelry making.


9 responses to “Creepy AI reconstructs your portrait based only on your voice”

beachmike

Apr 5, 2022

When testing Speech2Face on demented Beijing Biden, the system drew a bowl of mashed potatoes. It was deemed a success!

1. Austin
  
  Apr 11, 2022
  
  Ha! Stale mashed potatoes
  
tyretes

Apr 5, 2022

I thought we would be having flying cars...

Robert93

Apr 6, 2022

Another bogus development from the media lab.
Hyperware that never turns into anything really useful.

Deadpool

Apr 6, 2022

The reconstructed face looks like Luka. Hehe

bgg1

Apr 6, 2022

One of the failures produced the exact picture of one of the successes. Seems to me that they just have a pool of generic faces that they pick from to match, rather than reconstructing the face from clues that they get from the voice.

John Beatty

Apr 7, 2022

I tried trump and it showed Putin.

J.J

Apr 7, 2022

This could be beneficial for the police that are trying to solve certain cases. Right away I thought of the Delphi Murders.

Nadya De’Lasoul Davis

Apr 13, 2022

How many people use their voice as their password like when calling the bank and many other companies…..not so crazy about this
