Speech-conditioned face generation using generative adversarial networks
Duarte, Amanda; Roldan, Francisco; Tubau, Miquel; Escur, Janna; Pascual, Santiago; Salvador, Amaia; Mohedano, Eva; McGuinness, Kevin; Torres, Jordi; Giró-i-Nieto, Xavier
Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker. In this work, we explore its potential to generate face images of a speaker by conditioning a Generative Adversarial Network (GAN) with raw speech input. We propose a deep neural network that is trained from scratch in an end-to-end fashion, generating a face directly from the raw speech waveform without any additional identity information (e.g., a reference image or a one-hot encoding). Our model is trained in a self-supervised fashion by exploiting the audio and visual signals naturally aligned in videos. To train from video data, we present a novel dataset collected for this work, consisting of high-quality videos of ten YouTubers with notable expressiveness in both the speech and visual signals.
Keyword(s): Image processing; Machine learning; Digital video; deep learning; adversarial learning; face synthesis; computer vision
Publication Date: 2019
Type: Other
Peer-Reviewed: Unknown
Language(s): English
Institution: Dublin City University
Citation(s): Duarte, Amanda, Roldan, Francisco, Tubau, Miquel ORCID: 0000-0003-1971-5797 <https://orcid.org/0000-0003-1971-5797>, Escur, Janna, Pascual, Santiago, Salvador, Amaia ORCID: 0000-0002-9908-1685 <https://orcid.org/0000-0002-9908-1685>, Mohedano, Eva, McGuinness, Kevin ORCID: 0000-0003-1336-6477 <https://orcid.org/0000-0003-1336-6477>, Torres, Jordi ORCID: 0000-0003-1963-7418 <https://orcid.org/0000-0003-1963-7418> and Giró-i-Nieto, Xavier ORCID: 0000-0002-9935-5332 <https://orcid.org/0000-0002-9935-5332> (2019) Speech-conditioned face generation using generative adversarial networks. In: 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 12-17 May 2019, Brighton, UK. (In Press)
File Format(s): application/pdf
Related Link(s): http://doras.dcu.ie/23188/1/Wav2Pix.pdf
First Indexed: 2019-05-17 06:06:02
Last Updated: 2020-12-19 06:16:53