A Neural Virtual Anchor Synthesizer based on Seq2Seq and GAN Models

doi:10.48550/arXiv.1908.07262

This paper presents a novel framework for generating realistic face video of an anchor reading a given news script, a task also known as Virtual Anchor. Given some paragraphs of text, we first use a pretrained Word2Vec model to embed each word into a vector. We then use a Seq2Seq-based model to translate these word embeddings into action units and head poses of the target anchor. The action units and head poses are concatenated with facial landmarks and with the previous $n$ synthesized frames, and this concatenation serves as the input to a Pix2PixHD-based model that synthesizes realistic facial images of the virtual anchor. The experimental results demonstrate that the framework is feasible for virtual anchor synthesis.
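The data flow described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: every function here (`embed_word`, `seq2seq_stub`, `build_generator_input`, `generator_stub`, `synthesize`) is a hypothetical stand-in, all dimensions are made-up assumptions, the landmarks are simply taken as a given input, and the sketch emits one frame per word for simplicity. The trained Word2Vec, Seq2Seq, and Pix2PixHD models are replaced by trivial arithmetic so that only the concatenation and the feedback of the previous $n$ frames are shown.

```python
from collections import deque
from typing import Deque, List, Sequence

# All sizes are illustrative assumptions, not values from the paper.
EMBED_DIM = 4      # word-embedding size
NUM_AUS = 3        # action units per frame
POSE_DIM = 3       # head pose, e.g. yaw, pitch, roll
LANDMARK_DIM = 6   # flattened facial landmark coordinates
FRAME_DIM = 8      # flattened stand-in for one image
N_PREV = 2         # the "n" previous synthesized frames fed back

Vector = List[float]


def embed_word(word: str) -> Vector:
    """Hypothetical stand-in for a pretrained Word2Vec lookup."""
    seed = sum(map(ord, word))
    return [((seed + i) % 10) / 10.0 for i in range(EMBED_DIM)]


def seq2seq_stub(embeddings: Sequence[Vector]) -> List[Vector]:
    """Hypothetical stand-in for the Seq2Seq model.

    Returns one action-unit + head-pose vector per input embedding.
    """
    controls = []
    for emb in embeddings:
        mean = sum(emb) / len(emb)
        controls.append([mean] * (NUM_AUS + POSE_DIM))
    return controls


def build_generator_input(
    au_pose: Vector, landmarks: Vector, prev_frames: Deque[Vector]
) -> Vector:
    """Concatenate AUs/pose, landmarks, and the previous n frames."""
    cond = list(au_pose) + list(landmarks)
    for frame in prev_frames:
        cond.extend(frame)
    return cond


def generator_stub(cond: Vector) -> Vector:
    """Hypothetical stand-in for the Pix2PixHD-based generator."""
    mean = sum(cond) / len(cond)
    return [mean] * FRAME_DIM


def synthesize(words: Sequence[str], landmarks: Vector) -> List[Vector]:
    """Run the toy pipeline: words -> embeddings -> AUs/pose -> frames."""
    embeddings = [embed_word(w) for w in words]
    controls = seq2seq_stub(embeddings)

    # Start the feedback buffer with blank frames (an assumption).
    prev: Deque[Vector] = deque(
        [[0.0] * FRAME_DIM for _ in range(N_PREV)], maxlen=N_PREV
    )
    frames = []
    for au_pose in controls:
        cond = build_generator_input(au_pose, landmarks, prev)
        frame = generator_stub(cond)
        frames.append(frame)
        prev.append(frame)  # oldest frame is dropped automatically
    return frames
```

Calling `synthesize(["breaking", "news", "today"], [0.5] * LANDMARK_DIM)` returns three toy frames. The fixed-length `deque` is one simple way to keep only the most recent $n$ frames; how the actual model initializes or stores that history is not stated in the abstract.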

Publication: arXiv e-prints

Pub Date: August 2019

DOI: 10.48550/arXiv.1908.07262

arXiv: arXiv:1908.07262

Bibcode: 2019arXiv190807262W

Keywords: Computer Science - Computer Vision and Pattern Recognition

E-Print: Accepted to ISMAR 2019
