FIELD: image processing.
SUBSTANCE: the method comprises several stages. Text data is obtained and divided into semantic units; the division follows the pauses in pronunciation and the type of division. The resulting semantic units are converted into audio data. Using a trained artificial neural network (hereinafter – ANN), the converted audio data is divided into fragments. The resulting fragments are matched against the key shots of the video stream. Shots are generated by key-shot interpolation and converted into a sequence of sketches. The resulting sequence of sketches and the corresponding audio-data fragments are processed by an adversarial ANN, and a sequence of photorealistic images is formed. The photorealistic images are combined into a video stream, and the corresponding audio data into an audio stream. Synchronization of the resulting video stream and audio stream is then verified.
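The staged pipeline above can be sketched in code. This is an illustrative stand-in only: every function name, the punctuation-based pause splitting, the fixed-length audio framing, and the scalar "shots" are assumptions made for the sketch, not the claimed ANN-based method.

```python
# Illustrative sketch of the claimed pipeline; all names and the
# stand-in logic (regex splitting, fixed framing, linear interpolation)
# are assumptions, not the patented implementation.
import re

def split_into_semantic_units(text):
    # Stage 1: divide text into semantic units at pronunciation pauses,
    # approximated here by clause- and sentence-ending punctuation.
    units = re.split(r'(?<=[.,;:!?])\s+', text.strip())
    return [u for u in units if u]

def text_to_audio(unit):
    # Stage 2: convert a semantic unit to audio data (stand-in:
    # one pseudo-sample per character instead of a real TTS engine).
    return [float(ord(c)) for c in unit]

def segment_audio(audio, frame_len=4):
    # Stage 3: a trained ANN would split the audio into fragments;
    # stand-in: fixed-length framing.
    return [audio[i:i + frame_len] for i in range(0, len(audio), frame_len)]

def interpolate_key_shots(key_shots, n_frames):
    # Stages 4-5: generate shots by interpolating between consecutive
    # key shots (each shot is a scalar stand-in for an image/sketch).
    shots = []
    for a, b in zip(key_shots, key_shots[1:]):
        shots += [a + (b - a) * t / n_frames for t in range(n_frames)]
    shots.append(key_shots[-1])
    return shots

def check_sync(video_frames, audio_fragments, fps, fragment_dur):
    # Final stage: verify the video and audio streams cover the same
    # duration within a small tolerance.
    return abs(len(video_frames) / fps - len(audio_fragments) * fragment_dur) < 0.5

units = split_into_semantic_units("Hello, world. This is a test.")
audio = [s for u in units for s in text_to_audio(u)]
fragments = segment_audio(audio)
frames = interpolate_key_shots([0.0, 1.0, 0.5], n_frames=10)
print(len(units), len(fragments), len(frames))  # 3 units, 7 fragments, 21 frames
```

In the claimed method, the framing and the sketch-to-photorealistic step are performed by trained networks rather than the deterministic stand-ins shown here; the sketch only fixes the data flow between stages.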
EFFECT: increased accuracy of avatar generation based on text data.
17 cl, 2 dwg
Dates
Filed: 2020-10-30
Published: 2021-05-31