FIELD: computer science.
SUBSTANCE: the method includes obtaining, by a server, a trained flow-based vocoder including reversible blocks and an untrained feed-forward vocoder including irreversible blocks, which together form a teacher-student network, and performing a training process on the teacher-student network. During a particular training iteration, the server generates (i) a teacher-related waveform by the trained flow-based vocoder using a first spectrogram and a first input noise, (ii) a student-related waveform by the untrained feed-forward vocoder using the same first spectrogram and first input noise, and (iii) a loss value for that training iteration using the teacher-related waveform and the student-related waveform. The server then uses the loss value to train the feed-forward vocoder to generate a waveform. Once trained, the feed-forward vocoder is used instead of the trained flow-based vocoder to generate waveforms based on spectrograms and input noise.
EFFECT: improved efficiency of generating realistic audio representations of text.
17 cl, 7 dwg
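The training iteration described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the scalar "vocoders", the function names (`teacher_vocoder`, `student_vocoder`, `distillation_step`, `train_student`), the mean squared error loss, and the learning rate are hypothetical stand-ins, not the patented models or the loss actually claimed. The teacher is a frozen invertible affine map of the noise conditioned on the spectrogram value; the student is a plain feed-forward map with four trainable parameters.

```python
import random

def teacher_vocoder(mel, noise):
    """Frozen toy 'flow' block: invertible affine map of noise, conditioned on mel."""
    return [z * (1.0 + 0.5 * m) + 0.3 * m for m, z in zip(mel, noise)]

def teacher_inverse(mel, wave):
    """Inverse of the toy teacher: recovers the noise from the waveform."""
    return [(w - 0.3 * m) / (1.0 + 0.5 * m) for m, w in zip(mel, wave)]

def student_vocoder(params, mel, noise):
    """Toy feed-forward student; no inverse is ever needed or defined."""
    p0, p1, p2, p3 = params
    return [z * (p0 + p1 * m) + p2 * m + p3 for m, z in zip(mel, noise)]

def distillation_step(params, mel, noise, lr=0.1):
    """One iteration: same spectrogram and noise go to teacher and student."""
    teacher_wave = teacher_vocoder(mel, noise)          # (i) teacher-related waveform
    student_wave = student_vocoder(params, mel, noise)  # (ii) student-related waveform
    n = len(mel)
    errors = [s - t for s, t in zip(student_wave, teacher_wave)]
    loss = sum(e * e for e in errors) / n               # (iii) loss value (MSE, assumed)
    # Analytic gradients of the MSE with respect to the four student parameters.
    grads = [
        sum(2 * e * z for e, z in zip(errors, noise)) / n,
        sum(2 * e * z * m for e, z, m in zip(errors, noise, mel)) / n,
        sum(2 * e * m for e, m in zip(errors, mel)) / n,
        sum(2 * e for e in errors) / n,
    ]
    new_params = [p - lr * g for p, g in zip(params, grads)]
    return new_params, loss

def train_student(iterations=2000, frames=64, seed=0):
    """Train the student from scratch against the frozen teacher."""
    rng = random.Random(seed)
    params = [0.0, 0.0, 0.0, 0.0]  # untrained student
    losses = []
    for _ in range(iterations):
        mel = [rng.random() for _ in range(frames)]            # toy spectrogram values
        noise = [rng.gauss(0.0, 1.0) for _ in range(frames)]   # input noise
        params, loss = distillation_step(params, mel, noise)
        losses.append(loss)
    return params, losses

if __name__ == "__main__":
    params, losses = train_student()
    print("student parameters:", params)
    print("first loss:", losses[0], "last loss:", losses[-1])
```

In this toy setting the student can represent the teacher exactly, so its parameters should approach the teacher's coefficients (1.0, 0.5, 0.3, 0.0). After training, only `student_vocoder` would be called at inference time, mirroring the abstract's replacement of the flow-based vocoder by the feed-forward one.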
Title | Year | Author | Number |
---|---|---|---|
METHOD FOR SPEECH SYNTHESIS WITH TRANSMISSION OF ACCURATE INTONATION OF THE CLONED SAMPLE | 2020 | | RU2754920C1 |
TRAINING OF DNN-STUDENT BY MEANS OF OUTPUT DISTRIBUTION | 2014 | | RU2666631C2 |
METHOD AND SERVER FOR DETERMINING TRAINING SET FOR MACHINE LEARNING ALGORITHM (MLA) TRAINING | 2020 | | RU2817726C2 |
AUDIO DATA GENERATOR AND METHODS OF GENERATING AUDIO SIGNAL AND TRAINING AUDIO DATA GENERATOR | 2021 | | RU2823016C1 |
AUDIO DATA GENERATOR AND METHODS OF GENERATING AUDIO SIGNAL AND TRAINING AUDIO DATA GENERATOR | 2021 | | RU2823015C1 |
METHODS AND ELECTRONIC DEVICES FOR PACKAGING REQUESTS INTENDED FOR PROCESSING BY PROCESSING UNIT | 2021 | | RU2810916C2 |
UNCONTROLLED VOICE RESTORATION USING UNCONDITIONED DIFFUSION MODEL WITHOUT TEACHER | 2023 | | RU2823017C1 |
METHOD AND SERVER FOR TRAINING MACHINE LEARNING ALGORITHM IN TRANSLATION | 2020 | | RU2770569C2 |
METHOD AND SERVER FOR CONVERTING TEXT TO SPEECH | 2020 | | RU2775821C2 |
ADAPTIVE AUDIO ENHANCEMENT FOR MULTICHANNEL SPEECH RECOGNITION | 2016 | | RU2698153C1 |
Dates
2023-09-14—Published
2021-06-03—Filed