FIELD: information technology.
SUBSTANCE: during the training phase, the speech signals of the target and initial speakers are analysed frame by frame and separated into harmonics of the fundamental tone, a noise component and a transitional component. The voicing of each frame of the initial speaker's speech signal is determined. If the frame is voiced, its fundamental tone frequency is determined. If no fundamental tone is detected, the frame is treated as transitional; a frame that is neither voiced nor transitional is treated as a silent interval of the speech signal. The transitional frame is generated by a linear predictor with codebook excitation. During the conversion phase, if a frame of the initial speaker's speech signal is voiced, its fundamental tone frequency and the time contour of its variation are determined, and the frame is decomposed by the discrete Fourier transform into harmonics of the fundamental tone and a noise component. The noise component is the residual obtained by subtracting, from the initial speaker's frame, the frame resynthesised from the fundamental-tone harmonics. The components are converted to the parameters of the target speaker, taking the conversion of the initial speaker's fundamental tone frequency into account. The harmonic component and the noise component of the target speaker are synthesised and then summed with the synthesised transitional component and the silent intervals of the speech signal.
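The abstract does not say how voicing or the fundamental tone is detected. The sketch below assumes a simple energy gate for silence and a normalised autocorrelation pitch detector for voicing; the function names, the 60 to 400 Hz search range and both thresholds are hypothetical choices for illustration only. A frame with enough energy but no clear pitch peak is labelled transitional, mirroring the three-way classification described above.

```python
def frame_energy(frame):
    """Mean power of one analysis frame."""
    return sum(x * x for x in frame) / len(frame)


def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0, threshold=0.5):
    """Hypothetical autocorrelation pitch detector.

    Returns the fundamental frequency in Hz, or None when the normalised
    autocorrelation has no peak above `threshold` in the search range.
    """
    n = len(frame)
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return None
    lag_min = int(fs / f0_max)
    lag_max = min(int(fs / f0_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag)) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < threshold:
        return None
    return fs / best_lag


def classify_frame(frame, fs, silence_energy=1e-4):
    """Label a frame 'silent', 'transitional' or 'voiced' (with its F0)."""
    if frame_energy(frame) < silence_energy:
        return "silent", None
    f0 = estimate_f0(frame, fs)
    if f0 is None:
        # Energetic frame without a detectable fundamental tone.
        return "transitional", None
    return "voiced", f0
```

For example, a 50 ms frame of a 200 Hz sine sampled at 8 kHz is classified as voiced with an F0 of 200 Hz, an all-zero frame as silent, and white noise as transitional.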
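The harmonic/noise split of a voiced frame can be illustrated as follows: each harmonic k·F0 below the Nyquist frequency is measured with a single DFT coefficient evaluated at that frequency, the frame is resynthesised as a sum of those sinusoids, and the noise component is the residual. This is a minimal sketch, assuming the frame spans an integer number of pitch periods so that harmonic leakage is negligible; a practical system would likely add windowing or a least-squares harmonic fit. The function name `harmonic_noise_split` is hypothetical.

```python
import cmath
import math


def harmonic_noise_split(frame, fs, f0):
    """Split a voiced frame into F0 harmonics and a residual noise component.

    Returns (harmonics, resynthesised_frame, noise), where harmonics is a
    list of (frequency_hz, amplitude, phase_rad) tuples.
    """
    n_samples = len(frame)
    harmonics = []
    k = 1
    while k * f0 < fs / 2:
        f = k * f0
        # One DFT coefficient evaluated exactly at the k-th harmonic.
        c = sum(x * cmath.exp(-2j * math.pi * f * n / fs)
                for n, x in enumerate(frame))
        c *= 2.0 / n_samples  # scale so |c| is the sinusoid amplitude
        harmonics.append((f, abs(c), cmath.phase(c)))
        k += 1
    # Re-synthesise the frame from the harmonics of the fundamental tone.
    resynth = [
        sum(a * math.cos(2 * math.pi * f * n / fs + p) for f, a, p in harmonics)
        for n in range(n_samples)
    ]
    # Noise component = original frame minus the harmonic resynthesis.
    noise = [x - y for x, y in zip(frame, resynth)]
    return harmonics, resynth, noise
```

For a purely harmonic test frame (200 Hz plus a 400 Hz partial, 400 samples at 8 kHz) the recovered amplitudes and phases match the inputs and the residual noise is zero to within rounding error.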
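The abstract states only that the conversion of the initial speaker's fundamental tone frequency is taken into account, not how it is converted. A commonly used substitute technique is a linear mapping of log-F0 that matches the mean and standard deviation collected for each speaker during training; the sketch below shows that technique under this assumption, and the names `log_f0_stats` and `convert_f0` are hypothetical.

```python
import math
from statistics import mean, pstdev


def log_f0_stats(f0_values):
    """Mean and population standard deviation of log-F0 over training frames."""
    logs = [math.log(f) for f in f0_values]
    return mean(logs), pstdev(logs)


def convert_f0(f0_source, source_stats, target_stats):
    """Map a source-speaker F0 (Hz) to the target speaker's log-F0 range."""
    mu_s, sigma_s = source_stats
    mu_t, sigma_t = target_stats
    if sigma_s == 0.0:
        # Degenerate training data: fall back to the target mean.
        return math.exp(mu_t)
    z = (math.log(f0_source) - mu_s) / sigma_s
    return math.exp(mu_t + sigma_t * z)
```

With source training values of 100 and 200 Hz and target values of 200 and 400 Hz, a source F0 of 100 Hz maps to 200 Hz. The converted F0 would then set the harmonic frequencies k·F0 used when the target speaker's harmonic component is synthesised.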
EFFECT: the converted speech signal closely matches the voice of the target speaker.
2 cl, 8 dwg
Dates
2011-08-20: Published
2010-05-14: Filed