FIELD: data processing.
SUBSTANCE: the invention relates to means of recognizing mixed speech. A first neural network is trained to recognize, from a sample of mixed speech, the speech signal uttered by the speaker with the higher level of a speech characteristic. A second neural network is trained to recognize, from the same mixed speech sample, the speech signal uttered by the speaker with the lower level of that characteristic. The mixed speech sample is decoded by the first and second neural networks by optimizing the joint probability of observing the two speech signals, where the joint probability is the probability that a particular frame is a switching point of the speech characteristic. A third neural network is trained to predict switching of the speech characteristic, and the mixed speech sample is decoded based on that prediction.
EFFECT: high accuracy in recognizing mixed speech.
15 cl, 5 tbl, 6 dwg
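The abstract does not specify how the switching prediction enters the decoder. As a minimal sketch, assume each frame has a log-score for each of two assignments (assignment 0: speaker 1 has the higher level of the speech characteristic; assignment 1: speaker 2 does), and that the third network outputs a per-frame probability that the frame is a switching point. A two-state Viterbi search then picks the most probable assignment sequence. The function name `decode_assignment`, the input layout, and the use of Viterbi are illustrative assumptions, not the patented method.

```python
import math
from typing import List, Sequence, Tuple


def decode_assignment(
    emission: Sequence[Tuple[float, float]],
    switch_prob: Sequence[float],
) -> List[int]:
    """Hypothetical two-state Viterbi over which speaker holds the 'high' level.

    emission[t][a]: assumed log-score of frame t under assignment a
        (a=0: speaker 1 is the higher-level speaker, a=1: speaker 2 is).
    switch_prob[t]: assumed probability that frame t is a switching point,
        e.g. predicted by a third network; switch_prob[0] is ignored.
    """
    if len(emission) != len(switch_prob):
        raise ValueError("emission and switch_prob must have equal length")
    n = len(emission)
    if n == 0:
        return []
    eps = 1e-12  # keeps log() finite for probabilities of exactly 0 or 1
    # Uniform prior over the two assignments at the first frame.
    delta = [math.log(0.5) + emission[0][a] for a in (0, 1)]
    back: List[List[int]] = []
    for t in range(1, n):
        p = min(max(switch_prob[t], eps), 1.0 - eps)
        stay, switch = math.log(1.0 - p), math.log(p)
        new_delta = [0.0, 0.0]
        ptr = [0, 0]
        for a in (0, 1):
            from_same = delta[a] + stay          # no switch at frame t
            from_other = delta[1 - a] + switch   # switch at frame t
            if from_same >= from_other:
                new_delta[a], ptr[a] = from_same + emission[t][a], a
            else:
                new_delta[a], ptr[a] = from_other + emission[t][a], 1 - a
        delta = new_delta
        back.append(ptr)
    # Backtrack from the best final assignment.
    state = 0 if delta[0] >= delta[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    scores = [(0.0, -5.0)] * 3 + [(-5.0, 0.0)] * 3
    print(decode_assignment(scores, [0.0, 0.01, 0.01, 0.9, 0.01, 0.01]))
```

Raising the predicted switching probability at a frame makes a change of assignment there cheaper; with near-zero switching probabilities, an isolated one-frame flip in the emission scores is smoothed away rather than treated as two switches.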
Title | Year | Author | Number |
---|---|---|---|
ADAPTIVE AUDIO ENHANCEMENT FOR MULTICHANNEL SPEECH RECOGNITION | 2016 | | RU2698153C1 |
DEVICE, METHOD, OR COMPUTER PROGRAM FOR GENERATING AN EXTENDED-BAND AUDIO SIGNAL USING A NEURAL NETWORK PROCESSOR | 2018 | | RU2745298C1 |
TRAINING OF DNN-STUDENT BY MEANS OF OUTPUT DISTRIBUTION | 2014 | | RU2666631C2 |
SYSTEM FOR VERIFYING THE SPEAKING PERSON IDENTITY | 1996 | | RU2161336C2 |
METHOD AND APPARATUS FOR DEFINING A DEEP FILTER | 2020 | | RU2788939C1 |
METHOD AND EQUIPMENT FOR RECOGNIZING EMOTIONS IN SPEECH | 2019 | | RU2720359C1 |
METHOD AND DEVICE FOR INCREASING SPEECH INTELLIGIBILITY USING SEVERAL SENSORS | 2004 | | RU2373584C2 |
METHODS AND SYSTEM FOR WAVEFORM-BASED ENCODING OF AUDIO SIGNALS USING GENERATOR MODEL | 2020 | | RU2823081C1 |
SYSTEM AND METHOD OF CONVERTING VOICE SIGNAL INTO TRANSCRIPT PRESENTATION WITH METADATA | 2014 | | RU2589851C2 |
METHOD FOR HYBRID GENERATIVE-DISCRIMINATIVE SEGMENTATION OF SPEAKERS IN AUDIO-FLOW | 2013 | | RU2530314C1 |
Dates
Published: 2019-04-29
Filed: 2015-03-19