FIELD: information technology.
SUBSTANCE: speech segments are extracted from the input signal, and acoustic MFCC feature vectors are calculated. Each speech segment is projected into an eigenvoice space EV of dimension 10, yielding a set of vectors Y. Clustering centres C1 and C2 of the Y vectors are determined. Discriminative clustering is then performed by calculating the parameters of planes H1 and H2 and approximately determining the regions where Y vectors that are homogeneous with respect to speaker information are concentrated. The resulting data on the speech segments are used to initialise VB diarisation based on variational Bayesian analysis. This yields speaker labels for the segments over the entire utterance, which are used to correct the clustering centres C1 and C2. Discriminative clustering, variational Bayesian analysis and clustering-centre correction are performed in turn over several iterative EV-VB stages. At each iteration the complete speaker segmentation is analysed, and iteration stops once the segmentation no longer changes. The final segmentation, a table mapping the speech segments of the input signal to speaker indices, is then obtained by Viterbi resegmentation.
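The iterative EV-VB loop can be sketched in plain Python as a toy. It rests on several assumptions and is not the patented method. The Y vectors are taken as given; real ones would be 10-dimensional eigenvoice projections. The initial centres are the two most distant vectors. H1 and H2 are assumed to be two planes parallel to the perpendicular bisector of C1 and C2, offset by a hypothetical `margin`. The VB diarisation step is replaced by a plain nearest-centre assignment seeded from the confidently placed vectors. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def sub(a: Vec, b: Vec) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def mean(vs: Sequence[Vec]) -> List[float]:
    return [sum(col) / len(vs) for col in zip(*vs)]


def dist2(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_centres(ys: Sequence[Vec]) -> Tuple[List[float], List[float]]:
    # Hypothetical initialisation of C1, C2: the two most distant Y vectors.
    pairs = ((i, j) for i in range(len(ys)) for j in range(i + 1, len(ys)))
    i, j = max(pairs, key=lambda p: dist2(ys[p[0]], ys[p[1]]))
    return list(ys[i]), list(ys[j])


def plane_split(ys, c1, c2, margin) -> List[Optional[int]]:
    # Signed distance of each Y to the bisector of C1 and C2 (positive on the
    # C1 side). Vectors beyond the assumed planes H1/H2 at +/- margin are
    # labelled confidently; vectors between the planes are left as None.
    w = sub(c1, c2)
    norm = math.sqrt(dot(w, w)) or 1.0
    mid = [(a + b) / 2 for a, b in zip(c1, c2)]
    labels: List[Optional[int]] = []
    for y in ys:
        s = dot(w, sub(y, mid)) / norm
        labels.append(0 if s >= margin else 1 if s <= -margin else None)
    return labels


def iterate_ev_vb(ys, margin=0.5, max_iter=20) -> List[int]:
    c1, c2 = initial_centres(ys)
    prev: Optional[List[int]] = None
    labels: List[int] = []
    for _ in range(max_iter):
        rough = plane_split(ys, c1, c2, margin)
        # Stand-in for VB diarisation (NOT variational Bayes): confident
        # vectors seed two centres, then every segment goes to the nearer one.
        s0 = mean([y for y, l in zip(ys, rough) if l == 0] or [c1])
        s1 = mean([y for y, l in zip(ys, rough) if l == 1] or [c2])
        labels = [0 if dist2(y, s0) <= dist2(y, s1) else 1 for y in ys]
        if labels == prev:  # segmentation unchanged: stop iterating
            break
        prev = labels
        # Correct the clustering centres C1, C2 from the full labelling.
        g0 = [y for y, l in zip(ys, labels) if l == 0]
        g1 = [y for y, l in zip(ys, labels) if l == 1]
        if g0:
            c1 = mean(g0)
        if g1:
            c2 = mean(g1)
    return labels


ys = [[0, 0], [0.2, 0.1], [0.1, -0.1], [5, 5], [5.2, 4.9], [4.8, 5.1]]
print(iterate_ev_vb(ys))  # [0, 0, 0, 1, 1, 1]
```

Leaving vectors between the two planes unlabelled mirrors the idea of first locating only the speaker-homogeneous concentration regions. The nearest-centre step would be replaced by an actual VB diarisation model in practice.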
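The final Viterbi resegmentation can be illustrated with a generic Viterbi decoder over per-frame speaker log-likelihoods, using a fixed penalty for switching speakers. The source of the scores, the `switch_penalty` value and the 10 ms `frame_shift` are illustrative assumptions, not parameters taken from the patent. The decoder returns one speaker index per frame. The hypothetical helper `to_table` then collapses runs of equal labels into (start, end, speaker) rows, giving a segment-to-speaker table.

```python
from typing import List, Sequence, Tuple


def viterbi_resegment(loglik: Sequence[Sequence[float]],
                      switch_penalty: float) -> List[int]:
    """loglik[t][k] is the (assumed) log-likelihood of frame t under speaker k."""
    n_states = len(loglik[0])
    score = list(loglik[0])
    back: List[List[int]] = []
    for t in range(1, len(loglik)):
        ptrs, new = [], []
        for k in range(n_states):
            # Best predecessor; changing speaker costs switch_penalty.
            best_j = max(range(n_states),
                         key=lambda j: score[j] - (0.0 if j == k else switch_penalty))
            cost = 0.0 if best_j == k else switch_penalty
            ptrs.append(best_j)
            new.append(score[best_j] - cost + loglik[t][k])
        back.append(ptrs)
        score = new
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for ptrs in reversed(back):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path


def to_table(path: Sequence[int],
             frame_shift: float = 0.01) -> List[Tuple[float, float, int]]:
    # Collapse runs of identical labels into (start_s, end_s, speaker) rows.
    table, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            table.append((round(start * frame_shift, 2),
                          round(t * frame_shift, 2), path[start]))
            start = t
    return table


ll = [[0, -5], [0, -5], [-1, 0], [0, -5], [-5, 0], [-5, 0], [-5, 0]]
print(to_table(viterbi_resegment(ll, 3.0)))  # [(0.0, 0.04, 0), (0.04, 0.07, 1)]
```

In this example, the switch penalty absorbs the single frame at index 2 that weakly favours speaker 1, so short spurious speaker changes are smoothed out. With `switch_penalty=0.0`, every frame simply takes its own most likely speaker.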
EFFECT: improved accuracy of speaker detection in dialogues transmitted over a telephone channel.
4 dwg, 1 tbl
Dates
2014-10-10—Published
2013-04-23—Filed