FIELD: information technology.
SUBSTANCE: a method for automatic iterative clustering of electronic documents by semantic similarity comprises: converting each electronic document into a corresponding vector in a multidimensional space whose dimensions are determined by the terms contained in the document; finding the proximity measure between the obtained vector and each of the vectors of the existing clusters, which combine semantically similar, previously processed documents; adding the document being processed to the cluster for which the found proximity measure is minimal; determining a new vector for the supplemented cluster; and taking as the term of the supplemented cluster the name of the document in that cluster whose vector has the minimal proximity measure to the new vector. Thus, when new electronic documents are input, each existing cluster is processed as a single document rather than as a set of documents.
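The steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not name the proximity measure, the rule for computing the new cluster vector, or how the first cluster is created. The sketch assumes cosine distance over raw term counts, a cluster vector equal to the sum of its members' vectors, and a hypothetical `max_distance` threshold above which a new cluster is opened. All function and class names are illustrative.

```python
import math
from collections import Counter


def to_vector(text):
    """Term-count vector: one dimension per distinct term in the document."""
    return Counter(text.lower().split())


def distance(a, b):
    """Cosine distance (assumed measure): smaller means more similar."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


class Cluster:
    """A cluster is represented by one vector, like a single document."""

    def __init__(self):
        self.members = []        # list of (document name, document vector)
        self.vector = Counter()  # assumed: sum of the member vectors
        self.label = None        # name of the most representative member

    def add(self, name, vec):
        self.members.append((name, vec))
        # Determine the new vector of the supplemented cluster.
        self.vector = self.vector + vec
        # Label the cluster with the member closest to the new vector.
        self.label = min(self.members,
                         key=lambda m: distance(m[1], self.vector))[0]


def add_document(clusters, name, text, max_distance=0.7):
    """Assign one document to the nearest cluster, or open a new one."""
    vec = to_vector(text)
    best = min(clusters, key=lambda c: distance(vec, c.vector), default=None)
    # Hypothetical rule: open a new cluster when nothing is close enough.
    if best is None or distance(vec, best.vector) > max_distance:
        best = Cluster()
        clusters.append(best)
    best.add(name, vec)
    return best
```

Each incoming document is compared against one vector per cluster rather than against every stored document, so the cost of an insertion grows with the number of clusters, which is the simplification the abstract describes.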
EFFECT: simpler and faster processing of electronic documents and faster search, within a clustered set of documents, for documents relevant to a search request.
12 cl, 6 dwg
Dates
2015-07-10—Published
2014-02-14—Filed