METHOD FOR AUTOMATIC ITERATIVE CLUSTERISATION OF ELECTRONIC DOCUMENTS ACCORDING TO SEMANTIC SIMILARITY, METHOD FOR SEARCH IN PLURALITY OF DOCUMENTS CLUSTERED ACCORDING TO SEMANTIC SIMILARITY AND COMPUTER-READABLE MEDIA

Russian patent published in 2015 - IPC G06F17/30 G06F17/20

Abstract RU 2556425 C1

FIELD: information technology.

SUBSTANCE: the method for automatic iterative clusterisation of electronic documents according to semantic similarity includes: converting each electronic document into a corresponding vector in a multidimensional space, the dimensions of which are determined by the terms contained in the electronic document; finding the proximity measure between the obtained vector and each of the vectors already existing in the clusters, which combine semantically similar documents processed previously; adding the document being processed to the cluster for which the found proximity measure is minimal; determining a new vector for the supplemented cluster; and taking as the term of the supplemented cluster the name of the document in said cluster whose vector has the minimal proximity measure to the determined new vector. Thus, when new electronic documents are input, each existing cluster is processed as a single document and not as a set of documents.
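The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions the abstract does not specify: terms are whitespace-separated lowercase words, the proximity measure is cosine distance (smaller means closer), the new cluster vector is the mean of its members' vectors, and a hypothetical `max_distance` threshold decides when a document starts a new cluster. All function and class names are illustrative only.

```python
import math
from collections import Counter


def to_vector(text):
    """Term-count vector: one dimension per distinct term (assumed tokenisation)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Assumed proximity measure: 1 - cosine similarity (0.0 = same direction)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Assumed way to determine a cluster vector: average of member vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds counts term by term
    return {t: c / len(vectors) for t, c in total.items()}


class Cluster:
    def __init__(self, name, vector):
        self.members = [(name, vector)]
        self.vector = dict(vector)
        self.name = name  # the cluster's label (the abstract's "term")

    def add(self, name, vector):
        self.members.append((name, vector))
        # Determine a new vector for the cluster that received the document.
        self.vector = mean_vector([v for _, v in self.members])
        # Label the cluster with the member document closest to that vector.
        closest = min(self.members,
                      key=lambda m: cosine_distance(m[1], self.vector))
        self.name = closest[0]


def add_document(clusters, name, text, max_distance=0.7):
    """Place one document; each cluster is compared as if it were one document."""
    vec = to_vector(text)
    best = min(clusters, key=lambda c: cosine_distance(vec, c.vector),
               default=None)
    if best is None or cosine_distance(vec, best.vector) > max_distance:
        # Hypothetical rule: nothing close enough, so start a new cluster.
        best = Cluster(name, vec)
        clusters.append(best)
    else:
        best.add(name, vec)
    return best


clusters = []
add_document(clusters, "cats.txt", "cat kitten cat purr")
add_document(clusters, "dogs.txt", "dog puppy dog bark")
placed = add_document(clusters, "cats2.txt", "cat purr kitten")
print(placed.name, len(clusters))
```

Because each incoming vector is compared against one vector per cluster rather than against every stored document, the cost of placing a document in this sketch grows with the number of clusters, which is consistent with the speed-up the abstract claims.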

EFFECT: simpler and faster processing of electronic documents and faster search, within a clustered set of documents, for documents relevant to a search request.

12 cl, 6 dwg

Similar patents RU2556425C1

Title	Year	Author	Number
METHOD OF CONSTRUCTING SEMANTIC MODEL OF DOCUMENT	2011	Turdakov Denis Jur'Evich Nedumov Jaroslav Rostislavovich Sysoev Andrej Anatol'Evich	RU2487403C1
METHOD FOR SEPARATING TEXTS AND ILLUSTRATIONS IN IMAGES OF DOCUMENTS USING A DESCRIPTOR OF DOCUMENT SPECTRUM AND TWO-LEVEL CLUSTERING	2017	Anisimovskiy Valery Valerievich	RU2656708C1
METHOD AND SYSTEM OF SEMANTIC PROCESSING TEXT DOCUMENTS	2016	Mitelkov Dmitrij Vladimirovich Novikov Andrej Yurevich Satin Boris Borisovich	RU2630427C2
METHOD AND SYSTEM FOR OBTAINING A VECTOR REPRESENTATION OF AN ELECTRONIC DOCUMENT	2021	Vyshegorodtsev Kirill Evgenevich Davidov Dmitrij Georgievich Ryupichev Dmitrij Yurevich Balashov Aleksandr Viktorovich	RU2775351C1
METHOD FOR SEMANTIC PROCESSING OF NATURAL LANGUAGE USING GRAPHIC INTERMEDIARY LANGUAGE	2009	Mende Mikhaehl'	RU2509350C2
METHOD OF DATA TRANSFORMATION OF GEOINFORMATION SYSTEMS (GIS), SYSTEM FOR ITS IMPLEMENTATION AND METHOD OF SEARCH FOR THE DATA BASED ON THIS METHOD	2017	Sysoev Aleksandr Vadimovich	RU2669143C1
METHOD AND SYSTEM FOR CLUSTERING DOCUMENTS	2019	Shagraev Aleksey Galimovich	RU2757592C1
METHOD OF SEARCHING FOR ELECTRONIC DOCUMENTS SIMILAR ON SEMANTIC CONTENT, STORED ON DATA STORAGE DEVICES	2009	Borodashchenko Anton Jur'Evich Bochkov Sergej Maksimovich Vasinev Dmitrij Aleksandrovich Salbiev Artem Leonidovich	RU2420800C2
SYSTEM AND METHOD FOR SEMANTIC SEARCH	2013	Zuev Konstantin Alekseevich Daniehljan Tat'Jana Vladimirovna Rakhmatulina Ehl'Mira Monirovna	RU2563148C2
METHOD AND SYSTEM FOR OBTAINING VECTOR REPRESENTATION OF ELECTRONIC TEXT DOCUMENT FOR CLASSIFICATION BY CATEGORIES OF CONFIDENTIAL INFORMATION	2021	Vyshegorodtsev Kirill Evgenevich Obolenskij Ivan Aleksandrovich Golovnya Maksim Sergeevich	RU2775358C1

RU 2 556 425 C1

Authors

Klintsov Viktor Petrovich

Seledkin Vjacheslav Alekseevich

Dates

2015-07-10—Published

2014-02-14—Filed