METHOD FOR AUTOMATIC ITERATIVE CLUSTERISATION OF ELECTRONIC DOCUMENTS ACCORDING TO SEMANTIC SIMILARITY, METHOD FOR SEARCH IN PLURALITY OF DOCUMENTS CLUSTERED ACCORDING TO SEMANTIC SIMILARITY AND COMPUTER-READABLE MEDIA

Russian patent published in 2015 - IPC G06F17/30, G06F17/20

Abstract RU 2556425 C1

FIELD: information technology.

SUBSTANCE: the method for automatic iterative clusterisation of electronic documents according to semantic similarity includes: converting each electronic document into a corresponding vector in a multidimensional space whose dimensions are determined by the terms contained in the electronic document; finding the measure of proximity of the obtained vector to each of the vectors already existing in the clusters, which combine semantically similar documents processed previously; supplementing the cluster for which the found proximity measure is minimal with the document being processed; determining a new vector for the supplemented cluster; and taking as the term of the supplemented cluster the name of the document in that cluster whose vector has the minimal proximity measure to the determined new vector. Thus, when new electronic documents are input, each existing cluster is processed as a single document and not as a set of documents.
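The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not specify the term weighting, the proximity measure, how a cluster vector is recomputed, or when a new cluster is started. The sketch therefore assumes raw term counts, cosine distance (smaller means closer), a cluster vector equal to the sum of its members' term counts, and a hypothetical `new_cluster_threshold` above which a document starts its own cluster. All function and class names are illustrative.

```python
import math
from collections import Counter


def vectorize(text):
    """Term-count vector: each distinct term is one dimension (assumption)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Assumed proximity measure: 0.0 = same direction, 1.0 = no shared terms."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


class Cluster:
    """A cluster is represented by one vector, like a single document."""

    def __init__(self, name, vec):
        self.members = [(name, vec)]
        self.vector = Counter(vec)  # summed term counts of all members
        self.label = name           # the cluster's term (document name)

    def add(self, name, vec):
        self.members.append((name, vec))
        self.vector.update(vec)  # new vector for the supplemented cluster
        # Label the cluster with the member nearest to the new cluster vector.
        self.label = min(
            self.members, key=lambda m: cosine_distance(m[1], self.vector)
        )[0]


def add_document(clusters, name, text, new_cluster_threshold=0.8):
    """Assign one document to the nearest cluster, or start a new one."""
    vec = vectorize(text)
    best = min(clusters, key=lambda c: cosine_distance(vec, c.vector), default=None)
    if best is None or cosine_distance(vec, best.vector) > new_cluster_threshold:
        clusters.append(Cluster(name, vec))  # hypothetical seeding rule
    else:
        best.add(name, vec)
    return clusters


clusters = []
for doc_name, doc_text in [
    ("cats", "cat pet animal"),
    ("dogs", "dog pet animal"),
    ("pets", "cat dog pet animal"),
    ("stocks", "stock market price"),
]:
    add_document(clusters, doc_name, doc_text)

for c in clusters:
    print(c.label, [m[0] for m in c.members])
```

Each incoming document is compared against one vector per cluster rather than against every stored document, which is the sense in which existing clusters are handled as single documents.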

EFFECT: simpler and faster processing of electronic documents and faster search, in a clustered set of documents, for documents relevant to a search request.

12 cl, 6 dwg

Similar patents RU2556425C1

Title Year Author Number
METHOD OF CONSTRUCTING SEMANTIC MODEL OF DOCUMENT 2011
  • Turdakov Denis Jur'Evich
  • Nedumov Jaroslav Rostislavovich
  • Sysoev Andrej Anatol'Evich
RU2487403C1
METHOD FOR SEPARATING TEXTS AND ILLUSTRATIONS IN IMAGES OF DOCUMENTS USING A DESCRIPTOR OF DOCUMENT SPECTRUM AND TWO-LEVEL CLUSTERING 2017
  • Anisimovskiy Valery Valerievich
RU2656708C1
METHOD AND SYSTEM OF SEMANTIC PROCESSING TEXT DOCUMENTS 2016
  • Mitelkov Dmitrij Vladimirovich
  • Novikov Andrej Yurevich
  • Satin Boris Borisovich
RU2630427C2
METHOD AND SYSTEM FOR OBTAINING A VECTOR REPRESENTATION OF AN ELECTRONIC DOCUMENT 2021
  • Vyshegorodtsev Kirill Evgenevich
  • Davidov Dmitrij Georgievich
  • Ryupichev Dmitrij Yurevich
  • Balashov Aleksandr Viktorovich
RU2775351C1
METHOD FOR SEMANTIC PROCESSING OF NATURAL LANGUAGE USING GRAPHIC INTERMEDIARY LANGUAGE 2009
  • Mende Mikhaehl'
RU2509350C2
METHOD OF DATA TRANSFORMATION OF GEOINFORMATION SYSTEMS (GIS), SYSTEM FOR ITS IMPLEMENTATION AND METHOD OF SEARCH FOR THE DATA BASED ON THIS METHOD 2017
  • Sysoev Aleksandr Vadimovich
RU2669143C1
METHOD AND SYSTEM FOR CLUSTERING DOCUMENTS 2019
  • Shagraev Aleksey Galimovich
RU2757592C1
METHOD OF SEARCHING FOR ELECTRONIC DOCUMENTS SIMILAR ON SEMANTIC CONTENT, STORED ON DATA STORAGE DEVICES 2009
  • Borodashchenko Anton Jur'Evich
  • Bochkov Sergej Maksimovich
  • Vasinev Dmitrij Aleksandrovich
  • Salbiev Artem Leonidovich
RU2420800C2
SYSTEM AND METHOD FOR SEMANTIC SEARCH 2013
  • Zuev Konstantin Alekseevich
  • Daniehljan Tat'Jana Vladimirovna
  • Rakhmatulina Ehl'Mira Monirovna
RU2563148C2
METHOD AND SYSTEM FOR OBTAINING VECTOR REPRESENTATION OF ELECTRONIC TEXT DOCUMENT FOR CLASSIFICATION BY CATEGORIES OF CONFIDENTIAL INFORMATION 2021
  • Vyshegorodtsev Kirill Evgenevich
  • Obolenskij Ivan Aleksandrovich
  • Golovnya Maksim Sergeevich
RU2775358C1

RU 2 556 425 C1

Authors

Klintsov Viktor Petrovich

Seledkin Vjacheslav Alekseevich

Dates

2015-07-10 Published

2014-02-14 Filed