CONSTRUCTING CORPUS OF COMPARABLE DOCUMENTS BASED ON UNIVERSAL MEASURE OF SIMILARITY

Russian patent published in 2017 - IPC G06F17/27

Abstract RU 2607975 C2

FIELD: data processing.

SUBSTANCE: invention relates to a method, a computer-readable data medium and a system for creating a corpus of comparable documents. The method involves: obtaining, by a computing device, an initial set of documents containing text; performing, by the computing device, semantic-syntactic analysis of the text to construct language-independent semantic structures of the sentences of said documents; calculating values of a universal measure of similarity for groups of documents by comparing the constructed language-independent semantic structures of the texts of said documents; detecting, by the computing device, groups of similar documents based on the calculated values of the universal measure of similarity; and forming, by the computing device, a corpus of comparable documents based on the detected similar documents.
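The claimed steps can be outlined as a small pipeline. The sketch below is hypothetical and is not the patented method: it assumes each document has already been reduced to a set of language-independent concept identifiers (a crude stand-in for the semantic structures produced by the semantic-syntactic analysis, which is not reproduced here), uses Jaccard overlap as a placeholder for the universal measure of similarity, and treats two documents as similar when the score reaches an arbitrary threshold. The names `similarity`, `find_groups` and `build_corpus`, the concept labels and the threshold value are all illustrative assumptions.

```python
from itertools import combinations

# ASSUMPTION: each document is already reduced to a set of language-independent
# concept identifiers. The patent's semantic structures are far richer than this.
Doc = frozenset[str]


def similarity(a: Doc, b: Doc) -> float:
    """Jaccard overlap, a placeholder for the universal measure of similarity."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_groups(docs: dict[str, Doc], threshold: float) -> list[set[str]]:
    """Link every pair whose similarity reaches the threshold, then return
    the connected components that contain at least two documents."""
    parent = {name: name for name in docs}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(docs, 2):
        if similarity(docs[a], docs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in docs:
        groups.setdefault(find(name), set()).add(name)
    return [g for g in groups.values() if len(g) > 1]


def build_corpus(docs: dict[str, Doc], threshold: float = 0.5) -> list[list[str]]:
    """Form the corpus of comparable documents as sorted lists of document names."""
    return sorted(sorted(g) for g in find_groups(docs, threshold))


# Hypothetical usage: texts in different languages mapped to shared concepts.
docs = {
    "en_news": frozenset({"EARTHQUAKE", "CITY", "DAMAGE", "RESCUE"}),
    "ru_news": frozenset({"EARTHQUAKE", "CITY", "DAMAGE", "VICTIM"}),
    "en_recipe": frozenset({"FLOUR", "OVEN", "BAKE"}),
    "de_recipe": frozenset({"FLOUR", "OVEN", "BAKE", "SUGAR"}),
    "fr_sport": frozenset({"MATCH", "GOAL"}),
}
print(build_corpus(docs))  # [['de_recipe', 'en_recipe'], ['en_news', 'ru_news']]
```

Grouping by connected components means two documents can end up in the same group through a chain of similar documents even if they are not directly similar to each other; a stricter variant could require every pair within a group to pass the threshold.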

EFFECT: the technical result is the ability to generate a corpus of comparable documents automatically.

15 cl, 15 dwg

Similar patents RU2607975C2

Title Year Author Number
EXTRACTION OF ENTITIES FROM TEXTS IN NATURAL LANGUAGE 2015
  • Starostin Anatolij Sergeevich
  • Danielyan Tatyana Vladimirovna
  • Smurov Ivan Mikhajlovich
RU2626555C2
METHOD AND SYSTEM FOR MACHINE EXTRACTION AND INTERPRETATION OF TEXT INFORMATION 2015
  • Starostin Anatoly Sergeevich
  • Smurov Ivan Mikhailovich
  • Stepanova Maria Evgenyevna
RU2592396C1
COMPREHENSIVE AUTOMATIC PROCESSING OF TEXT INFORMATION 2014
  • Danielyan Tatyana Vladimirovna
  • Starostin Anatolij Sergeevich
  • Zuev Konstantin Alekseevich
  • Anisimovich Konstantin Vladimirovich
  • Selegej Vladimir Pavlovich
RU2662699C2
EXPANDING OF INFORMATION SEARCH POSSIBILITY 2015
  • Danielyan Tatyana Vladimirovna
  • Indenbom Evgenij Mikhajlovich
RU2618375C2
SYSTEM AND METHOD FOR SEMANTIC SEARCH 2013
  • Zuev Konstantin Alekseevich
  • Daniehljan Tat'Jana Vladimirovna
  • Rakhmatulina Ehl'Mira Monirovna
RU2563148C2
METHOD OF EXTRACTING FACTS FROM TEXTS ON NATURAL LANGUAGE 2016
  • Starostin Anatolij Sergeevich
  • Smurov Ivan Mikhajlovich
  • Dzhumaev Stanislav Sergeevich
RU2637992C1
SENTIMENT ANALYSIS AT THE LEVEL OF ASPECTS USING METHODS OF MACHINE LEARNING 2016
  • Matskevich Stepan Evgenevich
  • Kuznetsova Ekaterina Sergeevna
  • Gusev Ilya Olegovich
RU2657173C2
SENTIMENT ANALYSIS AT LEVEL OF ASPECTS AND CREATION OF REPORTS USING MACHINE LEARNING METHODS 2016
  • Mikhajlov Maksim Borisovich
  • Pasechnikov Konstantin Alekseevich
RU2635257C1
EXTRACTION OF INFORMATION FROM SANITARY BLOCKS OF DOCUMENTS USING MICROMODELS ON BASIS OF ONTOLOGY 2017
  • Danielyan Tatyana Vladimirovna
  • Mikhajlov Maksim Borisovich
RU2662688C1
SYSTEM AND METHOD FOR AUTOMATIC CREATION OF TEMPLATES 2018
  • Anisimovich Konstantin Vladimirovich
  • Garashchuk Ruslan Vladimirovich
  • Matskevich Stepan Evgenevich
RU2697647C1

RU 2 607 975 C2

Authors

Bogdanova Daria Nikolaevna

Dates

2017-01-11 Published

2014-03-31 Filed