FIELD: data processing.
SUBSTANCE: the invention relates to a method, a computer-readable data medium and a system for creating a corpus of comparable documents. The method involves: obtaining, by a computing device, an initial set of documents containing text; performing, by the computing device, semantic-syntactic analysis of the text to construct language-independent semantic structures for the sentences of the text of said documents; calculating values of a universal measure of similarity for groups of documents by comparing the language-independent semantic structures constructed for the texts of said documents; detecting, by the computing device, groups of similar documents based on the calculated values of the universal measure of similarity; and forming, by the computing device, a corpus of comparable documents from the detected similar documents.
EFFECT: the technical result is the ability to generate a corpus of comparable documents automatically.
15 cl, 15 dwg
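The steps named in the abstract (analysis, similarity calculation, group detection, corpus formation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the toy `CONCEPTS` lexicon stands in for the semantic-syntactic analysis, Jaccard overlap stands in for the universal measure of similarity, and the names `semantic_structure`, `similarity`, `build_corpus` and the `threshold` parameter are hypothetical.

```python
from itertools import combinations

# Hypothetical toy lexicon: maps surface words (English and Spanish here) to
# language-independent concept IDs. It stands in for the semantic-syntactic
# analysis named in the abstract, which is far richer than a word lookup.
CONCEPTS = {
    "cat": "CAT", "gato": "CAT",
    "eats": "EAT", "come": "EAT",
    "fish": "FISH", "pescado": "FISH",
    "bank": "BANK", "banco": "BANK",
    "raises": "RAISE", "sube": "RAISE",
    "rates": "RATE", "tasas": "RATE",
}


def semantic_structure(text):
    """Return the set of concept IDs found in the text (assumed stand-in)."""
    return frozenset(CONCEPTS[w] for w in text.lower().split() if w in CONCEPTS)


def similarity(a, b):
    """Jaccard overlap of two concept sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_corpus(docs, threshold=0.5):
    """Group documents whose pairwise similarity reaches the threshold."""
    structures = {name: semantic_structure(text) for name, text in docs.items()}
    parent = {name: name for name in docs}  # union-find forest over documents

    def find(x):
        # Follow parent links to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(docs, 2):
        if similarity(structures[a], structures[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in docs:
        groups.setdefault(find(name), []).append(name)
    # Only groups with at least two documents enter the corpus.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


docs = {
    "en_1": "the cat eats fish",
    "es_1": "el gato come pescado",
    "en_2": "the bank raises rates",
    "es_2": "el banco sube tasas",
    "en_3": "hello world",
}
corpus = build_corpus(docs)
print(corpus)
```

In this sketch the English and Spanish documents are grouped only because both map to the same concept IDs, which is the role the abstract assigns to language-independent semantic structures; a document with no similar counterpart (`en_3`) is left out of the corpus.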
Title | Year | Author | Number |
---|---|---|---|
EXTRACTION OF ENTITIES FROM TEXTS IN NATURAL LANGUAGE | 2015 | | RU2626555C2 |
METHOD AND SYSTEM FOR MACHINE EXTRACTION AND INTERPRETATION OF TEXT INFORMATION | 2015 | | RU2592396C1 |
COMPREHENSIVE AUTOMATIC PROCESSING OF TEXT INFORMATION | 2014 | | RU2662699C2 |
EXPANDING OF INFORMATION SEARCH POSSIBILITY | 2015 | | RU2618375C2 |
SYSTEM AND METHOD FOR SEMANTIC SEARCH | 2013 | | RU2563148C2 |
METHOD OF EXTRACTING FACTS FROM TEXTS ON NATURAL LANGUAGE | 2016 | | RU2637992C1 |
SENTIMENT ANALYSIS AT THE LEVEL OF ASPECTS USING METHODS OF MACHINE LEARNING | 2016 | | RU2657173C2 |
SENTIMENT ANALYSIS AT LEVEL OF ASPECTS AND CREATION OF REPORTS USING MACHINE LEARNING METHODS | 2016 | | RU2635257C1 |
EXTRACTION OF INFORMATION FROM SANITARY BLOCKS OF DOCUMENTS USING MICROMODELS ON BASIS OF ONTOLOGY | 2017 | | RU2662688C1 |
SYSTEM AND METHOD FOR AUTOMATIC CREATION OF TEMPLATES | 2018 | | RU2697647C1 |
Dates
2017-01-11—Published
2014-03-31—Filed