TEXT SEGMENTATION Russian patent published in 2018 - IPC G06F17/27 

Abstract RU 2666277 C1

FIELD: computer engineering.

SUBSTANCE: the invention relates generally to computer systems and, more specifically, to natural language processing systems and methods. In the method of automatic segmentation of a text document, an unmarked target text is segmented to obtain a plurality of target candidate segments, each belonging to one of a plurality of segment types. Attributes of the target text are identified in a first target candidate segment from the plurality of target candidate segments. These attributes are analyzed using a first segment type classifier from a plurality of classifiers to determine that the first target candidate segment has a first segment type. The first segment type classifier has been trained on marked text to identify segments of the first segment type. The text of the first target candidate segment is then analyzed based on its assignment to the first segment type.

EFFECT: higher efficiency of information retrieval through reduced document pre-processing time, and higher accuracy of the retrieved information.

18 cl, 4 dwg
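
The flow summarized in the abstract (candidate segments, segment attributes, per-type classifiers, type-dependent analysis) can be illustrated with a minimal Python sketch. It is not the patented implementation: the blank-line splitting rule, the attribute names, the segment types "header", "field" and "other", the rule-based stand-ins for trained classifiers, and all function names are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Segment:
    text: str
    segment_type: Optional[str] = None


def split_into_candidates(document: str) -> List[Segment]:
    """Assumed candidate generation: one candidate per blank-line-separated block."""
    blocks = [block.strip() for block in document.split("\n\n")]
    return [Segment(block) for block in blocks if block]


def extract_attributes(segment: Segment) -> Dict[str, float]:
    """Assumed attribute set; a real system would use richer features."""
    text = segment.text
    return {
        "n_tokens": float(len(text.split())),
        "digit_ratio": sum(ch.isdigit() for ch in text) / max(len(text), 1),
        "has_colon": float(":" in text),
    }


# Hand-written stand-ins for classifiers that would be trained on marked text.
# Each maps the attributes of a segment to a confidence score for one type.
CLASSIFIERS: Dict[str, Callable[[Dict[str, float]], float]] = {
    "header": lambda a: 1.0 if a["n_tokens"] <= 4 and a["has_colon"] == 0.0 else 0.0,
    "field": lambda a: 1.0 if a["has_colon"] == 1.0 else 0.0,
}


def classify(segment: Segment, threshold: float = 0.5) -> Segment:
    """Assign the best-scoring type, or the assumed fallback type 'other'."""
    attributes = extract_attributes(segment)
    scores = {name: clf(attributes) for name, clf in CLASSIFIERS.items()}
    best = max(scores, key=scores.get)
    segment.segment_type = best if scores[best] >= threshold else "other"
    return segment


def analyze(segment: Segment) -> Dict[str, str]:
    """Type-dependent analysis: only 'field' segments are parsed as key/value."""
    if segment.segment_type == "field":
        key, value = segment.text.split(":", 1)
        return {"key": key.strip(), "value": value.strip()}
    return {"text": segment.text}


def segment_document(document: str) -> List[Segment]:
    return [classify(candidate) for candidate in split_into_candidates(document)]


if __name__ == "__main__":
    sample = (
        "Invoice\n\nTotal: 42 USD\n\n"
        "The goods were delivered on time and in good condition."
    )
    for seg in segment_document(sample):
        print(seg.segment_type, analyze(seg))
```

Keeping one scoring function per segment type mirrors the abstract's "plurality of classifiers": a new segment type can be added by registering another entry in the assumed `CLASSIFIERS` table without touching the rest of the pipeline.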

Similar patents RU2666277C1

Title Year Author Number
NAMED ENTITIES FROM THE TEXT AUTOMATIC EXTRACTION 2014
  • Nekhaj Ilya Vladimirovich
RU2665239C2
RETRIEVAL OF INFORMATION OBJECTS USING A COMBINATION OF CLASSIFIERS ANALYZING LOCAL AND NON-LOCAL SIGNS 2018
  • Indenbom Evgenij Mikhajlovich
RU2686000C1
METHOD FOR ATTRIBUTION OF PARTIALLY STRUCTURED TEXTS FOR FORMATION OF NORMATIVE-REFERENCE INFORMATION 2020
  • Fedosin Sergei Alekseevich
  • Plotnikova Natalia Pavlovna
  • Martynov Vladislav Aleksandrovich
  • Ryskin Konstantin Eduardovich
  • Kuznetsov Dmitrii Aleksandrovich
  • Deniskin Aleksandr Vladimirovich
  • Vechkanova Iuliia Sergeevna
  • Fediushkin Nikolai Alekseevich
  • Tsilikov Nikita Sergeevich
RU2750852C1
USE OF DEPTH SEMANTIC ANALYSIS OF TEXTS ON NATURAL LANGUAGE FOR CREATION OF TRAINING SAMPLES IN METHODS OF MACHINE TRAINING 2016
  • Anisimovich Konstantin Vladimirovich
  • Selegej Vladimir Pavlovich
  • Garashchuk Ruslan Vladimirovich
RU2636098C1
ALLOCATION OF TIME EXPRESSIONS FOR TEXTS IN NATURAL LANGUAGE 2014
  • Romanenko Aleksandr Aleksandrovich
RU2595489C2
METHODS AND SYSTEMS FOR IDENTIFYING FIELDS IN A DOCUMENT 2021
  • Stanislav Semenov
RU2774653C1
METHODS AND SYSTEMS FOR IDENTIFYING FIELDS IN A DOCUMENT 2020
  • Semenov Stanislav Vladimirovich
  • Lanin Mikhail Olegovich
RU2760471C1
TRAINING NEURAL NETWORKS USING LOSS FUNCTIONS REFLECTING RELATIONSHIPS BETWEEN NEIGHBOURING TOKENS 2018
  • Eugene Indenbom
  • Daniil Anastasiev
RU2721190C1
EXTRACTING INFORMATION OBJECTS WITH THE HELP OF A CLASSIFIER COMBINATION 2017
  • Matskevich Stepan Evgenevich
  • Starostin Anatolij Sergeevich
  • Sukhodolov Dmitrij Andreevich
RU2679988C1
TRAINING CLASSIFIERS USED TO EXTRACT INFORMATION FROM NATURAL LANGUAGE TEXTS 2018
  • Matskevich Stepan Evgenevich
  • Bulgakov Ilya Aleksandrovich
RU2691855C1

RU 2 666 277 C1

Authors

Indenbom Evgenij Mikhajlovich

Kolotienko Sergej Sergeevich

Dates

2018-09-06 Published

2017-09-06 Filed