FIELD: computer equipment.
SUBSTANCE: the invention relates to computer systems in general, and specifically to natural language processing systems and methods. In the method of automatic segmentation of a text document, an unmarked target text is segmented to obtain a plurality of target candidate segments, each belonging to one of a plurality of segment types. Attributes of the target text in a first target candidate segment from the plurality of target candidate segments are identified. These attributes are analyzed using a first segment type classifier from a plurality of classifiers to determine that the first target candidate segment has a first segment type. The first segment type classifier has been trained on marked text to identify segments corresponding to the first segment type. The text of the first target candidate segment is then analyzed based on the assignment of that segment to the first segment type.
EFFECT: higher efficiency of information retrieval through reduced document pre-processing time, and higher accuracy of the retrieved information.
18 cl, 4 dwg
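The flow described in the abstract (segment the text, identify attributes of a candidate segment, score it with per-type classifiers, then analyze it according to the assigned type) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the blank-line segmentation rule, the two attributes, the segment types `heading` and `body`, and the hand-written scoring functions, which stand in for classifiers that the abstract says are trained on marked text. It is not the patented implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Attributes = Dict[str, float]


@dataclass
class Segment:
    text: str
    segment_type: Optional[str] = None


def split_into_candidates(text: str) -> List[Segment]:
    """Hypothetical segmentation rule: blank-line-separated blocks are candidates."""
    blocks = (block.strip() for block in text.split("\n\n"))
    return [Segment(block) for block in blocks if block]


def extract_attributes(segment: Segment) -> Attributes:
    """Hypothetical attributes: word count and share of all-uppercase words."""
    words = segment.text.split()
    n_words = len(words)
    n_upper = sum(1 for word in words if word.isupper())
    return {
        "n_words": float(n_words),
        "upper_ratio": n_upper / n_words if n_words else 0.0,
    }


# Stand-ins for trained segment type classifiers (assumption: in the described
# method these would be models trained on marked text, not fixed rules).
def heading_score(attrs: Attributes) -> float:
    return attrs["upper_ratio"] if attrs["n_words"] <= 10 else 0.0


def body_score(attrs: Attributes) -> float:
    return min(attrs["n_words"] / 10.0, 1.0) * (1.0 - attrs["upper_ratio"])


CLASSIFIERS: Dict[str, Callable[[Attributes], float]] = {
    "heading": heading_score,
    "body": body_score,
}


def classify(segment: Segment, threshold: float = 0.5) -> Segment:
    """Assign the best-scoring segment type, or None if no score reaches the threshold."""
    attrs = extract_attributes(segment)
    scores = {name: clf(attrs) for name, clf in CLASSIFIERS.items()}
    best = max(scores, key=scores.get)
    segment.segment_type = best if scores[best] >= threshold else None
    return segment


def analyze(segment: Segment) -> Dict[str, object]:
    """Type-dependent analysis of the segment text (illustrative only)."""
    if segment.segment_type == "heading":
        return {"title": segment.text.title()}
    if segment.segment_type == "body":
        return {"period_count": segment.text.count(".")}
    return {}


def segment_and_classify(text: str) -> List[Segment]:
    return [classify(candidate) for candidate in split_into_candidates(text)]
```

A short usage example under the same assumptions:

```python
text = (
    "TERMS AND CONDITIONS\n\n"
    "The supplier shall deliver the goods within ten days. "
    "Payment is due on receipt."
)
for seg in segment_and_classify(text):
    print(seg.segment_type, analyze(seg))
```

Keeping one scoring function per segment type mirrors the abstract's plurality of classifiers, so a new segment type could be added without touching the others.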
Title | Year | Author | Number |
---|---|---|---|
NAMED ENTITIES FROM THE TEXT AUTOMATIC EXTRACTION | 2014 | | RU2665239C2 |
RETRIEVAL OF INFORMATION OBJECTS USING A COMBINATION OF CLASSIFIERS ANALYZING LOCAL AND NON-LOCAL SIGNS | 2018 | | RU2686000C1 |
METHOD FOR ATTRIBUTION OF PARTIALLY STRUCTURED TEXTS FOR FORMATION OF NORMATIVE-REFERENCE INFORMATION | 2020 | | RU2750852C1 |
USE OF DEPTH SEMANTIC ANALYSIS OF TEXTS ON NATURAL LANGUAGE FOR CREATION OF TRAINING SAMPLES IN METHODS OF MACHINE TRAINING | 2016 | | RU2636098C1 |
ALLOCATION OF TIME EXPRESSIONS FOR TEXTS IN NATURAL LANGUAGE | 2014 | | RU2595489C2 |
METHODS AND SYSTEMS FOR IDENTIFYING FIELDS IN A DOCUMENT | 2021 | | RU2774653C1 |
METHODS AND SYSTEMS FOR IDENTIFYING FIELDS IN A DOCUMENT | 2020 | | RU2760471C1 |
TRAINING NEURAL NETWORKS USING LOSS FUNCTIONS REFLECTING RELATIONSHIPS BETWEEN NEIGHBOURING TOKENS | 2018 | | RU2721190C1 |
EXTRACTING INFORMATION OBJECTS WITH THE HELP OF A CLASSIFIER COMBINATION | 2017 | | RU2679988C1 |
TRAINING CLASSIFIERS USED TO EXTRACT INFORMATION FROM NATURAL LANGUAGE TEXTS | 2018 | | RU2691855C1 |
Authors
Dates
2018-09-06—Published
2017-09-06—Filed