FIELD: physics.
SUBSTANCE: the invention relates to computer engineering, and in particular to a method and a system for extracting named entities from text information based on a two-level classification of named entities. The computer-implemented method of extracting named entities from text information is performed using at least one processor and comprises the steps of: receiving input text data and processing it to obtain tokens; vectorizing the obtained tokens; determining, using a first classifier, a probability vector of membership of each token obtained at the previous stage in a given class; processing, using a second classifier, which is a multilayer perceptron with a sigmoid activation function, the corresponding token membership probability vector; merging consecutive tokens with the same class into at least one named entity; determining whether the at least one named entity belongs to at least one subtype of the class; and extracting the at least one named entity with the determined at least one subtype of the class.
EFFECT: high accuracy of recognizing named entities.
7 cl, 4 dwg
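The token-merging step described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the helper names `pick_label` and `merge_tokens`, the class names `PER` and `ORG`, and the 0.5 decision threshold are all assumptions introduced here for illustration. The sketch assumes each token already has a membership probability vector from the classifiers, assigns the most probable class when it clears the threshold, and then joins runs of consecutive tokens that share a class.

```python
from typing import List, Optional, Sequence, Tuple


def pick_label(
    probs: Sequence[float],
    classes: Sequence[str],
    threshold: float = 0.5,  # assumed cut-off, not taken from the patent
) -> Optional[str]:
    """Return the most probable class, or None if it is below the threshold."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return classes[best] if probs[best] >= threshold else None


def merge_tokens(
    tokens: Sequence[str],
    labels: Sequence[Optional[str]],
) -> List[Tuple[str, str]]:
    """Merge runs of consecutive tokens with the same class into entities."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    for token, label in zip(tokens, labels):
        if label is not None and label == current_label:
            # Same class as the previous token: extend the current entity.
            current_tokens.append(token)
            continue
        if current_label is not None:
            # The class changed: close the entity collected so far.
            entities.append((" ".join(current_tokens), current_label))
        current_tokens = [token] if label is not None else []
        current_label = label

    if current_label is not None:
        entities.append((" ".join(current_tokens), current_label))
    return entities


if __name__ == "__main__":
    classes = ["PER", "ORG"]  # hypothetical class set
    tokens = ["Ivan", "Petrov", "works", "at", "Sber", "Bank"]
    # Hypothetical per-token membership probabilities for (PER, ORG).
    probs = [
        [0.95, 0.02], [0.91, 0.05], [0.03, 0.04],
        [0.01, 0.02], [0.04, 0.88], [0.06, 0.81],
    ]
    labels = [pick_label(p, classes) for p in probs]
    print(merge_tokens(tokens, labels))
```

Each resulting entity could then be passed to a further step that assigns a subtype of its class, as the abstract describes; that step is not sketched here.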
Title | Year | Author | Number |
---|---|---|---|
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA | 2022 | | RU2804747C1 |
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA | 2022 | | RU2802549C1 |
TEXT CLASSIFICATION METHOD AND SYSTEM | 2022 | | RU2818693C2 |
METHOD FOR ATTRIBUTION OF PARTIALLY STRUCTURED TEXTS FOR FORMATION OF NORMATIVE-REFERENCE INFORMATION | 2020 | | RU2750852C1 |
SYSTEM AND METHOD FOR AUTOMATED ASSESSMENT OF INTENTIONS AND EMOTIONS OF USERS OF DIALOGUE SYSTEM | 2020 | | RU2762702C2 |
METHOD AND SYSTEM FOR DIGITAL ASSISTANT TEXT GENERATION | 2022 | | RU2796208C1 |
METHOD AND SYSTEM FOR GENERATING TEXT | 2023 | | RU2817524C1 |
METHOD AND SYSTEM FOR RETRIEVING NAMED ENTITIES | 2020 | | RU2760637C1 |
METHOD AND SYSTEM FOR PARAPHRASING TEXT | 2023 | | RU2814808C1 |
SYSTEM AND METHOD FOR AUGMENTATION OF THE TRAINING SAMPLE FOR MACHINE LEARNING ALGORITHMS | 2020 | | RU2758683C2 |
Dates
2024-07-30—Published
2021-08-03—Filed