FIELD: information technology.
SUBSTANCE: the present invention relates to determining the genre of a text, in particular to training a neural network to determine the genre and subgenre of texts, including texts of large volume and complex semantic structure. According to the proposed method of training a neural network to determine the genre and subgenre of a text, at the first stage, texts from a first group are provided that relate to one genre and contain a recurring (end-to-end) named entity, together with a dictionary containing said named entity and the words falling within a predetermined step (window) before and after it. The neural network is trained on text from the first group; during training, it extracts the named entity and the words and/or context structures falling within the given step before and after it, places them in a list, and compares the list with said dictionary, on the basis of which it outputs a matching result that determines the genre of the text.

At the second stage, texts from a second group are provided that relate to the same genre and contain different named entities, and the neural network trained at the first stage is trained on text from the second group by repeating said first-stage operations, starting with named entity extraction.

At the third stage, after the neural network has been trained on at least two genres, texts from a third group are provided that relate to said trained genres and contain different named entities, together with a combined dictionary obtained from the augmented dictionaries of said trained genres. The neural network is trained on text from the third group by repeating said first-stage operations, starting with named entity extraction, wherein the combined dictionary is used for the comparison operation, and the neural network outputs a comparison result that determines the genre and subgenre of the text.
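The data flow of the first-stage extraction and comparison, and of the third-stage combined dictionary, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the neural network's learned extraction is replaced by regex tokenization and exact string matching, named entities are assumed to be single lowercase tokens, and the helper names (`tokenize`, `context_window`, `match_score`, `combine_dictionaries`, `classify`), the `step` value, and the sample dictionaries are all hypothetical. Genre and subgenre are modeled as `(genre, subgenre)` tuple labels, which is likewise an assumption.

```python
import re
from collections import Counter

# Hypothetical sample dictionaries keyed by (genre, subgenre); not taken from the patent.
DETECTIVE = {"inspector", "clue", "suspect", "examined", "holmes"}
GENRE_DICTS = {
    ("fiction", "detective"): DETECTIVE,
    ("fiction", "fantasy"): {"dragon", "wizard", "spell", "quest"},
}


def tokenize(text):
    """Lowercase word tokens (a stand-in for the network's own text processing)."""
    return re.findall(r"\w+", text.lower())


def context_window(tokens, entity, step):
    """Return the entity plus up to `step` tokens before and after each occurrence.

    Assumes a single-token entity; overlapping windows may repeat tokens.
    """
    found = []
    for i, tok in enumerate(tokens):
        if tok == entity:
            found.extend(tokens[max(0, i - step):i])   # words before the entity
            found.append(tok)                          # the named entity itself
            found.extend(tokens[i + 1:i + 1 + step])   # words after the entity
    return found


def match_score(found, dictionary):
    """Stage 1 comparison: share of listed tokens that occur in the genre dictionary."""
    if not found:
        return 0.0
    return sum(1 for t in found if t in dictionary) / len(found)


def combine_dictionaries(genre_dicts):
    """Stage 3: merge per-genre dictionaries, recording which labels each word supports."""
    combined = {}
    for label, words in genre_dicts.items():
        for w in words:
            combined.setdefault(w, set()).add(label)
    return combined


def classify(found, combined):
    """Vote for the (genre, subgenre) label whose dictionary matches the most listed tokens."""
    votes = Counter()
    for t in found:
        for label in combined.get(t, ()):
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    text = "Inspector Holmes examined the body. The clue led Holmes to the suspect."
    found = context_window(tokenize(text), "holmes", step=2)
    print(found)
    print(round(match_score(found, DETECTIVE), 3))  # 0.556
    print(classify(found, combine_dictionaries(GENRE_DICTS)))  # ('fiction', 'detective')
```

With `step=2`, the two occurrences of "holmes" in the sample sentence yield a list of nine tokens, five of which are in the detective dictionary, so `match_score` returns 5/9; voting over the combined dictionary then selects `("fiction", "detective")`. In the method itself, these comparisons would be learned by the network rather than computed by fixed rules.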
EFFECT: the proposed method reduces the total amount of training data and the time needed to train a neural network to determine the genre and subgenre of large text corpora, while providing highly accurate results.
5 cl
Dates
Published: 2024-12-09
Filed: 2023-11-12