FIELD: machine learning.
SUBSTANCE: the invention relates to a method of recognizing the nature of text content. The method comprises the steps of: generating an initial set of text data sources containing content on a predetermined subject, wherein each source is assigned at least one content nature label and at least one content subject label; automatically parsing each source in the set to identify the author of the source and links to third-party sources, wherein sources not included in the available set of sources are considered third-party sources, and wherein the links to third-party sources are the names of third-party sources and URL links to third-party sources; searching for said third-party sources by the identified links; searching for third-party sources by the identified authors; selecting, from the third-party sources found, those sources whose subject is close to at least one of the content subjects of the initial set of sources; automatically assigning the corresponding content subject labels to the selected sources; forming an additional set of sources from the selected sources; automatically assigning to each source in the additional set at least one content nature label by comparing that source with sources from the initial set that have the same subject; and generating a training set of sources by combining the initial set of sources and the labeled additional set of sources.
EFFECT: high accuracy and speed of obtaining the end result.
4 cl
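The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Source` record, the in-memory `catalog` standing in for a real search step, the Jaccard token overlap used as the closeness measure, the 0.2 threshold, and the `reliable`/`unreliable` nature labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Source:
    """Hypothetical record for one text data source."""
    name: str
    author: str
    text: str
    links: list = field(default_factory=list)   # names or URLs of cited sources
    subjects: set = field(default_factory=set)  # content subject labels
    nature: Optional[str] = None                # content nature label


def jaccard(a, b):
    """Token-set overlap; an assumed stand-in for the closeness measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def find_third_party(initial, catalog):
    """Search the catalog by the links and authors parsed from the initial set."""
    known = {s.name for s in initial}
    links = {link for s in initial for link in s.links}
    authors = {s.author for s in initial}
    return [c for c in catalog
            if c.name not in known and (c.name in links or c.author in authors)]


def assign_subjects(candidate, initial, threshold):
    """Collect the subject labels of initial sources that are close enough."""
    subjects = set()
    for s in initial:
        if jaccard(candidate.text, s.text) >= threshold:
            subjects |= s.subjects
    return subjects


def assign_nature(candidate, initial):
    """Copy the nature label of the closest initial source sharing a subject."""
    peers = [s for s in initial if s.subjects & candidate.subjects]
    return max(peers, key=lambda s: jaccard(candidate.text, s.text)).nature


def build_training_set(initial, catalog, threshold=0.2):
    """Combine the initial set with the automatically labeled additional set."""
    additional = []
    for c in find_third_party(initial, catalog):
        subjects = assign_subjects(c, initial, threshold)
        if not subjects:
            continue  # subject not close to any initial subject: discard
        labeled = Source(c.name, c.author, c.text, list(c.links), subjects)
        labeled.nature = assign_nature(labeled, initial)
        additional.append(labeled)
    return initial + additional


# Toy data; every name and text below is invented for the example.
initial = [
    Source("a.example", "Ann", "vaccine trial shows strong safety results",
           links=["c.example"], subjects={"health"}, nature="reliable"),
    Source("b.example", "Bob", "miracle cure doctors hate this secret trick",
           links=["d.example"], subjects={"health"}, nature="unreliable"),
]
catalog = [
    Source("c.example", "Cid", "new vaccine trial confirms safety results"),
    Source("d.example", "Dee", "secret miracle cure trick doctors hide"),
    Source("e.example", "Eve", "football league final score report"),
    Source("f.example", "Ann", "stock market prices fall sharply today"),
]

training = build_training_set(initial, catalog)
for s in training:
    print(s.name, sorted(s.subjects), s.nature)
```

In this toy run, `e.example` is never found because no initial source links to it or shares its author, and `f.example` is found by author but dropped because its text is not close to any initial subject. The two remaining third-party sources inherit the subject label and the nature label of their nearest initial source.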
Title | Year | Author | Number |
---|---|---|---|
RETRIEVAL OF INFORMATION OBJECTS USING A COMBINATION OF CLASSIFIERS ANALYZING LOCAL AND NON-LOCAL SIGNS | 2018 | | RU2686000C1 |
METHOD AND SYSTEM FOR CREATING BRIEF SUMMARY OF DIGITAL CONTENT | 2016 | | RU2637998C1 |
NAMED ENTITIES FROM THE TEXT AUTOMATIC EXTRACTION | 2014 | | RU2665239C2 |
DISTRIBUTED LEARNING MACHINE LEARNING MODELS FOR PERSONALIZATION | 2018 | | RU2702980C1 |
METHOD AND SYSTEM FOR CHECKING MEDIA CONTENT | 2022 | | RU2815896C2 |
SYSTEM AND METHOD FOR AUGMENTATION OF THE TRAINING SAMPLE FOR MACHINE LEARNING ALGORITHMS | 2020 | | RU2758683C2 |
SYSTEM FOR AUTOMATIC DETERMINATION OF SUBJECT MATTER OF TEXT DOCUMENTS BASED ON EXPLICABLE ARTIFICIAL INTELLIGENCE METHODS | 2023 | | RU2823436C1 |
METHOD AND SYSTEM FOR GENERATING AN OBJECT CARD | 2018 | | RU2739554C1 |
SYSTEM FOR CREATING DOCUMENTS BASED ON TEXT ANALYSIS ON NATURAL LANGUAGE | 2016 | | RU2639655C1 |
SYSTEM AND METHOD FOR AUTOMATED ASSESSMENT OF INTENTIONS AND EMOTIONS OF USERS OF DIALOGUE SYSTEM | 2020 | | RU2762702C2 |
Dates
2024-10-04—Published
2023-06-30—Filed