METHOD OF RECOGNIZING NATURE OF TEXT CONTENT
Russian patent published in 2025, IPC G06N3/00

Abstract RU 2838603 C1

FIELD: data processing.

SUBSTANCE: the invention relates to machine learning, and more specifically to a method of recognizing the nature of text content. The method comprises the steps of: generating an initial set of text data sources containing content on a predetermined subject matter, wherein each source is assigned at least one content nature label and at least one content subject label; automatically parsing each source in the set of sources to identify the author of the source, links to third-party sources, and the location in which the source author is located and/or in which the source is published, wherein sources not included in the available set of sources are considered third-party sources, and links to third-party sources are the names of third-party sources and URL links to them; searching for said third-party sources by the identified links, by the identified authors and based on the identified locations; selecting, from the found third-party sources, those whose subject matter is close to at least one of the content subjects of the initial set of sources; automatically assigning the corresponding content subject labels to the selected sources; forming an additional set of sources from the selected sources; automatically assigning to each source in the additional set at least one content nature label by comparing that source with sources from the initial set of sources that have the same subject matter; generating a training set of sources by combining the initial set of sources and the labeled additional set of sources; and training the content nature recognition model on the training set of sources.
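The steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: texts are compared as bag-of-words vectors with cosine similarity; a found source counts as "close" to a subject when its similarity to that subject's summed term counts reaches a hypothetical threshold of 0.2; each selected source takes the nature label of its most similar same-subject source from the initial set (1-nearest neighbour); and the content nature recognition model is a nearest-centroid classifier. Each source carries a single nature label for simplicity, although the abstract allows several. The searches by links, authors and locations are not modelled: a hard-coded `found` list stands in for their results, and only URL extraction is shown from the parsing step. All function names and the demo data are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical data model; one nature label per source is a simplification.
@dataclass
class Source:
    text: str
    subjects: set = field(default_factory=set)  # content subject labels
    nature: Optional[str] = None                # content nature label

URL_RE = re.compile(r'https?://[^\s<>"]+')

def third_party_urls(text, known_urls):
    """Parsing step (URLs only): links that point outside the available set of sources."""
    return [u for u in URL_RE.findall(text) if u not in known_urls]

def vectorize(text):
    """Bag-of-words term counts (an assumed stand-in for any text representation)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Counter returns 0 for missing words, so absent terms add nothing to the dot product.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def subject_centroids(initial):
    """Summed term counts for every content subject of the initial set."""
    centroids = {}
    for src in initial:
        for subject in src.subjects:
            centroids.setdefault(subject, Counter()).update(vectorize(src.text))
    return centroids

def select_by_subject(found, initial, threshold=0.2):
    """Keep found sources close to at least one initial subject and assign those subject labels."""
    centroids = subject_centroids(initial)
    selected = []
    for src in found:
        vec = vectorize(src.text)
        close = {s for s, c in centroids.items() if cosine(vec, c) >= threshold}
        if close:
            selected.append(Source(src.text, subjects=close))
    return selected

def label_nature(additional, initial):
    """Copy the nature label of the most similar initial source that shares a subject."""
    for src in additional:
        vec = vectorize(src.text)
        # Non-empty: every selected subject came from at least one initial source.
        peers = [p for p in initial if p.subjects & src.subjects]
        src.nature = max(peers, key=lambda p: cosine(vec, vectorize(p.text))).nature
    return additional

def train_nature_model(training):
    """Nearest-centroid classifier over nature labels (an assumed model choice)."""
    centroids = {}
    for src in training:
        centroids.setdefault(src.nature, Counter()).update(vectorize(src.text))

    def predict(text):
        vec = vectorize(text)
        return max(centroids, key=lambda n: cosine(vec, centroids[n]))

    return predict

# Toy demo data, invented for illustration.
initial = [
    Source("vaccine trial results published by the ministry of health", {"health"}, "reliable"),
    Source("miracle cure hides the truth doctors do not want you to know", {"health"}, "unreliable"),
    Source("central bank raises the key interest rate", {"finance"}, "reliable"),
    Source("secret plan to crash the ruble revealed", {"finance"}, "unreliable"),
]
# Stand-in for the search by links, authors and locations.
found = [Source("ministry of health publishes new vaccine trial data"), Source("recipe for apple pie")]

selected = select_by_subject(found, initial)     # the off-topic recipe is dropped
additional = label_nature(selected, initial)     # labeled from same-subject initial sources
predict = train_nature_model(initial + additional)
print(predict("new vaccine trial published by the ministry"))  # prints: reliable
```

Swapping the bag-of-words vectors for sentence embeddings, or the nearest-centroid model for any supervised classifier, leaves the shape of the pipeline unchanged. The point carried over from the abstract is that additional sources are labeled automatically by comparison only with initial sources of the same subject.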

EFFECT: high accuracy and speed of obtaining a result.

4 claims

Similar patents to RU2838603C1

Title (year), number; authors

METHOD OF RECOGNIZING NATURE OF TEXT CONTENT (2023), RU2827987C1
  • Nikanov Ivan Aleksandrovich
  • Sevastianov Ruslan Sergeevich
  • Merkulova Ekaterina Vladimirovna

DISTRIBUTED LEARNING MACHINE LEARNING MODELS FOR PERSONALIZATION (2018), RU2702980C1
  • Kudinov Mikhail Sergeevich
  • Piontkovskaya Irina Igorevna
  • Nevidomskii Aleksei Yurievich
  • Popov Vadim Sergeevich
  • Vytovtov Petr Konstantinovich
  • Polubotko Dmitry Valerievich
  • Malyugina Olga Valerievna

METHOD AND SYSTEM FOR CHECKING MEDIA CONTENT (2022), RU2815896C2
  • Gorb Roman Viktorovich
  • Yudin Sergej Mikhajlovich
  • Zobnin Aleksej Igorevich
  • Oreshin Pavel Evgenevich

RETRIEVAL OF INFORMATION OBJECTS USING A COMBINATION OF CLASSIFIERS ANALYZING LOCAL AND NON-LOCAL SIGNS (2018), RU2686000C1
  • Indenbom Evgenij Mikhajlovich

METHOD AND SYSTEM FOR CREATING BRIEF SUMMARY OF DIGITAL CONTENT (2016), RU2637998C1
  • Sadovskij Aleksandr Anatolevich

NAMED ENTITIES FROM THE TEXT AUTOMATIC EXTRACTION (2014), RU2665239C2
  • Nekhaj Ilya Vladimirovich

SYSTEM FOR IDENTIFYING REPHRASING USING MACHINE TRANSLATION TECHNOLOGY (2004), RU2368946C2
  • Kvirk Kristofer B.
  • Brokett Kristofer Dzh.
  • Dolan Uill'Jam B.

SYSTEM AND METHOD FOR AUGMENTATION OF THE TRAINING SAMPLE FOR MACHINE LEARNING ALGORITHMS (2020), RU2758683C2
  • Shavrina Tatyana Olegovna

METHODS AND SERVERS FOR TRAINING MODEL TO DETECT SPEAKER CHANGE (2024), RU2841235C1
  • Gritskevich Evgenii Marianovich

MULTISTAGE TRAINING OF MACHINE LEARNING MODELS FOR RANKING SEARCH RESULTS (2021), RU2824338C2
  • Bojmel Aleksandr Alekseevich
  • Soboleva Darya Mikhajlovna


Authors

Nikanov Ivan Aleksandrovich

Sevastianov Ruslan Sergeevich

Merkulova Ekaterina Vladimirovna

Dates

Published: 2025-04-21

Filed: 2024-05-15