SYSTEM AND METHOD FOR COLLECTING AND PROCESSING NEWS ON THE INTERNET Russian patent published in 2023 - IPC G06F16/951 G06F40/00 

Abstract RU 2795678 C1

FIELD: computer engineering.

SUBSTANCE: the invention relates to increasing the accuracy of collecting and processing text information from a web page. This is achieved by a system comprising: an analyser module that searches the Internet for domain names containing news sources, analyses their HTML code to identify news feeds, extracts a link to the text of the news source, and transfers the identified links, their type and the processing algorithm to a database; a scraping module that processes data using a web resource mark-up analysis algorithm; and a parsing module that receives HTML code from the scraping module, extracts text from it using two text data collection algorithms, each of which selects the HTML node with the largest ratio of characters characterizing the connected text of the news source to the total number of characters, and processes the results of the extraction algorithms with a machine learning model that analyses whether characteristics of non-news sources are present and detects semantically coherent text characterizing a news source.

EFFECT: increasing the accuracy of collecting and processing text information from a web page.

5 cl, 8 dwg
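
The node-selection step described in the abstract (choosing the HTML node with the largest ratio of connected-text characters to total characters) can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not define which characters count as "connected text", so the sketch assumes letters, spaces and common sentence punctuation. The names `prose_ratio`, `NodeCollector` and `extract_main_text`, the `min_chars` threshold and the sample page are all hypothetical. Only one selection heuristic is shown; the second extraction algorithm and the machine learning model are not covered.

```python
from html.parser import HTMLParser

# Assumption: tags whose content is never article text.
SKIP_TAGS = {"script", "style"}
# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# Assumption: punctuation treated as part of connected prose.
PROSE_PUNCTUATION = set(" .,;:!?'\"-()")


def prose_ratio(text):
    """Share of characters that are letters, spaces or sentence punctuation."""
    if not text:
        return 0.0
    prose = sum(1 for ch in text if ch.isalpha() or ch in PROSE_PUNCTUATION)
    return prose / len(text)


class NodeCollector(HTMLParser):
    """Collects the whitespace-normalised text of every closed element."""

    def __init__(self):
        super().__init__()
        self.stack = []      # open elements as (tag, list of text chunks)
        self.nodes = []      # closed elements as (tag, text)
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag not in VOID_TAGS:
            self.stack.append((tag, []))

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        # Ignore stray closing tags that match no open element.
        if not any(open_tag == tag for open_tag, _ in self.stack):
            return
        # Close elements up to and including the matching one.
        while self.stack:
            open_tag, chunks = self.stack.pop()
            text = " ".join(" ".join(chunks).split())
            self.nodes.append((open_tag, text))
            if self.stack:
                # A parent's text includes the text of its children.
                self.stack[-1][1].append(text)
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.skip_depth == 0 and self.stack:
            self.stack[-1][1].append(data)


def extract_main_text(html, min_chars=80):
    """Return the text of the node with the highest prose ratio.

    min_chars is an assumed guard so that short labels such as "Home"
    do not win with a trivial ratio of 1.0. Elements left unclosed at
    the end of the input are not considered.
    """
    parser = NodeCollector()
    parser.feed(html)
    parser.close()
    candidates = [(prose_ratio(text), len(text), text)
                  for _, text in parser.nodes if len(text) >= min_chars]
    if not candidates:
        return ""
    # Ties on ratio are broken in favour of the longer text.
    return max(candidates)[2]


# Hypothetical sample page: navigation, one story paragraph, footer.
SAMPLE = """
<html><body>
<div id="nav"><a href="/">Home</a> | <a href="/w">World</a> | 12:30 | 05/05/2023</div>
<div id="story"><p>The city council approved a new budget on Friday after a long debate. Officials said the plan will fund road repairs and public schools.</p></div>
<div id="footer">© 2023 Example News | Terms | Privacy | 123456</div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_main_text(SAMPLE))
```

In this sketch the story paragraph wins because it consists entirely of prose characters, while the whole-page node is diluted by the digits and separators in the navigation and footer. A real pipeline would likely need a more careful definition of the ratio and a proper HTML parser for malformed markup.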

Similar patents RU2795678C1

Title Year Author Number
SYSTEM AND METHOD FOR SELECTING RELEVANT PAGE ITEMS WITH IMPLICITLY SPECIFYING COORDINATES FOR IDENTIFYING AND VIEWING RELEVANT INFORMATION 2015
  • Tsyplyaev Maksim Viktorovich
  • Vinokurov Nikita Alekseevich
RU2708790C2
METHOD AND SYSTEM FOR COMPUTER PROCESSING OF ONE OR MORE QUOTES IN DIGITAL TEXTS FOR DETERMINATION OF THEIR AUTHOR 2018
  • Akulov Yaroslav Victorovich
RU2711123C2
METHOD AND SYSTEM FOR GENERATING AN OBJECT CARD 2018
  • Akulov Yaroslav Viktorovich
RU2739554C1
DEPTH REFERENCES FOR NATIVE APPLICATIONS 2015
  • Chang, Lawrence
  • Xu, Hui
RU2668726C2
METHOD OF DETERMINING PROFILE OF MOBILE DEVICE USER ON MOBILE DEVICE ITSELF AND DEMOGRAPHIC PROFILING SYSTEM 2016
  • Yoo Jaebong
  • Kryzhanovskiy Konstantin Alexandrovich
  • Podoynitsina Lyubov Vladimirovna
  • Romanenko Alexander Alexandrovich
  • Polubotko Dmitry Valerievich
  • Kazantsev Alexey Yurievich
  • Moiseenko Andrey Konstantinovich
  • Maslennikov Mstislav Vladimirovich
RU2647661C1
SYSTEM AND METHOD FOR GENERATING CLASSIFIER FOR DETECTING PHISHING SITES USING DOM OBJECT HASHES 2023
  • Tushkanov Vladislav Nikolaevich
RU2811375C1
DEEP LINKS FOR NATIVE APPLICATIONS 2015
  • Chang, Lawrence
  • Xu, Hui
RU2774319C2
SYSTEM AND METHOD FOR COLLECTING INFORMATION FOR DETECTING PHISHING 2016
  • Volkov Dmitrij Aleksandrovich
RU2671991C2
HYBRID AUTOMATIC SYSTEM FOR CONTROLLING USERS ACCESS TO INFORMATION RESOURCES IN PUBLIC COMPUTER NETWORKS 2018
  • Ashmanov Igor Stanislavovich
  • Ivanov Aleksej Petrovich
  • Otarbiev Eldar Otarbievich
  • Pashko Dmitrij Alekseevich
  • Tikhonov Maksim Viktorovich
RU2697925C1
METHOD FOR DETECTING PHISHING SITES AND SYSTEM THAT IMPLEMENTS IT 2023
  • Tushkanov Vladislav Nikolaevich
RU2813242C1

RU 2 795 678 C1

Authors

Shevtsov Mikhail Yurevich

Kozlov Andrej Mikhajlovich

Ivanov Aleksandr Dmitrievich

Zubitskij Pavel Sergeevich

Malyshev Ilya Aleksandrovich

Dates

2023-05-05 Published

2022-04-29 Filed