METHOD AND DEVICE FOR EXTRACTING WEB-PAGES SUBJECT-MATTER

Russian patent published in 2020 - IPC G06F16/36, G06F16/951

Abstract RU 2729227 C2

FIELD: information technology.

SUBSTANCE: the invention relates to means for extracting the subject matter of web pages. Candidate web pages and a pre-built machine learning model are obtained, wherein each candidate web page comprises a plurality of pre-selected candidate thematic sentences and each candidate thematic sentence comprises several word segments. Word feature values, which indicate the importance of the word segments within each candidate web page, are determined and input into the machine learning model to obtain an importance value for each word segment. For each candidate web page, a partial order value is determined for each candidate thematic sentence from the importance values of the word segments that the sentence contains. For each candidate web page, one of the candidate thematic sentences whose partial order value exceeds a predetermined threshold is selected as the target thematic sentence of that page.
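
The selection procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify the word features, the model, or how segment importances are combined into a partial order value. The names `toy_feature`, `toy_model` and `select_thematic_sentence`, the document-frequency feature, and the use of the mean importance as the sentence score are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical representation: a thematic sentence is a list of word segments.
Sentence = List[str]


def toy_feature(segment: str, sentences: Sequence[Sentence]) -> List[float]:
    """Assumed feature: share of the page's sentences that contain the segment."""
    hits = sum(1 for s in sentences if segment in s)
    return [hits / len(sentences)]


def toy_model(features: List[float]) -> float:
    """Stand-in for the pre-built machine learning model (assumption)."""
    return features[0]


def select_thematic_sentence(
    sentences: Sequence[Sentence],
    feature_fn: Callable[[str, Sequence[Sentence]], List[float]],
    model: Callable[[List[float]], float],
    threshold: float,
) -> Optional[Sentence]:
    """Return the best-scoring sentence above the threshold, or None."""
    # Step 1: importance value for every distinct word segment on the page.
    importance: Dict[str, float] = {}
    for sentence in sentences:
        for segment in sentence:
            if segment not in importance:
                importance[segment] = model(feature_fn(segment, sentences))

    # Step 2: score each sentence from the importances of its segments.
    # Assumption: the score is the mean importance of the segments.
    best: Optional[Sentence] = None
    best_score = threshold
    for sentence in sentences:
        if not sentence:
            continue
        score = sum(importance[seg] for seg in sentence) / len(sentence)
        # Step 3: keep only a sentence whose score exceeds the threshold.
        if score > best_score:
            best, best_score = sentence, score
    return best


page = [
    ["cheap", "flights", "paris"],
    ["flights", "paris"],
    ["contact", "us"],
]
print(select_thematic_sentence(page, toy_feature, toy_model, threshold=0.5))
```

In this sketch the highest-scoring sentence above the threshold is returned; the abstract only requires that the selected sentence exceed the threshold, so other tie-breaking or ranking rules would fit the description equally well.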

EFFECT: the technical result is improved accuracy of the thematic sentences extracted from web pages.

20 cl, 6 dwg

Similar patents RU2729227C2

Title Year Author Number
CONSTRUCTION AND APPLICATION OF WEB-CATALOGUES FOR FOCUSED SEARCH 2005
  • Brill Ehrik D.
  • Chen Khehrr
  • Chandrasekar Raman
  • Korston Sajmon Kh.
RU2382400C2
METHOD AND SYSTEM FOR CREATING ANNOTATION VECTORS FOR DOCUMENT 2017
  • Gusakov Aleksey Yurievich
  • Drozdovsky Andrey Dmitrievich
  • Duzhik Valery Ivanovich
  • Kalinin Pavel Vladimirovich
  • Naydin Oleg Pavlovich
  • Safronov Aleksandr Valerievich
RU2720074C2
METHOD OF PROCESSING TARGET MESSAGE, METHOD OF PROCESSING NEW TARGET MESSAGE AND SERVER (VERSIONS) 2014
  • Zelenkov Sergej Yurevich
RU2589856C2
COLLECTING DATA ON USER BEHAVIOUR DURING WEB SEARCH TO INCREASE WEB SEARCH RELEVANCE 2007
  • Agikhtejn Evgenij E.
  • Brill Ehrik D.
  • Djumeh Sjuzan T.
  • Rehgno Robert Dzh.
RU2435212C2
METHOD OF DETERMINING PROFILE OF MOBILE DEVICE USER ON MOBILE DEVICE ITSELF AND DEMOGRAPHIC PROFILING SYSTEM 2016
  • Yoo Jaebong
  • Kryzhanovskiy Konstantin Alexandrovich
  • Podoynitsina Lyubov Vladimirovna
  • Romanenko Alexander Alexandrovich
  • Polubotko Dmitry Valerievich
  • Kazantsev Alexey Yurievich
  • Moiseenko Andrey Konstantinovich
  • Maslennikov Mstislav Vladimirovich
RU2647661C1
METHOD FOR DETERMINING SEQUENCE OF WEB BROWSING AND SERVER USED 2014
  • Leforte Damen Rejmon Zhan-Fransua
  • Ostroumova Lyudmila Aleksandrovna
  • Samosvat Egor Aleksandrovich
  • Serdyukov Pavel Viktorovich
  • Bogatyj Ivan Semeonovich
  • Chelnokov Arsenij Andreevich
RU2634218C2
METHOD AND SYSTEM OF SEARCH QUERY PROCESSING 2015
  • Vorobev Aleksandr Leonidovich
  • Serdyukov Pavel Viktorovich
  • Leforte Damen Rejmon Zhan-Fransua
  • Gusev Gleb Gennadevich
RU2640639C2
CHECKING METHOD OF WEB PAGES FOR CONTENT IN THEM OF TARGET AUDIO AND/OR VIDEO (AV) CONTENT OF REAL TIME 2013
  • Orel Denis Olegovich
  • Fomichev Aleksej Nikolaevich
RU2530671C1
METHOD OF SELECTING EFFECTIVE VERSIONS IN SEARCH AND RECOMMENDATION SYSTEMS (VERSIONS) 2013
  • Aleskerov Fuad Tagi Ogly
  • Mitichkin Evgenij Olegovich
  • Chistjakov Vjacheslav Vasil'Evich
  • Shvydun Sergej Vladimirovich
  • Jakuba Vjacheslav Ivanovich
RU2543315C2
SYSTEM AND METHOD FOR PROVIDING PREFERRED LANGUAGE FOR SORTING SEARCH RESULTS 2004
  • Lehmping Dzhon
  • Gomes Ben
  • Makgrat Mizuki
  • Singkhal Amit
RU2319202C2

RU 2 729 227 C2

Authors

Li Chenyao

Tszen Khunlej

Dates

2020-08-05 Published

2016-11-18 Filed