METHOD OF CONSTRUCTING CORPUS BASED ON INTERNET FORUMS
Russian patent published in 2015; IPC G06N5/02, G06F17/21, G06F17/30

Abstract RU 2565473 C2

FIELD: physics, computer engineering.

SUBSTANCE: the invention relates to systems and methods for creating corpora for various research and other purposes. The method of constructing a corpus from Internet forums on a computer system comprises: constructing a document object model (DOM) as a DOM tree data structure; selecting a group of same-type vertices in the DOM tree; removing optional design elements from the pages; merging non-leaf vertices with the same names in the object model tree and combining leaf vertices with the same properties; evaluating the vertices and filtering the groups; constructing XPath expressions; and applying the resulting XPath expressions to a set of files containing all documents from the selected forum.
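The pipeline described in the abstract can be illustrated with a heavily simplified sketch in Python that uses only the standard library. It is not the patented method. Several choices are assumptions made only for illustration: the set of "design" tags, grouping sibling vertices by (tag, class) as the "same-type" criterion, scoring a group by its total text length, and using the limited XPath subset that ElementTree's findall accepts. The step that merges non-leaf vertices and combines leaf vertices is omitted. All names (DESIGN_TAGS, build_dom, strip_design, candidate_groups, learn_xpath, build_corpus) are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumption: tags treated as "optional design elements" to strip.
DESIGN_TAGS = {"script", "style", "noscript", "iframe", "img", "nav"}
# HTML void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DomBuilder(HTMLParser):
    """Builds an ElementTree DOM tree from (possibly sloppy) HTML."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        elem = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close the nearest matching open tag; stray closing tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child goes into that child's tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def build_dom(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def strip_design(root):
    """Remove design-only subtrees, keeping any text that followed them."""
    for parent in list(root.iter()):
        prev = None
        for child in list(parent):
            if child.tag in DESIGN_TAGS:
                if child.tail:
                    if prev is None:
                        parent.text = (parent.text or "") + child.tail
                    else:
                        prev.tail = (prev.tail or "") + child.tail
                parent.remove(child)
            else:
                prev = child


def text_of(elem):
    # Join text fragments with spaces and collapse runs of whitespace.
    return " ".join(" ".join(elem.itertext()).split())


def candidate_groups(root, min_size=2):
    """Yield (tag, class, members) for same-type siblings (assumed criterion)."""
    for parent in root.iter():
        groups = {}
        for child in parent:
            key = (child.tag, child.get("class", ""))
            groups.setdefault(key, []).append(child)
        for (tag, cls), members in groups.items():
            if len(members) >= min_size and "'" not in cls:
                yield tag, cls, members


def learn_xpath(sample_html):
    """Pick the group with the most text and return a path selecting it."""
    root = build_dom(sample_html)
    strip_design(root)
    best = max(candidate_groups(root), default=None,
               key=lambda g: sum(len(text_of(m)) for m in g[2]))
    if best is None:
        return None
    tag, cls, _ = best
    return f".//{tag}[@class='{cls}']" if cls else f".//{tag}"


def build_corpus(pages, xpath):
    """Apply the learned path to every page and collect the matched texts."""
    corpus = []
    for html in pages:
        root = build_dom(html)
        strip_design(root)
        corpus.extend(text_of(e) for e in root.findall(xpath))
    return corpus
```

In this sketch, learn_xpath would be run on one sample page of a forum and build_corpus on every saved page of the same forum. The total-text score is deliberately naive: a repeated layout wrapper around the posts can outscore the posts themselves, which is the kind of case the abstract's vertex evaluation and group filtering steps are meant to handle.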

EFFECT: high accuracy in separating user text from other web page content, with automatic construction of the corpus.

10 cl, 3 dwg

Similar patents RU2565473C2

Title Year Author Number
OPTIMISING EXECUTION OF HD-DVD TIMING MARKUP 2007
  • Dehvis Dzheffri
  • Diguehro Dzhoehl
RU2460157C2
DEVICE AND METHOD FOR PROCESSING CONTENT OF WEB RESOURCE IN BROWSER 2014
  • Nikitin Konstantin Sergeevich
  • Chigrin Vyacheslav Olegovich
RU2595524C2
WEBPAGE BROWSING METHOD, WEBAPP FRAMEWORK, METHOD AND DEVICE FOR EXECUTING JAVASCRIPT AND MOBILE TERMINAL 2013
  • Liang Jie
  • Ma Miaokui
RU2604326C2
METHODS AND SYSTEMS FOR PROCESSING DOCUMENT OBJECT MODELS (DOM) TO PROCESS VIDEO CONTENT 2010
  • Chehbot Timoti Dzh.
  • Uinds Ehdvin D.
  • Ehtehs Gregori Dzh.
  • Li Gehng
  • Khehjosh Tomas I.
  • Moreno Sizar
RU2475832C1
METHOD AND SYSTEM FOR MODIFYING TEXT IN DOCUMENT 2015
  • Smuglyj Arsenij Ivanovich
RU2610585C2
PROGRAMMABILITY FOR XML DATA STORE FOR DOCUMENTS 2006
  • Dehvis Tristan A.
  • Talegkhani Ali
  • Dzhounz Brajan M.
  • Savitski Marsin
  • Littl Robert A.
  • Ali Al'Nur
RU2417420C2
METHOD OF ANALYSING TEXT DATA TONALITY 2014
  • Yang David Yevgenievich
  • Tyurin Anton Yevgenievich
  • Mikhaylov Maksim Borisovich
  • Danielyan Tatiana Vladimirovna
  • Lokotilova Olga Vladimirovna
RU2571373C2
SYSTEM AND METHOD FOR GENERATING CLASSIFIER FOR DETECTING PHISHING SITES USING DOM OBJECT HASHES 2023
  • Tushkanov Vladislav Nikolaevich
RU2811375C1
PROGRAMMING INTERFACE FOR COMPUTER PLATFORM 2004
  • Bogdan Dzheffri L.
  • Relaja Robert A.
RU2371758C2
METHOD FOR DETECTING PHISHING SITES AND SYSTEM THAT IMPLEMENTS IT 2023
  • Tushkanov Vladislav Nikolaevich
RU2813242C1

Authors

Kopylov Nikolaj Jur'evich

Pronin Aleksandr Konstantinovich

Dates

Published: 2015-10-20

Filed: 2013-11-01