FIELD: information technology.
SUBSTANCE: method for checking web pages for target content. The method comprises a quick analysis, in which the source code of a web page is analysed for basic features indicating that target content is available on the page, and a key is formed from the detected features; and a deep analysis of the web pages associated with the key, in which program code is executed, objects are loaded, connections to media servers are made, and technical information and meta-information from the streams are loaded and analysed to determine whether target content is present or absent. For each key and each feature, an effectiveness index is determined from statistical information on the number and status of checked web pages. The obtained effectiveness indexes are used at the quick and deep analysis stages, so that the resource-intensive deep analysis is applied only to web pages with a high probability of containing target content.
EFFECT: more efficient detection of target content in checked web pages.
52 cl, 9 ex, 4 dwg
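The two-stage scheme described under SUBSTANCE can be illustrated with a minimal sketch. It is not the claimed implementation: the feature list, the names `quick_analysis`, `KeyStats` and `should_deep_analyze`, the threshold value, and the definition of the effectiveness index as the share of checked pages with a given key that were confirmed to contain target content are all assumptions made for illustration. The abstract only states that the index is based on the number and status of checked pages.

```python
# Hypothetical basic features: substrings of page source that hint at media content.
FEATURES = ("<video", "<embed", "rtmp://", ".m3u8")


def quick_analysis(source: str) -> tuple:
    """Quick stage: scan the page source and form a key from the detected features."""
    lowered = source.lower()
    return tuple(sorted(f for f in FEATURES if f in lowered))


class KeyStats:
    """Per-key statistics on how many pages were checked and how many were confirmed."""

    def __init__(self):
        self.checked = {}
        self.confirmed = {}

    def record(self, key: tuple, has_target: bool) -> None:
        """Store the outcome of one deep analysis for the given key."""
        self.checked[key] = self.checked.get(key, 0) + 1
        if has_target:
            self.confirmed[key] = self.confirmed.get(key, 0) + 1

    def effectiveness(self, key: tuple, default: float = 1.0) -> float:
        """Assumed index: confirmed / checked. Unseen keys get an optimistic default."""
        n = self.checked.get(key, 0)
        if n == 0:
            return default
        return self.confirmed.get(key, 0) / n


def should_deep_analyze(key: tuple, stats: KeyStats, threshold: float = 0.3) -> bool:
    """Run the costly deep stage only for non-empty keys with a high enough index."""
    return bool(key) and stats.effectiveness(key) >= threshold
```

In this sketch a page whose key has rarely led to confirmed target content is skipped at the deep stage, while a key that has never been seen is still analysed once so that statistics can accumulate. Both choices are assumptions, not details taken from the abstract.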
Dates
2014-10-10—Published
2013-07-24—Filed