FIELD: information technologies.
SUBSTANCE: a two-dimensional representation of a document is used to extract its visual structure, which facilitates recognition of the document. The visual structure is grammatically parsed by associating multiple grammar rules with the multiple types of symbols identified in the visual structure of the document. This makes it possible to recognise components of the document (for instance, columns, author names, headings, references, etc.), so that its structural components can be interpreted accurately. The grammatical parsing is based on a grammatical cost function obtained by means of a machine learning procedure. The parsing also comprises representing a parse as an image and scoring that image to evaluate the grammatical cost function and determine the optimal parse. To simplify document recognition, grammatical parsing procedures that use boosting and/or "fast features", etc., can be employed.
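The idea of choosing the parse that a cost function scores best can be illustrated with a minimal sketch. It is not the claimed method: it assumes the page blocks have already been put in reading order (a one-dimensional simplification of the two-dimensional parsing described above), and the costs are hand-set stand-ins for a learned cost function. The grammar, the Block fields, and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Block:
    """One text block on the page (hypothetical features)."""
    text: str
    font_size: float
    y: float  # top coordinate; larger means lower on the page


# Hypothetical toy grammar: nonterminal -> binary productions (left, right).
# Note: this toy Body needs at least two paragraphs.
RULES = {
    "Page": [("Header", "Body")],
    "Header": [("Title", "Authors")],
    "Body": [("Paragraph", "Body"), ("Paragraph", "Paragraph")],
}
TERMINALS = {"Title", "Authors", "Paragraph"}


def terminal_cost(label, block):
    """Hand-set stand-in for a learned cost; lower is better."""
    if label == "Title":
        return 0.0 if block.font_size >= 18 else 5.0
    if label == "Authors":
        return 0.0 if 11 <= block.font_size < 18 else 5.0
    return 0.0 if block.font_size < 11 else 5.0  # Paragraph


def rule_cost(blocks, k):
    """Stand-in production cost: penalise a right part that is not
    below the left part at split point k."""
    return 0.0 if blocks[k].y > blocks[k - 1].y else 10.0


def best_parse(blocks, root="Page"):
    """Return (cost, tree) of the cheapest derivation of all blocks from root."""
    blocks = tuple(blocks)

    @lru_cache(maxsize=None)
    def solve(label, i, j):
        # Cheapest derivation of blocks[i:j] from `label`.
        if j - i == 1:
            if label in TERMINALS:
                return terminal_cost(label, blocks[i]), (label, blocks[i].text)
            return math.inf, None
        best = (math.inf, None)
        for left, right in RULES.get(label, []):
            for k in range(i + 1, j):  # try every split point
                left_cost, left_tree = solve(left, i, k)
                right_cost, right_tree = solve(right, k, j)
                total = left_cost + right_cost + rule_cost(blocks, k)
                if total < best[0]:
                    best = (total, (label, left_tree, right_tree))
        return best

    return solve(root, 0, len(blocks))


page = [
    Block("Grammatical Parsing", 20, 10),
    Block("A. Author", 12, 40),
    Block("First paragraph.", 10, 80),
    Block("Second paragraph.", 10, 120),
]
cost, tree = best_parse(page)
```

Memoising on (label, i, j) keeps the search polynomial in the number of blocks; replacing terminal_cost and rule_cost with trained scorers would be the step that corresponds to the machine learning procedure mentioned above.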
EFFECT: improved accuracy of document recognition.
19 cl, 10 dwg, 5 tbl
Authors
Dates
2011-06-20—Published
2006-06-30—Filed