FIELD: technology for recognizing text information in graphic files.
SUBSTANCE: according to the method, the order of access to additional information is set in advance, and a quality estimate is assigned to each type of additional information. Different variants of dividing the image of each selected text line into fragments are constructed, and a linear division graph is built for each line fragment. Images of graphic elements are recognized using a classifier, and an estimate is assigned to each recognition variant. A transition is then made from recognition variants of graphic elements to variants of alphabet symbols. For each chain connecting the starting and ending vertices, chains are built that correspond to all recognition variants of the graphic elements and all variants of transition from recognized graphic elements to alphabet symbols. The resulting variants are ranked in order of decreasing recognition quality estimate and processed using information about the position of uppercase and lowercase letters. If recognition of a graphic element yields more than one symbol variant, the variants are processed using the types of additional information successively and/or, when necessary, all types simultaneously. A quality estimate is assigned to each resulting variant, symbol variants with an estimate below a predetermined value are discarded, the remaining variants are sorted by pairwise comparison, and spaces erroneously recognized at previous stages are additionally corrected.
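The chain-building and ranking steps can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that the linear division graph has cut points as vertices and fragments as edges, that each edge carries classifier variants with a score in [0, 1], and that a chain's estimate is the product of its symbol scores. The names `Variant`, `Graph`, `rank_chains` and `min_score` are hypothetical, and the additional-information, letter-case and space-correction stages are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Variant:
    symbol: str   # alphabet symbol(s) proposed for a fragment (illustrative)
    score: float  # classifier quality estimate, assumed to lie in [0, 1]


# Edge (a, b) covers the fragment between cut points a and b, with a < b,
# so the graph is acyclic and the recursion below always terminates.
Graph = Dict[Tuple[int, int], List[Variant]]


def rank_chains(graph: Graph, start: int, end: int,
                min_score: float) -> List[Tuple[str, float]]:
    """Enumerate every chain from start to end and rank by descending score.

    Symbol variants scoring below min_score are discarded. Combining scores
    by multiplication is an assumption of this sketch.
    """
    results: List[Tuple[str, float]] = []

    def walk(vertex: int, text: str, score: float) -> None:
        if vertex == end:
            results.append((text, score))
            return
        for (a, b), variants in graph.items():
            if a != vertex:
                continue
            for v in variants:
                if v.score >= min_score:
                    walk(b, text + v.symbol, score * v.score)

    walk(start, "", 1.0)
    results.sort(key=lambda item: item[1], reverse=True)
    return results


# Made-up scores: one wide fragment read as "m", or two narrow ones as "r" + "n".
graph: Graph = {
    (0, 1): [Variant("r", 0.9)],
    (1, 2): [Variant("n", 0.8), Variant("h", 0.3)],
    (0, 2): [Variant("m", 0.6)],
}
ranked = rank_chains(graph, start=0, end=2, min_score=0.5)
```

With these invented scores, `ranked` lists the two-fragment reading "rn" ahead of the single-fragment reading "m", and the low-scoring "h" variant is dropped by the threshold. Exhaustive enumeration is used only for clarity. A practical system would prune or use dynamic programming over the graph.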
EFFECT: increased accuracy of text recognition and improved noise immunity.
9 cl, 2 dwg
Dates
2007-03-10—Published
2005-06-16—Filed