FIELD: optics.
SUBSTANCE: invention relates to optical character recognition. Proposed system includes machine-code instructions which, when executed by a processor, control the optical character recognition system to process a scanned image of a text-containing document by identifying the symbol images in the scanned document image. Identification is performed for each page of the document and for each symbol image on the page. Method includes identifying a set of suitable reference data structures for the symbol image using a decision forest. Method uses the suitable reference data structures to determine a set of suitable graphemes and uses the identified set of graphemes to select the symbol code that corresponds to the symbol image. Method includes preparing a processed document containing symbol codes, which correspond to the symbol images from the scanned document image, and storing the processed document in one or more of the memory devices and memories.
EFFECT: optimised optical character recognition owing to use of a decision forest.
20 cl, 66 dwg
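
The recognition flow summarised in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the two-value feature vector (aspect ratio, ink density), the vote-counting rule, the nearest-template selection, and every name and table below (`Node`, `Leaf`, `FOREST`, `PATTERN_TO_GRAPHEMES`, `GRAPHEMES`) are hypothetical choices made for illustration only. The sketch shows a decision forest narrowing a symbol image down to candidate reference data structures, those structures yielding candidate graphemes, and the best-matching grapheme yielding a symbol code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Sequence, Set, Tuple, Union

Features = Sequence[float]  # assumed: (aspect ratio, ink density) of a symbol image


@dataclass(frozen=True)
class Leaf:
    pattern_ids: FrozenSet[str]  # reference data structures reachable at this leaf


@dataclass(frozen=True)
class Node:
    feature: int      # index into the feature vector
    threshold: float  # go to `low` when the feature value is <= threshold
    low: "Tree"
    high: "Tree"


Tree = Union[Leaf, Node]

# Hypothetical two-tree forest over the two assumed features.
FOREST: List[Tree] = [
    Node(0, 0.5, Leaf(frozenset({"narrow"})), Leaf(frozenset({"round"}))),
    Node(1, 0.7, Leaf(frozenset({"round"})), Leaf(frozenset({"narrow"}))),
]

# Hypothetical tables: reference structure -> graphemes, grapheme -> (template, code).
PATTERN_TO_GRAPHEMES: Dict[str, List[str]] = {
    "narrow": ["l_stem", "one_stem"],
    "round": ["o_ring", "c_open"],
}
GRAPHEMES: Dict[str, Tuple[Tuple[float, float], int]] = {
    "l_stem": ((0.2, 0.9), ord("l")),
    "one_stem": ((0.3, 0.8), ord("1")),
    "o_ring": ((1.0, 0.5), ord("o")),
    "c_open": ((0.9, 0.4), ord("c")),
}


def walk(tree: Tree, features: Features) -> FrozenSet[str]:
    """Descend one tree until a leaf is reached and return its reference ids."""
    while isinstance(tree, Node):
        tree = tree.low if features[tree.feature] <= tree.threshold else tree.high
    return tree.pattern_ids


def candidate_patterns(features: Features) -> Set[str]:
    """Keep the reference structures that collect the most votes across the forest."""
    votes: Counter = Counter()
    for tree in FOREST:
        votes.update(walk(tree, features))
    if not votes:
        return set()
    best = max(votes.values())
    return {pid for pid, count in votes.items() if count == best}


def recognize_symbol(features: Features) -> Optional[int]:
    """Return the symbol code of the closest candidate grapheme, or None."""
    graphemes = [g for p in sorted(candidate_patterns(features))
                 for g in PATTERN_TO_GRAPHEMES[p]]
    if not graphemes:
        return None

    def distance(grapheme_id: str) -> float:
        template, _ = GRAPHEMES[grapheme_id]
        return sum((a - b) ** 2 for a, b in zip(features, template))

    return GRAPHEMES[min(graphemes, key=distance)][1]


def recognize_document(pages: List[List[Features]]) -> List[str]:
    """Process every symbol image on every page into a string of symbol codes."""
    result = []
    for page in pages:
        codes = [recognize_symbol(symbol) for symbol in page]
        result.append("".join(chr(c) if c is not None else "?" for c in codes))
    return result
```

With these toy tables, `recognize_document([[(0.22, 0.88), (0.98, 0.48)]])` returns `["lo"]`. The point of the forest stage in this sketch is that only the graphemes attached to the surviving reference structures are compared against the symbol image, rather than every grapheme in the table.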
Dates
2016-04-20—Published
2014-12-16—Filed