FIELD: data processing.
SUBSTANCE: the invention relates to performing optical character recognition (OCR) on a series of images containing text characters. Optical recognition of the series of images is performed to produce character sequences and the corresponding quadrangles of those character sequences. A median string is determined. Transformations of the character-sequence quadrangles into a common coordinate system are calculated. Distances between the transformed quadrangles are determined, and the median quadrangle of the character sequence is identified. Using this median quadrangle, the resulting recognized text, representing at least part of the original document, is displayed.
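The abstract names the steps but not the metrics, so the following is only a minimal sketch under stated assumptions. It assumes the median string is a set median under Levenshtein (edit) distance. It also assumes each frame already has a known 3x3 projective transform (homography) into the common coordinate system, that the distance between two quadrangles is the sum of Euclidean distances between corresponding vertices, and that the median quadrangle is the medoid, meaning the one with the smallest total distance to the others. All function names (`edit_distance`, `median_string`, `apply_homography`, `transform_quad`, `quad_distance`, `median_quad`) are hypothetical and do not come from the patent.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Quad = Tuple[Point, Point, Point, Point]
Matrix3 = Sequence[Sequence[float]]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed string metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def median_string(candidates: List[str]) -> str:
    """Set median: the candidate with the smallest total edit distance to all others."""
    return min(candidates, key=lambda s: sum(edit_distance(s, t) for t in candidates))


def apply_homography(h: Matrix3, p: Point) -> Point:
    """Map a point through a 3x3 projective transform using homogeneous coordinates."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def transform_quad(h: Matrix3, quad: Quad) -> Quad:
    """Transform all four vertices of a quadrangle into the common coordinate system."""
    return tuple(apply_homography(h, p) for p in quad)  # type: ignore[return-value]


def quad_distance(q1: Quad, q2: Quad) -> float:
    """Assumed quadrangle metric: sum of distances between corresponding vertices."""
    return sum(math.dist(p, q) for p, q in zip(q1, q2))


def median_quad(quads: List[Quad]) -> Quad:
    """Medoid: the quadrangle with the smallest total distance to all others."""
    return min(quads, key=lambda q: sum(quad_distance(q, r) for r in quads))


if __name__ == "__main__":
    # Three hypothetical OCR results for the same text line in three frames.
    texts = ["total", "total", "tota1"]
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    shift_left_5 = [[1, 0, -5], [0, 1, 0], [0, 0, 1]]  # frame 1 is offset by +5 in x
    frames = [
        (identity, ((0, 0), (10, 0), (10, 2), (0, 2))),
        (shift_left_5, ((5, 0), (15, 0), (15, 2), (5, 2))),
        (identity, ((0.5, 0), (10.5, 0), (10.5, 2), (0.5, 2))),
    ]
    common = [transform_quad(h, q) for h, q in frames]
    print(median_string(texts))
    print(median_quad(common))
```

Because the medoid is always one of the observed quadrangles, a single badly recognized frame cannot drag the result outside the set of actual observations. This is one plausible reading of "median quadrangle", not necessarily the patented construction.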
EFFECT: the technical result is higher accuracy in determining the geometry of the median string.
20 cl, 15 dwg
Dates
2018-11-21—Published
2017-12-22—Filed