FIELD: computer systems.
SUBSTANCE: the invention relates to computer systems and, in particular, to systems and methods for identifying the writing systems used in documents. The technical result is more efficient optical character recognition: a neural network is applied to image fragments, each of which is reduced to the size of the network's input layer, which lowers the computing resources required. The method includes: obtaining an image of a document; splitting the image into fragments; generating, by means of a neural network, a probability vector for each fragment, each vector containing a plurality of numerical elements, where each numerical element reflects the probability that the image fragment contains text associated with the corresponding writing system; calculating an aggregated probability vector, each numerical element of which reflects the probability that the document image contains text associated with the corresponding writing system; and, upon determining that the maximum numerical element of the aggregated probability vector exceeds a threshold value, concluding that the document image contains one or more characters associated with the corresponding writing system.
EFFECT: more efficient optical character recognition, achieved by applying a neural network to image fragments.
20 cl, 5 dwg
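The steps listed under SUBSTANCE can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the fragment and input-layer sizes, the nearest-neighbour resizing, and the use of a simple mean as the aggregation rule are all assumptions made for illustration, since the abstract does not specify them. The neural network itself is represented by a hypothetical `classify` callable supplied by the caller.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Image = List[List[float]]  # assumed: non-empty rectangular grayscale image


def split_into_fragments(image: Image, frag_h: int, frag_w: int) -> List[Image]:
    """Cut the image into a grid of fragments (border fragments may be smaller)."""
    fragments = []
    for top in range(0, len(image), frag_h):
        for left in range(0, len(image[0]), frag_w):
            fragments.append(
                [row[left:left + frag_w] for row in image[top:top + frag_h]]
            )
    return fragments


def resize_nearest(fragment: Image, out_h: int, out_w: int) -> Image:
    """Reduce a fragment to the network's input size (nearest neighbour, assumed)."""
    in_h, in_w = len(fragment), len(fragment[0])
    return [
        [fragment[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def aggregate(prob_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-fragment vectors; the element-wise mean is an assumption."""
    n = len(prob_vectors)
    return [sum(column) / n for column in zip(*prob_vectors)]


def identify_writing_system(
    image: Image,
    classify: Callable[[Image], Sequence[float]],  # hypothetical network stand-in
    systems: Sequence[str],
    frag_size: Tuple[int, int] = (64, 64),    # assumed fragment size
    input_size: Tuple[int, int] = (32, 32),   # assumed input-layer size
    threshold: float = 0.5,                   # assumed threshold value
) -> Optional[str]:
    """Return the detected writing system, or None if no element exceeds the threshold."""
    fragments = split_into_fragments(image, *frag_size)
    vectors = [classify(resize_nearest(f, *input_size)) for f in fragments]
    aggregated = aggregate(vectors)
    best = max(range(len(aggregated)), key=aggregated.__getitem__)
    return systems[best] if aggregated[best] > threshold else None
```

Passing the network in as a callable keeps the sketch independent of any particular framework; a real system would substitute a trained model that returns one probability per supported writing system.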
Dates
2023-03-23—Published
2021-11-23—Filed