FIELD: computer technology.
SUBSTANCE: the group of inventions relates to computer systems for document analysis, and more specifically to techniques for building and optimizing codebooks used to detect fields in documents. A method for optimizing a codebook is proposed. According to the method, a data processing device receives a first set of document images. A set of key areas is then extracted from each document image in the first set. Local descriptors are calculated for each of the extracted key areas. The local descriptors are then clustered so that the center of each cluster corresponds to a visual word, and a codebook containing the resulting set of visual words is built.
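The clustering step described above can be illustrated with a minimal sketch. It is not the claimed method: it assumes plain k-means as the clustering algorithm, uses toy 2-dimensional vectors in place of real local descriptors, and seeds the centers from evenly spaced input positions for reproducibility. The names `build_codebook` and `nearest_word` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, codebook):
    """Return the index of the visual word closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))


def build_codebook(descriptors, k, iterations=20):
    """Cluster descriptors with plain k-means; each center is a visual word.

    Assumption: k-means stands in for whatever clustering the method uses.
    Initial centers are taken at evenly spaced positions in the input, a
    simplification; random or k-means++ seeding is more common in practice.
    """
    if not 0 < k <= len(descriptors):
        raise ValueError("k must be between 1 and the number of descriptors")
    step = len(descriptors) // k
    centers = [list(descriptors[i * step]) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: group descriptors by their nearest center.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, centers)].append(d)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[i] = [sum(m[j] for m in members) / len(members)
                              for j in range(dim)]
    return centers


# Toy descriptors standing in for those computed on extracted key areas.
descriptors = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
               [5.0, 5.1], [5.1, 5.0], [4.9, 5.0]]
codebook = build_codebook(descriptors, k=2)

# Each descriptor is mapped to the index of its nearest visual word.
word_indices = [nearest_word(d, codebook) for d in descriptors]
```

In this sketch a document image would be represented by the visual-word indices of its key areas. How the codebook is subsequently optimized and used for field detection is not covered here.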
EFFECT: increased accuracy of information extraction from images through the use of optimized codebooks.
20 cl, 12 dwg
Dates
2022-12-29—Published
2021-07-21—Filed