FIELD: physics.
SUBSTANCE: in the course of extracting data from fields on a document image, a text representation of the document image is obtained. A graph is constructed to store the attributes of the document's text fragments and the links between them. A cascade classification is performed to compute the attributes of the text fragments and of the links between them. A set of hypotheses is formed about which fields on the document image the text fragments belong to. A combination of hypotheses is selected, and data are extracted from the fields on the document image based on the selected combination of hypotheses.
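The pipeline described above can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not specify the classifiers, the graph-linking rule, or how a combination of hypotheses is chosen, so every name and rule below (`Fragment`, `build_graph`, the regex attributes, the `SPECS` table, the scoring bonus) is a hypothetical stand-in. The "cascade" is modelled as two stages: cheap per-fragment rules run on every fragment, and link attributes are computed only for graph edges whose source fragment passed the first stage as a label. The combination is chosen by exhaustive search over per-field hypotheses, rejecting any combination that assigns one fragment to two fields.

```python
import re
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Fragment:
    text: str
    x: float  # left edge of the fragment's bounding box (hypothetical layout model)
    y: float  # top edge of the bounding box
    attrs: dict = field(default_factory=dict)


def build_graph(fragments, max_dy=15.0):
    """Link each fragment to fragments to its right on roughly the same line (assumed rule)."""
    edges = {}
    for i, a in enumerate(fragments):
        for j, b in enumerate(fragments):
            if i != j and abs(a.y - b.y) <= max_dy and a.x < b.x:
                edges[(i, j)] = {"right_of": True}
    return edges


def cascade_classify(fragments, edges):
    # Stage 1: cheap attributes computed for every fragment.
    for f in fragments:
        f.attrs["is_date"] = bool(re.fullmatch(r"\d{2}\.\d{2}\.\d{4}", f.text))
        f.attrs["is_amount"] = bool(re.fullmatch(r"\d+(\.\d{2})?", f.text))
        f.attrs["is_label"] = f.text.endswith(":")
    # Stage 2: link attributes, computed only for edges leaving a stage-1 label.
    for (i, j), e in edges.items():
        if fragments[i].attrs["is_label"]:
            e["label"] = fragments[i].text.rstrip(":").lower()


def form_hypotheses(fragments, edges, specs):
    """Return {field_name: [(fragment_index, score), ...]} using an assumed scoring rule."""
    hyps = {}
    for name, (attr, keyword) in specs.items():
        hyps[name] = []
        for j, f in enumerate(fragments):
            if not f.attrs.get(attr):
                continue
            # Bonus when a label matching the field's keyword links to this fragment.
            bonus = any(e.get("label") == keyword for (i, k), e in edges.items() if k == j)
            hyps[name].append((j, 1.0 + (1.0 if bonus else 0.0)))
    return hyps


def select_combination(hyps):
    """Exhaustively pick the highest-scoring combination; each field may also stay empty."""
    names = list(hyps)
    options = [hyps[n] + [(None, 0.0)] for n in names]
    best, best_score = None, float("-inf")
    for combo in product(*options):
        used = [j for j, _ in combo if j is not None]
        if len(used) != len(set(used)):
            continue  # one fragment cannot fill two fields
        score = sum(s for _, s in combo)
        if score > best_score:
            best, best_score = dict(zip(names, (j for j, _ in combo))), score
    return best


def extract(fragments, combination):
    return {name: fragments[j].text for name, j in combination.items() if j is not None}


# Hypothetical field specification: field name -> (required attribute, label keyword).
SPECS = {"date": ("is_date", "date"), "total": ("is_amount", "total")}

frags = [Fragment("Date:", 10, 10), Fragment("21.03.2017", 60, 12),
         Fragment("Total:", 10, 100), Fragment("150.00", 60, 101),
         Fragment("42", 200, 300)]
edges = build_graph(frags)
cascade_classify(frags, edges)
result = extract(frags, select_combination(form_hypotheses(frags, edges, SPECS)))
print(result)  # {'date': '21.03.2017', 'total': '150.00'}
```

The stray fragment `"42"` also yields a `total` hypothesis, but it scores lower because no `Total:` label links to it, so the selected combination discards it. The exhaustive search grows exponentially with the number of fields and candidates; it is used here only for clarity, and a practical system would likely prune or search the hypothesis space more efficiently.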
EFFECT: saving computing resources.
15 cl, 8 dwg
Dates
2017-03-21—Published
2015-09-07—Filed