FIELD: computer technology.
SUBSTANCE: the invention relates to the field of computer technology for processing text data. The expected result is achieved by: defining a mapping scheme in which words are represented by word forms, where the same word form can correspond to different words; defining a database containing a plurality of database records and a plurality of sets of word forms; forming a set of hypotheses, including a first hypothesis that tentatively associates with a target record (i) a first set of words in the document and (ii) the corresponding first set of word forms, and a second hypothesis that tentatively associates with the target record (i) a second set of words in the document and (ii) the corresponding second set of word forms; excluding the second hypothesis on the basis of a mismatch between the second set of word forms and each of the plurality of sets of word forms in the database; and determining the first set of words in the document as the target record by confirming the first hypothesis.
EFFECT: increased accuracy in detecting text fields and their values in digital documents by searching with word forms.
20 cl, 10 dwg
Dates
2022-03-23—Published
2021-04-15—Filed