FIELD: data processing.
SUBSTANCE: invention relates to generating a text corpus containing realistic optical character recognition (OCR) errors and to training language models on such corpora. To this end, an example implementation of the method includes: generating, by a computer system, an initial set of images based on an input text corpus comprising text; applying, by the computer system, one or more simulated defects to images of the initial set to produce an augmented set of images; generating an output text corpus based on the augmented set of images; and training a language model for optical character recognition using the resulting text corpus.
EFFECT: technical result is improved image recognition quality.
20 cl, 8 dwg
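
The four steps named in the abstract (render images from a text corpus, apply simulated defects, recognize the degraded images to obtain an output corpus, train a language model on that corpus) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the 3x5 bitmap font `FONT`, the random pixel-flip defect, the nearest-glyph "OCR" in `recognize`, and the character bigram counts standing in for a language model.

```python
import random
from collections import Counter

# Hypothetical 3x5 bitmap font for a tiny alphabet (illustration only).
FONT = {
    "o": ("111", "101", "101", "101", "111"),
    "c": ("111", "100", "100", "100", "111"),
    "e": ("111", "100", "111", "100", "111"),
    "l": ("100", "100", "100", "100", "111"),
    "i": ("010", "010", "010", "010", "010"),
    " ": ("000", "000", "000", "000", "000"),
}


def render(text):
    """Step 1: render text as a list of glyph bitmaps (the initial image)."""
    return [[[int(px) for px in row] for row in FONT[ch]] for ch in text]


def add_noise(image, flip_prob, rng):
    """Step 2: simulated defect that flips each pixel with probability flip_prob."""
    return [
        [[px ^ int(rng.random() < flip_prob) for px in row] for row in glyph]
        for glyph in image
    ]


def recognize(image):
    """Step 3 (toy OCR): choose the font glyph with the smallest Hamming distance."""
    def distance(glyph, ch):
        return sum(
            px != int(ref)
            for row, ref_row in zip(glyph, FONT[ch])
            for px, ref in zip(row, ref_row)
        )
    return "".join(min(FONT, key=lambda ch: distance(glyph, ch)) for glyph in image)


def build_corpus(texts, flip_prob=0.15, seed=0):
    """Steps 1-3: produce an output corpus that may contain recognition errors."""
    rng = random.Random(seed)
    return [recognize(add_noise(render(t), flip_prob, rng)) for t in texts]


def train_bigram_model(corpus):
    """Step 4 (stand-in for a language model): count character bigrams."""
    counts = Counter()
    for line in corpus:
        counts.update(zip(line, line[1:]))
    return counts


if __name__ == "__main__":
    noisy = build_corpus(["cool eel", "ice"], flip_prob=0.2, seed=1)
    model = train_bigram_model(noisy)
    print(noisy)
    print(model.most_common(3))
```

With `flip_prob=0.0` the output corpus equals the input; raising it makes visually similar glyphs (for example `c`, `e` and `o` in this toy font) more likely to be confused, which is the kind of error the output corpus is meant to capture.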
Title | Year | Author | Number |
---|---|---|---|
TRAINING NEURAL NETWORKS FOR IMAGE PROCESSING USING SYNTHETIC PHOTOREALISTIC CONTAINING IMAGE SIGNS | 2018 | | RU2709661C1 |
OPTICAL CHARACTER RECOGNITION BY MEANS OF COMBINATION OF NEURAL NETWORK MODELS | 2020 | | RU2768211C1 |
METHOD FOR PROCESSING IMAGES BY CONVOLUTIONAL NEURAL NETWORKS | 2020 | | RU2771442C1 |
EXTRACTION OF MULTIPLE DOCUMENTS FROM A SINGLE IMAGE | 2020 | | RU2764705C1 |
REPRODUCING AUGMENTATION OF IMAGE DATA | 2018 | | RU2716322C2 |
RECONSTRUCTION OF THE DOCUMENT FROM DOCUMENT IMAGE SERIES | 2017 | | RU2659745C1 |
DETECTING AND IDENTIFYING OBJECTS ON IMAGES | 2020 | | RU2726185C1 |
DETECTING TEXT FIELDS USING NEURAL NETWORKS | 2018 | | RU2699687C1 |
IDENTIFICATION OF FIELDS AND TABLES IN DOCUMENTS USING NEURAL NETWORKS USING GLOBAL DOCUMENT CONTEXT | 2019 | | RU2723293C1 |
DETECTING SECTIONS OF TABLES IN DOCUMENTS BY NEURAL NETWORKS USING GLOBAL DOCUMENT CONTEXT | 2019 | | RU2721189C1 |
Authors
Dates
2020-05-18—Published
2019-03-29—Filed