TRAINING LANGUAGE MODELS USING TEXT CORPORA CONTAINING REALISTIC OPTICAL CHARACTER RECOGNITION (OCR) ERRORS
Russian patent published in 2020 - IPC G06K9/82 G06N3/08

Abstract RU 2721187 C1

FIELD: data processing.

SUBSTANCE: the invention relates to generating a text corpus containing realistic optical character recognition (OCR) errors and to training language models on such corpora. An example implementation of the method includes: generating, by a computer system, an initial set of images based on an input text corpus; applying, by the computer system, one or more simulated defects to images of the initial set to produce an augmented set of images; generating an output text corpus based on the augmented set of images; and training a language model for optical character recognition using the resulting text corpus.
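
The four steps named above (render images, add defects, derive an output corpus, train a language model) can be illustrated with a toy pipeline. This is only a minimal sketch under loud assumptions, not the patented implementation: the 3x5 bitmap font `GLYPHS`, the pixel-flip defect, the nearest-template recognizer, and the character bigram model are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. It also assumes that the output corpus is obtained by recognizing the augmented images, which the abstract does not state explicitly.

```python
import random
from collections import Counter

# Hypothetical 3x5 bitmap font: each glyph is 15 pixels, listed row by row.
GLYPHS = {
    "o": "111101101101111",
    "c": "111100100100111",
    "e": "111100111100111",
    "l": "100100100100111",
    "i": "010010010010010",
    " ": "000000000000000",
}

def render(text):
    """Step 1: turn text into an 'image' (one pixel list per character)."""
    return [[int(p) for p in GLYPHS[ch]] for ch in text]

def add_defects(image, flip_prob, rng):
    """Step 2: simulate a defect by flipping each pixel with probability flip_prob."""
    return [[p ^ (rng.random() < flip_prob) for p in glyph] for glyph in image]

def recognize(image):
    """Step 3: toy OCR that picks the glyph with the smallest Hamming distance."""
    def nearest(glyph):
        return min(
            GLYPHS,
            key=lambda ch: sum(a != int(b) for a, b in zip(glyph, GLYPHS[ch])),
        )
    return "".join(nearest(glyph) for glyph in image)

def build_noisy_corpus(clean_corpus, flip_prob=0.15, seed=0):
    """Steps 1-3 combined: clean text -> images -> defects -> recognized text."""
    rng = random.Random(seed)
    return [
        recognize(add_defects(render(line), flip_prob, rng))
        for line in clean_corpus
    ]

def train_bigram_model(corpus):
    """Step 4: a stand-in 'language model' that counts character bigrams."""
    counts = Counter()
    for line in corpus:
        counts.update(zip(line, line[1:]))
    return counts

clean = ["cole lie", "ice oil", "cello", "coil"]
noisy = build_noisy_corpus(clean)
model = train_bigram_model(noisy)
```

Because visually similar glyphs (for example "c" and "e" in this font) differ in only a few pixels, the simulated defects tend to produce the kind of substitutions an OCR engine would make, and the bigram counts in `model` then reflect those substitutions. A real system would replace each toy component with an actual renderer, defect model, OCR engine, and language model.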

EFFECT: the technical result is improved image recognition quality.

20 cl, 8 dwg

Similar patents RU2721187C1

Title Year Author Number
TRAINING NEURAL NETWORKS FOR IMAGE PROCESSING USING SYNTHETIC PHOTOREALISTIC IMAGES CONTAINING SIGNS 2018
  • Zagajnov Ivan Germanovich
  • Borin Pavel Valerevich
RU2709661C1
OPTICAL CHARACTER RECOGNITION BY MEANS OF COMBINATION OF NEURAL NETWORK MODELS 2020
  • Konstantin Anisimovich
  • Alexey Zhuravlev
RU2768211C1
METHOD FOR PROCESSING IMAGES BY CONVOLUTIONAL NEURAL NETWORKS 2020
  • Byrkov Igor Anatolevich
  • Vyzhletsov Valentin Valentinovich
  • Kozhanov Nikita Yurevich
  • Mishin Sergej Aleksandrovich
  • Okov Igor Nikolaevich
RU2771442C1
EXTRACTION OF MULTIPLE DOCUMENTS FROM A SINGLE IMAGE 2020
  • Ivan Zagaynov
  • Aleksandra Stepina
RU2764705C1
REPRODUCING AUGMENTATION OF IMAGE DATA 2018
  • Konstantin Zuev
  • Andrejs Sautins
RU2716322C2
RECONSTRUCTION OF A DOCUMENT FROM A SERIES OF DOCUMENT IMAGES 2017
  • Loginov Vasilij Vasilevich
  • Zagajnov Ivan Germanovich
  • Karatsapova Irina Aleksandrovna
RU2659745C1
DETECTING AND IDENTIFYING OBJECTS ON IMAGES 2020
  • Ivan Zagaynov
  • Andrew Zharkov
RU2726185C1
DETECTING TEXT FIELDS USING NEURAL NETWORKS 2018
  • Zuev, Konstantin Alekseevich
  • Senkevich, Oleg Evgenyevich
  • Golubev, Sergei Vladimirovich
RU2699687C1
IDENTIFICATION OF FIELDS AND TABLES IN DOCUMENTS USING NEURAL NETWORKS USING GLOBAL DOCUMENT CONTEXT 2019
  • Stanislav Semenov
RU2723293C1
DETECTING SECTIONS OF TABLES IN DOCUMENTS BY NEURAL NETWORKS USING GLOBAL DOCUMENT CONTEXT 2019
  • Stanislav Semenov
RU2721189C1

RU 2 721 187 C1

Authors

Ivan Germanovich Zagaynov

Dates

2020-05-18 Published

2019-03-29 Filed