METHOD FOR SEPARATING TEXTS AND ILLUSTRATIONS IN IMAGES OF DOCUMENTS USING A DESCRIPTOR OF DOCUMENT SPECTRUM AND TWO-LEVEL CLUSTERING

Russian patent published in 2018. IPC: G06K9/46, G06T7/11, G06T7/187, G06T3/40

Abstract RU 2656708 C1

FIELD: image processing means.

SUBSTANCE: the invention relates to the analysis and processing of document images. The method for separating texts and illustrations in images of document pages comprises the steps of: receiving images of document pages; segmenting the images of document pages into areas of interest; extracting a feature vector for each area of interest; and classifying each extracted feature vector into one of two classes, text or illustration. Extracting the feature vector comprises the sub-steps of: resizing the area of interest while preserving its aspect ratio; extracting connected components from the resized area of interest and calculating their centroids; determining the nearest neighbors of each centroid; constructing a two-dimensional histogram of normalized distances and angles over all pairs formed by a centroid and each of its five nearest neighboring centroids; and transforming the two-dimensional histogram into a feature vector.
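The feature-extraction sub-steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract fixes only the use of five nearest neighbors, so the target size of 256 pixels, nearest-neighbor resampling, 8-connectivity, normalizing distances by the largest neighbor distance in the area, folding angles into [0, π), the 8×8 bin layout, L1 normalization of the output, and all function and constant names (`resize_keep_aspect`, `component_centroids`, `neighbor_pairs`, `spectrum_descriptor`, `TARGET_SIZE`, `DIST_BINS`, `ANGLE_BINS`) are hypothetical choices. The input is assumed to be a binarized area of interest given as a list of rows, with 1 marking ink pixels.

```python
import math
from collections import deque

# Hypothetical parameters: the abstract fixes only the five nearest neighbors.
TARGET_SIZE = 256   # assumed length of the longer side after resizing
K_NEIGHBORS = 5     # "each of its five nearest neighboring centroids"
DIST_BINS = 8       # assumed number of normalized-distance bins
ANGLE_BINS = 8      # assumed number of angle bins


def resize_keep_aspect(img, target=TARGET_SIZE):
    """Nearest-neighbor resize so the longer side equals `target` (aspect ratio kept up to rounding)."""
    h, w = len(img), len(img[0])
    scale = target / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    return [[img[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))]
             for x in range(new_w)] for y in range(new_h)]


def component_centroids(img):
    """Centroids (x, y) of 8-connected components of pixels equal to 1 (assumed connectivity)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] != 1 or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            sum_x = sum_y = count = 0
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                sum_x += x
                sum_y += y
                count += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            centroids.append((sum_x / count, sum_y / count))
    return centroids


def neighbor_pairs(centroids, k=K_NEIGHBORS):
    """(distance, angle) for each centroid paired with each of its k nearest centroids (brute force)."""
    pairs = []
    for i, (xi, yi) in enumerate(centroids):
        others = sorted((math.hypot(xj - xi, yj - yi), math.atan2(yj - yi, xj - xi))
                        for j, (xj, yj) in enumerate(centroids) if j != i)
        pairs.extend(others[:k])
    return pairs


def spectrum_descriptor(img):
    """Flattened, L1-normalized 2-D histogram of normalized distances and angles."""
    pairs = neighbor_pairs(component_centroids(resize_keep_aspect(img)))
    hist = [0.0] * (DIST_BINS * ANGLE_BINS)
    if not pairs:  # fewer than two components: no pairs, return an all-zero vector
        return hist
    max_d = max(d for d, _ in pairs) or 1.0  # assumed normalization by the largest distance
    for d, a in pairs:
        d_bin = min(DIST_BINS - 1, int(d / max_d * DIST_BINS))
        # Assumed: fold angles into [0, pi) so opposite directions share a bin.
        a_bin = min(ANGLE_BINS - 1, int((a % math.pi) / math.pi * ANGLE_BINS))
        hist[d_bin * ANGLE_BINS + a_bin] += 1.0
    total = sum(hist)
    return [v / total for v in hist]
```

The resulting 64-element vector would then be passed to a binary text/illustration classifier; the abstract does not name the classifier, so none is shown. Brute-force neighbor search keeps the sketch dependency-free; a k-d tree would be the usual choice for areas with many components.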

EFFECT: increased accuracy of separating texts and illustrations in document images and reduced separation errors.

16 claims, 21 drawings, 5 tables

Patents similar to RU2656708C1

Title, year; authors; patent number
AUTOMATIC DETERMINATION OF SET OF CATEGORIES FOR DOCUMENT CLASSIFICATION 2018
  • Nikita Orlov
  • Konstantin Anisimovich
RU2701995C2
GENERATION OF MARKING OF DOCUMENT IMAGES FOR TRAINING SAMPLE 2017
  • Zagajnov Ivan Germanovich
  • Borin Pavel Valerevich
RU2668717C1
STRUCTURE OPTIMIZATION AND USE OF CODEBOOKS FOR DOCUMENT ANALYSIS 2021
  • Vasily Loginov
  • Ivan Zagaynov
  • Stanislav Semenov
RU2787138C1
SYSTEM AND METHOD FOR REAL-TIME DATA PROCESSING AND OBJECT RECOGNITION 2022
  • Veryutin Maksim Viktorovich
  • Ivanov Yurij Viktorovich
RU2802280C1
METHOD AND DEVICE FOR CLASSIFICATION OF IMAGES OF PRINTED COPIES OF DOCUMENTS AND SORTING SYSTEM OF PRINTED COPIES OF DOCUMENTS 2016
  • Zavalishin Sergej Stanislavovich
  • But Andrej Alekseevich
  • Kurilin Ilya Vasilevich
  • Rychagov Mikhail Nikolaevich
RU2630743C1
METHOD FOR AUTOMATIC ITERATIVE CLUSTERISATION OF ELECTRONIC DOCUMENTS ACCORDING TO SEMANTIC SIMILARITY, METHOD FOR SEARCH IN PLURALITY OF DOCUMENTS CLUSTERED ACCORDING TO SEMANTIC SIMILARITY AND COMPUTER-READABLE MEDIA 2014
  • Klintsov Viktor Petrovich
  • Seledkin Vjacheslav Alekseevich
RU2556425C1
CLUSTERING OF DOCUMENTS 2020
  • Stanislav Semenov
  • Alexandra Antonova
  • Alexey Misyrev
RU2768209C1
METHOD AND SYSTEM FOR OBTAINING A VECTOR REPRESENTATION OF AN ELECTRONIC DOCUMENT 2021
  • Vyshegorodtsev Kirill Evgenevich
  • Davidov Dmitrij Georgievich
  • Ryupichev Dmitrij Yurevich
  • Balashov Aleksandr Viktorovich
RU2775351C1
IMAGE AND ATTRIBUTE QUALITY, IMAGE ENHANCEMENT AND IDENTIFICATION OF FEATURES FOR IDENTIFICATION BY VESSELS AND INDIVIDUALS, AND COMBINING INFORMATION ON EYE VESSELS WITH INFORMATION ON FACES AND/OR PARTS OF FACES FOR BIOMETRIC SYSTEMS 2016
  • Saripalle Sashi K.
  • Gottemukkula Vikas
  • Derakhshani Reza R.
RU2691195C1
IMAGE AND ATTRIBUTE QUALITY, IMAGE ENHANCEMENT AND IDENTIFICATION OF FEATURES FOR IDENTIFICATION BY VESSELS AND FACES AND COMBINING INFORMATION ON EYE VESSELS WITH INFORMATION ON FACES AND / OR PARTS OF FACES FOR BIOMETRIC SYSTEMS 2016
  • Saripalle Sashi K.
  • Gottemukkula Vikas
  • Derakhshani Reza R.
RU2711050C2


Authors

Anisimovskiy Valery Valerievich

Dates

Published: 2018-06-06

Filed: 2017-06-29