FIELD: image processing means.
SUBSTANCE: the invention relates to the analysis and processing of document images. A method for separating text and illustrations in images of document pages comprises the steps of: receiving images of document pages; segmenting the images into areas of interest; extracting a feature vector for each area of interest; and classifying each extracted feature vector into one of two classes, text or illustration. Extracting the feature vector comprises the sub-steps of: resizing the area of interest while preserving its aspect ratio; extracting connected components from the resized area and calculating their centroids; determining the nearest neighbors of each centroid; constructing a two-dimensional histogram of normalized distances and angles over all pairs formed by a centroid and each of its five nearest neighboring centroids; and transforming the two-dimensional histogram into a feature vector.
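The feature-extraction sub-steps can be illustrated with a minimal pure-Python sketch. The abstract fixes only the overall pipeline and the use of five nearest neighbors; everything else here is an assumption for illustration, not the patented method: the area of interest is a binary image (list of rows of 0/1), resizing is nearest-neighbour to a hypothetical `target_long_side` of 256 pixels, components use 8-connectivity, distances are normalized by their mean, angles are folded into [0, π), and the histogram uses a hypothetical 8 × 12 binning. The helper names are likewise hypothetical, and the text/illustration classifier is not shown.

```python
import math
from collections import deque


def resize_keep_aspect(img, target_long_side=256):
    """Nearest-neighbour resize of a binary image so that its longer side
    equals target_long_side while the aspect ratio is preserved."""
    h, w = len(img), len(img[0])
    scale = target_long_side / max(h, w)
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    return [[img[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))]
             for x in range(nw)] for y in range(nh)]


def component_centroids(img):
    """Find 8-connected components of foreground pixels via BFS and
    return the (x, y) centroid of each component."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for sy in range(h):
        for sx in range(w):
            if not img[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            sum_x = sum_y = n = 0
            while queue:
                y, x = queue.popleft()
                sum_x, sum_y, n = sum_x + x, sum_y + y, n + 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            centroids.append((sum_x / n, sum_y / n))
    return centroids


def nearest_neighbours(points, k=5):
    """Brute-force k nearest neighbours (indices) of every point."""
    result = []
    for i, (xi, yi) in enumerate(points):
        dists = sorted((math.hypot(xj - xi, yj - yi), j)
                       for j, (xj, yj) in enumerate(points) if j != i)
        result.append([j for _, j in dists[:k]])
    return result


def distance_angle_histogram(points, neighbours, dist_bins=8,
                             angle_bins=12, max_ratio=4.0):
    """2-D histogram over (normalized distance, angle) of centroid pairs.
    Assumption: distances are divided by their mean; ratios >= max_ratio
    fall into the last bin; angles are folded into [0, pi)."""
    pairs = []
    for i, nbrs in enumerate(neighbours):
        xi, yi = points[i]
        for j in nbrs:
            dx, dy = points[j][0] - xi, points[j][1] - yi
            pairs.append((math.hypot(dx, dy), math.atan2(dy, dx) % math.pi))
    hist = [[0.0] * angle_bins for _ in range(dist_bins)]
    if not pairs:
        return hist
    mean_d = sum(d for d, _ in pairs) / len(pairs)
    for d, a in pairs:
        ratio = d / mean_d if mean_d > 0 else 0.0
        di = min(dist_bins - 1, int(ratio / max_ratio * dist_bins))
        ai = min(angle_bins - 1, int(a / math.pi * angle_bins))
        hist[di][ai] += 1.0 / len(pairs)  # normalized so bins sum to 1
    return hist


def extract_feature_vector(area, target_long_side=256, k=5):
    """Resize, find centroids, link each to its k nearest neighbours,
    build the 2-D histogram and flatten it into a feature vector."""
    resized = resize_keep_aspect(area, target_long_side)
    centroids = component_centroids(resized)
    hist = distance_angle_histogram(centroids, nearest_neighbours(centroids, k))
    return [v for row in hist for v in row]


# Usage: a synthetic "text-like" area with a regular grid of 3x3 blobs.
area = [[1 if (y % 10 < 3 and x % 10 < 3) else 0 for x in range(60)]
        for y in range(40)]
features = extract_feature_vector(area)
print(len(features))  # 8 * 12 bins
```

With this binning the feature vector always has 96 entries regardless of the size of the area of interest, which is what lets a fixed-input classifier consume it; text regions tend to concentrate mass in a few distance/angle bins, while illustrations spread it out.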
EFFECT: increased accuracy in separating text from illustrations in document images, with fewer separation errors.
16 claims, 21 drawings, 5 tables
Dates
2018-06-06: Published
2017-06-29: Filed