FIELD: calculating; counting.
SUBSTANCE: invention relates to computer engineering. Disclosed is a method of classifying documents, comprising, by a computer system: generating a plurality of image features by processing images from a plurality of documents; generating a plurality of text features by processing one or more texts from the plurality of documents; creating a plurality of feature vectors, such that each feature vector includes at least one of the following: a subset of the plurality of image features and a subset of the plurality of text features; clustering the plurality of feature vectors to obtain a plurality of clusters; determining a plurality of document categories, such that each document category is defined by a corresponding cluster from the plurality of clusters; training a classifier to produce one or more values reflecting the degree of association of one or more input documents with one or more of the document categories; and using the trained classifier to classify one or more documents based on said one or more values.
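The sequence of steps above can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not name the features, the clustering algorithm, or the classifier, so everything concrete here is an assumption made for illustration. Image features are taken to be the mean and standard deviation of grayscale pixels, text features are word counts over a fixed vocabulary, clustering is a plain k-means with deterministic farthest-point initialisation, and the classifier is a nearest-centroid model whose association values are a softmax over negative squared distances. All function names and the sample data are hypothetical.

```python
import math

def image_features(pixels):
    """Assumed image features: mean and standard deviation of grayscale pixels."""
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, math.sqrt(variance)]

def text_features(text, vocabulary):
    """Assumed text features: word counts over a fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]

def feature_vector(doc, vocabulary):
    """Concatenate image and text features into one vector."""
    return image_features(doc["pixels"]) + text_features(doc["text"], vocabulary)

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def nearest(vector, centroids):
    return min(range(len(centroids)), key=lambda c: sq_dist(vector, centroids[c]))

def kmeans(vectors, k, iterations=20):
    """Plain k-means; initial centroids are chosen by farthest-point selection."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    for _ in range(iterations):
        labels = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean_vector(members)
    return [nearest(v, centroids) for v in vectors]

def train_classifier(vectors, labels):
    """Each cluster defines a category; the model stores one centroid per category."""
    return {
        category: mean_vector([v for v, l in zip(vectors, labels) if l == category])
        for category in sorted(set(labels))
    }

def association_scores(model, vector):
    """Degree of association with each category: softmax of negative squared distance."""
    negative = {c: -sq_dist(vector, centroid) for c, centroid in model.items()}
    top = max(negative.values())
    weights = {c: math.exp(d - top) for c, d in negative.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

def classify(model, vector):
    scores = association_scores(model, vector)
    return max(scores, key=scores.get), scores

# Hypothetical sample data: two dark "invoice" pages and two bright "letter" pages.
vocabulary = ["invoice", "total", "dear", "regards"]
documents = [
    {"pixels": [0.1, 0.2, 0.1, 0.2], "text": "invoice total due"},
    {"pixels": [0.2, 0.1, 0.2, 0.2], "text": "invoice number total"},
    {"pixels": [0.9, 0.8, 0.9, 0.8], "text": "dear sir regards"},
    {"pixels": [0.8, 0.9, 0.9, 0.9], "text": "dear madam kind regards"},
]
vectors = [feature_vector(doc, vocabulary) for doc in documents]
labels = kmeans(vectors, k=2)
model = train_classifier(vectors, labels)
new_doc = {"pixels": [0.15, 0.1, 0.2, 0.15], "text": "invoice total"}
category, scores = classify(model, feature_vector(new_doc, vocabulary))
print(category, scores)
```

In this sketch the categories are not labelled in advance: they emerge from the clusters, and the classifier is then trained on the cluster assignments, which mirrors the order of steps in the abstract. A practical system would likely scale the image and text features to comparable ranges before clustering.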
EFFECT: technical result is classification of documents.
20 cl, 12 dwg
Dates
2019-10-02—Published
2018-03-23—Filed