AUTOMATIC DETERMINATION OF SET OF CATEGORIES FOR DOCUMENT CLASSIFICATION
Russian patent published in 2019 - IPC G06F16/35, G06F16/55

Abstract RU 2701995 C2

FIELD: calculating; counting.

SUBSTANCE: invention relates to computer engineering. Disclosed is a method of classifying documents in which a computer system: generates a plurality of image features by processing images from a plurality of documents; generates a plurality of text features by processing texts from the plurality of documents; creates a plurality of feature vectors, such that each feature vector includes at least one of the following: a subset of the plurality of image features and a subset of the plurality of text features; clusters the plurality of feature vectors to obtain a plurality of clusters; determines a plurality of document categories, such that each document category is defined by a corresponding cluster from the plurality of clusters; trains a classifier to produce one or more values reflecting the degree of association of one or more source documents with one or more of the document categories; and uses the trained classifier to classify one or more documents based on said one or more values.

EFFECT: technical result is classification of documents.

20 cl, 12 dwg
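
The steps listed in the abstract can be illustrated with a small, self-contained sketch. It is not the patented implementation: the abstract does not say which image features, text features, clustering algorithm, or classifier are used. The sketch assumes toy image features (mean intensity and dark-pixel ratio), bag-of-words text features over a hypothetical vocabulary, plain k-means for clustering, and a nearest-centroid classifier whose softmax-style scores stand in for the "degree of association" values. All function names and sample documents are made up for illustration.

```python
import math
from collections import Counter

def image_features(image):
    """Assumed image features: normalized mean intensity and dark-pixel ratio."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    dark = sum(1 for p in pixels if p < 128) / len(pixels)
    return [mean / 255.0, dark]

def text_features(text, vocabulary):
    """Assumed text features: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]

def feature_vector(doc, vocabulary):
    """Concatenate image and text features into one vector per document."""
    return image_features(doc["image"]) + text_features(doc["text"], vocabulary)

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def kmeans(vectors, k, iterations=10):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels

def train_classifier(vectors, labels):
    """Each cluster label becomes a category; the model stores one centroid per category."""
    model = {}
    for category in sorted(set(labels)):
        members = [v for v, label in zip(vectors, labels) if label == category]
        model[category] = mean_vector(members)
    return model

def association_scores(model, vector):
    """Softmax over negative squared distances: one score per category, summing to 1."""
    distances = {c: squared_distance(vector, centroid) for c, centroid in model.items()}
    nearest = min(distances.values())
    weights = {c: math.exp(-(d - nearest)) for c, d in distances.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

def classify(model, vector):
    scores = association_scores(model, vector)
    return max(scores, key=scores.get)

# Hypothetical toy corpus: 2x2 grayscale "images" plus short texts.
vocabulary = ["invoice", "total", "dear", "regards"]
documents = [
    {"image": [[0, 255], [255, 255]], "text": "invoice total total"},
    {"image": [[0, 255], [255, 0]], "text": "invoice total"},
    {"image": [[255, 255], [255, 255]], "text": "dear sir regards"},
    {"image": [[255, 255], [255, 200]], "text": "dear madam regards regards"},
]

vectors = [feature_vector(doc, vocabulary) for doc in documents]
labels = kmeans(vectors, k=2)               # categories are discovered, not predefined
model = train_classifier(vectors, labels)

new_document = {"image": [[0, 255], [255, 255]], "text": "invoice total"}
new_vector = feature_vector(new_document, vocabulary)
scores = association_scores(model, new_vector)
predicted_category = classify(model, new_vector)
print(labels, predicted_category, scores)
```

The point of the sketch is the ordering of the steps: categories are not supplied in advance but derived from the clusters, and only then is a classifier trained against them. In practice the number of clusters, the feature extractors, and the classifier would all be chosen differently; here they are fixed purely to keep the example short.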

Similar patents RU2701995C2

Title Year Author Number
USE OF AUTOENCODERS FOR LEARNING TEXT CLASSIFIERS IN NATURAL LANGUAGE 2017
  • Anisimovich Konstantin Vladimirovich
  • Indenbom Evgenij Mikhajlovich
  • Ivashnev Ivan Ivanovich
RU2678716C1
CHARACTER RECOGNITION USING A HIERARCHICAL CLASSIFICATION 2018
  • Aleksey Alekseevich Zhuravlev
RU2693916C1
AI TRANSACTION ADMINISTRATION SYSTEM 2020
  • Fehling, Ronny
  • Short, Samantha
  • De Goursac, Axel
  • Dubois, Raphael
  • Erlebach, Joerg
  • Von Funck, Karin
RU2777958C2
METHOD FOR SEPARATING TEXTS AND ILLUSTRATIONS IN IMAGES OF DOCUMENTS USING A DESCRIPTOR OF DOCUMENT SPECTRUM AND TWO-LEVEL CLUSTERING 2017
  • Anisimovskiy Valery Valerievich
RU2656708C1
RETRIEVING FIELDS USING NEURAL NETWORKS WITHOUT USING TEMPLATES 2019
  • Stanislav Semenov
RU2737720C1
RECOGNITION OF EVENTS ON PHOTOGRAPHS WITH AUTOMATIC SELECTION OF ALBUMS 2020
  • Savchenko Andrey Vladimirovich
RU2742602C1
HANDWRITING RECOGNITION USING NEURAL NETWORKS 2020
  • Andrey Upshinskiy
RU2757713C1
METHOD OF CONSTRUCTING AND DETECTION OF THEME HULL STRUCTURE 2013
  • Bogdanova Daria Nikolaevna
  • Kopylov Nikolay Yurievich
RU2583716C2
METHOD AND SYSTEM FOR OBTAINING VECTOR REPRESENTATION OF ELECTRONIC TEXT DOCUMENT FOR CLASSIFICATION BY CATEGORIES OF CONFIDENTIAL INFORMATION 2021
  • Vyshegorodtsev Kirill Evgenevich
  • Obolenskij Ivan Aleksandrovich
  • Golovnya Maksim Sergeevich
RU2775358C1
NEURAL NETWORK TRAINING BY MEANS OF SPECIALIZED LOSS FUNCTIONS 2018
  • Aleksey Alekseevich Zhuravlev
RU2707147C1

RU 2 701 995 C2

Authors

Nikita Orlov

Konstantin Anisimovich

Dates

2019-10-02 Published

2018-03-23 Filed