FIELD: physics.
SUBSTANCE: invention relates to computer engineering for analyzing documents. Technical result is achieved by obtaining an input document; determining, by evaluating a document similarity function using one or more calculated attributes of the input document, a plurality of similarity indicators, where each similarity indicator reflects the degree of similarity between the input document and the corresponding cluster of documents from a plurality of document clusters; determining the maximum similarity indicator from the plurality of similarity indicators; determining that the input document does not belong to any of the document clusters if the maximum similarity indicator is below a threshold value; creating a new cluster of documents; and assigning the input document to the new cluster of documents.
EFFECT: high accuracy of clustering documents.
20 cl, 6 dwg
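The steps listed under SUBSTANCE can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation: the abstract does not say which similarity function, attributes, or threshold are used. Cosine similarity against each cluster's centroid, the threshold of 0.8, and the names `cosine_similarity`, `centroid`, and `assign_document` are all assumptions made for the example. Assigning the document to the best-matching cluster when the threshold is met is likewise an assumed complement to the stated "below threshold" branch.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def assign_document(attributes, clusters, threshold=0.8):
    """Place one document into `clusters` and return the index of its cluster.

    `attributes` is the document's calculated attribute vector (assumed numeric).
    `clusters` is a list of clusters, each a list of attribute vectors.
    `threshold` is a hypothetical value chosen only for illustration.
    """
    # One similarity indicator per existing cluster (assumed: cosine vs. centroid).
    scores = [cosine_similarity(attributes, centroid(members)) for members in clusters]
    best = max(scores, default=None)

    # No clusters yet, or the maximum indicator is below the threshold:
    # the document belongs to none of them, so create a new cluster for it.
    if best is None or best < threshold:
        clusters.append([attributes])
        return len(clusters) - 1

    # Assumed complement: otherwise join the most similar cluster.
    index = scores.index(best)
    clusters[index].append(attributes)
    return index


clusters = []
first = assign_document([1.0, 0.0], clusters)   # no clusters yet, opens cluster 0
second = assign_document([0.9, 0.1], clusters)  # close to cluster 0, joins it
third = assign_document([0.0, 1.0], clusters)   # dissimilar, opens cluster 1
print(first, second, third)
```

In this toy run the third document scores well below the assumed threshold against the only existing cluster, so a second cluster is created for it.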
Title | Year | Author | Number |
---|---|---|---|
AUTOMATIC DETERMINATION OF SET OF CATEGORIES FOR DOCUMENT CLASSIFICATION | 2018 | | RU2701995C2 |
SYSTEM AND METHOD OF FORMING TRAINING SET FOR MACHINE LEARNING ALGORITHM | 2017 | | RU2711125C2 |
RETRIEVING FIELDS USING NEURAL NETWORKS WITHOUT USING TEMPLATES | 2019 | | RU2737720C1 |
SIMULTANEOUS RECOGNITION OF PERSON ATTRIBUTES AND IDENTIFICATION OF PERSON IN ORGANIZING PHOTO ALBUMS | 2018 | | RU2710942C1 |
SYSTEMS AND METHODS FOR DETECTING BEHAVIOURAL THREATS | 2019 | | RU2803399C2 |
SYSTEMS AND METHODS FOR DETECTING BEHAVIOURAL THREATS | 2019 | | RU2772549C1 |
METHOD OF CONSTRUCTING AND DETECTION OF THEME HULL STRUCTURE | 2013 | | RU2583716C2 |
SYSTEMS AND METHODS FOR DETECTING BEHAVIOURAL THREATS | 2019 | | RU2778630C1 |
AI TRANSACTION ADMINISTRATION SYSTEM | 2020 | | RU2777958C2 |
CHARACTER RECOGNITION USING A HIERARCHICAL CLASSIFICATION | 2018 | | RU2693916C1 |
Authors
Dates
2022-03-23—Published
2020-11-13—Filed