FIELD: computer technology.
SUBSTANCE: the invention relates to computer technology. A computer-implemented method, performed by a processor, for producing a vector representation of an electronic text document in order to determine the category of confidential information it contains comprises the following steps. A model that assigns m-skip-n-grams to clusters is formed, which includes: determining the list of m-skip-n-grams to be used; converting each m-skip-n-gram in the list to a vector representation; and clustering the m-skip-n-grams. The text document is then processed with the resulting model, which includes: counting the occurrences of m-skip-n-grams in the document; determining the document's clusters from those occurrences; summing the number of occurrences of the m-skip-n-grams in each cluster; and forming the vector representation of the document. Finally, the category of confidential information in the text document is determined.
EFFECT: the different meanings that words carry in a document are preserved, because words are mapped to multiple clusters.
10 cl, 6 dwg, 1 tbl
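The steps listed under SUBSTANCE can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions that the abstract does not state: an m-skip-n-gram is taken to be n tokens in document order with at most m tokens skipped in total; the vector of an m-skip-n-gram is taken to be the average of its word vectors; and clustering is replaced by a simple cosine-similarity threshold against cluster centroids that are supplied by the caller, which lets one m-skip-n-gram fall into several clusters. All function and parameter names (`skip_ngrams`, `build_model`, `document_vector`, `word_vectors`, `centroids`, `threshold`) are hypothetical. The final classification step is left out.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def skip_ngrams(tokens, n, m):
    """Return every n-token tuple in document order that skips at most m tokens in total.

    Assumption: this is one common reading of "m-skip-n-gram"; the abstract
    itself does not define the term.
    """
    grams = []
    for start in range(len(tokens) - n + 1):
        # Candidates for the remaining n - 1 positions: the next n - 1 + m tokens.
        window = tokens[start + 1 : start + n + m]
        for rest in combinations(range(len(window)), n - 1):
            grams.append((tokens[start],) + tuple(window[i] for i in rest))
    return grams


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def gram_vector(gram, word_vectors):
    """Average the vectors of the known words in the gram (assumed representation)."""
    dims = len(next(iter(word_vectors.values())))
    vecs = [word_vectors[w] for w in gram if w in word_vectors]
    if not vecs:
        return [0.0] * dims
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def build_model(grams, word_vectors, centroids, threshold):
    """Map each distinct gram to every cluster whose centroid is similar enough.

    Assumption: thresholding against given centroids stands in for whatever
    clustering procedure the patent actually uses.
    """
    model = {}
    for gram in set(grams):
        vec = gram_vector(gram, word_vectors)
        model[gram] = [
            k for k, centroid in enumerate(centroids)
            if cosine(vec, centroid) >= threshold
        ]
    return model


def document_vector(tokens, model, n, m, num_clusters):
    """Sum, per cluster, the occurrences of the document's grams assigned to it."""
    counts = Counter(skip_ngrams(tokens, n, m))
    doc = [0] * num_clusters
    for gram, count in counts.items():
        for k in model.get(gram, []):
            doc[k] += count
    return doc
```

In this sketch the document vector has one component per cluster, so its length stays fixed regardless of vocabulary size. A gram assigned to two clusters contributes its count to both components, which is how the sketch mirrors the stated effect of mapping words to multiple clusters. The resulting vector would then be passed to a classifier of the reader's choice to determine the category.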
Title | Year | Author | Number |
---|---|---|---|
METHOD AND SYSTEM FOR OBTAINING A VECTOR REPRESENTATION OF AN ELECTRONIC DOCUMENT | 2021 | | RU2775351C1 |
THEMATIC MODELS WITH A PRIORI TONALITY PARAMETERS BASED ON DISTRIBUTED REPRESENTATIONS | 2018 | | RU2719463C1 |
METHOD AND SYSTEM FOR CLASSIFYING DATA FOR IDENTIFYING CONFIDENTIAL INFORMATION IN THE TEXT | 2019 | | RU2755606C2 |
RETRIEVING FIELDS USING NEURAL NETWORKS WITHOUT USING TEMPLATES | 2019 | | RU2737720C1 |
AUTOMATIC DETERMINATION OF SET OF CATEGORIES FOR DOCUMENT CLASSIFICATION | 2018 | | RU2701995C2 |
AUTOMATED LEGAL ADVICE SYSTEM CONTROL METHOD | 2019 | | RU2718978C1 |
AI TRANSACTION ADMINISTRATION SYSTEM | 2020 | | RU2777958C2 |
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA | 2022 | | RU2804747C1 |
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA | 2022 | | RU2802549C1 |
METHOD FOR GENERATING MATHEMATICAL MODELS OF A PATIENT USING ARTIFICIAL INTELLIGENCE TECHNIQUES | 2017 | | RU2720363C2 |
Dates
2022-06-29—Published
2021-09-24—Filed