METHOD AND SYSTEM FOR OBTAINING VECTOR REPRESENTATION OF ELECTRONIC TEXT DOCUMENT FOR CLASSIFICATION BY CATEGORIES OF CONFIDENTIAL INFORMATION

Russian patent published in 2022. IPC: G06F16/35, G06F40/284, G06F40/30

Abstract RU 2775358 C1

FIELD: computer technology.

SUBSTANCE: the invention relates to computer technology. A computer-implemented method obtains a vector representation of an electronic text document in order to determine the category of confidential information the document contains. The method is performed by a processor and comprises the following steps. First, a model that assigns m-skip-n-grams to clusters is formed by determining the list of m-skip-n-grams to be used, converting each m-skip-n-gram in the list into a vector representation, and clustering the m-skip-n-grams. Next, the text document is processed with the obtained model: the occurrences of m-skip-n-grams in the document are counted, the document's clusters are determined from those occurrences, the occurrence counts of the m-skip-n-grams in each cluster are summed, and a vector representation of the document is formed. Finally, the category of confidential information in the text document is determined.
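The steps described in the SUBSTANCE paragraph can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The abstract does not say how m-skip-n-grams are vectorized, which clustering algorithm is used, or which classifier assigns the category. Here an m-skip-n-gram is assumed to be an ordered sequence of n words from the text with at most m words skipped in total (the usual k-skip-n-gram definition). Each gram is embedded as the mean of hypothetical word vectors (`word_vectors`), and plain k-means stands in for the clustering step. All function names (`skip_ngrams`, `embed_gram`, `kmeans`, `build_model`, `document_vector`) are illustrative.

```python
from collections import Counter
from itertools import combinations
import math


def skip_ngrams(tokens, n, m):
    """Assumed definition: ordered n-word subsequences with at most m skipped words in total."""
    grams = []
    for i, first in enumerate(tokens):
        # The remaining n-1 words must come from the next n+m-1 positions.
        rest = tokens[i + 1:i + n + m]
        for picks in combinations(range(len(rest)), n - 1):
            grams.append((first,) + tuple(rest[j] for j in picks))
    return grams


def embed_gram(gram, word_vectors):
    """Hypothetical vectorizer: the mean of the gram's word vectors."""
    dim = len(word_vectors[gram[0]])
    return [sum(word_vectors[w][d] for w in gram) / len(gram) for d in range(dim)]


def kmeans(points, k, iters=20):
    """Plain k-means, swapped in for illustration; initial centroids are the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def build_model(corpus, n, m, word_vectors, k):
    """Model formation: list the m-skip-n-grams, embed each one, cluster them."""
    vocab = sorted({g for doc in corpus for g in skip_ngrams(doc, n, m)})
    labels = kmeans([embed_gram(g, word_vectors) for g in vocab], k)
    return dict(zip(vocab, labels))  # m-skip-n-gram -> cluster id


def document_vector(tokens, model, n, m, k):
    """Document processing: count known m-skip-n-grams, then sum the counts per cluster."""
    counts = Counter(g for g in skip_ngrams(tokens, n, m) if g in model)
    vec = [0] * k
    for gram, count in counts.items():
        vec[model[gram]] += count
    return vec


if __name__ == "__main__":
    # Toy 2-d word vectors, invented for illustration only.
    word_vectors = {
        "card": [1.0, 0.0], "number": [0.9, 0.1], "expires": [0.8, 0.2],
        "salary": [0.0, 1.0], "bonus": [0.1, 0.9], "paid": [0.2, 0.8],
    }
    corpus = [["card", "number", "expires"], ["salary", "bonus", "paid"]]
    model = build_model(corpus, n=2, m=1, word_vectors=word_vectors, k=2)
    print(document_vector(["card", "number", "expires"], model, n=2, m=1, k=2))  # [0, 3]
```

The resulting vector has one component per cluster, so documents of any length map to a fixed-size input for whatever classifier then determines the confidentiality category. That classifier is not shown because the abstract does not specify it.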

EFFECT: the different meanings (semantics) of words in a document are preserved, because each word is mapped to multiple clusters.
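One way to read this effect is that clustering is applied to m-skip-n-grams rather than to single words. A word occurs in many m-skip-n-grams, and those grams can fall into different clusters, so the word is not forced into a single cluster. The sketch below assumes a hypothetical model in the form of a dict from m-skip-n-gram tuples to cluster ids. It lists every cluster each word reaches. The toy mapping and the cluster meanings in the comments are invented for illustration.

```python
from collections import defaultdict


def word_clusters(model):
    """Map each word to the set of clusters that contain at least one m-skip-n-gram with that word."""
    result = defaultdict(set)
    for gram, cluster in model.items():
        for word in gram:
            result[word].add(cluster)
    return dict(result)


# Hypothetical toy model: m-skip-n-gram -> cluster id
# (0 = payment details, 1 = personnel records; labels invented for illustration).
model = {
    ("card", "number"): 0,
    ("account", "number"): 0,
    ("employee", "number"): 1,
    ("salary", "bonus"): 1,
}

if __name__ == "__main__":
    # "number" reaches both clusters, so both of its senses are kept.
    print(sorted(word_clusters(model)["number"]))  # [0, 1]
```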

10 cl, 6 dwg, 1 tbl

Similar patents RU2775358C1

Title Year Author Number
METHOD AND SYSTEM FOR OBTAINING A VECTOR REPRESENTATION OF AN ELECTRONIC DOCUMENT 2021
  • Vyshegorodtsev Kirill Evgenevich
  • Davidov Dmitrij Georgievich
  • Ryupichev Dmitrij Yurevich
  • Balashov Aleksandr Viktorovich
RU2775351C1
THEMATIC MODELS WITH A PRIORI TONALITY PARAMETERS BASED ON DISTRIBUTED REPRESENTATIONS 2018
  • Tutubalina Elena Viktorovna
  • Nikolenko Sergey Igorevich
RU2719463C1
METHOD AND SYSTEM FOR CLASSIFYING DATA FOR IDENTIFYING CONFIDENTIAL INFORMATION IN THE TEXT 2019
  • Terenin Aleksej Alekseevich
  • Kotova Margarita Aleksandrovna
RU2755606C2
RETRIEVING FIELDS USING NEURAL NETWORKS WITHOUT USING TEMPLATES 2019
  • Stanislav Semenov
RU2737720C1
AUTOMATIC DETERMINATION OF SET OF CATEGORIES FOR DOCUMENT CLASSIFICATION 2018
  • Nikita Orlov
  • Konstantin Anisimovich
RU2701995C2
AUTOMATED LEGAL ADVICE SYSTEM CONTROL METHOD 2019
  • Prikhodko Olga Viktorovna
  • Khyurri Ruslan Vladimirovich
RU2718978C1
AI TRANSACTION ADMINISTRATION SYSTEM 2020
  • Fehling, Ronny
  • Short, Samantha
  • De Goursac, Axel
  • Dubois, Raphael
  • Erlebach, Joerg
  • Von Funck, Karin
RU2777958C2
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA 2022
  • Babak Nikita Grigorevich
  • Belorybkin Leonid Yurevich
  • Terenin Aleksej Alekseevich
  • Shabrova Anastasiya Igorevna
RU2804747C1
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA 2022
  • Babak Nikita Grigorevich
  • Belorybkin Leonid Yurevich
  • Terenin Aleksej Alekseevich
  • Shabrova Anastasiya Igorevna
RU2802549C1
METHOD FOR GENERATING MATHEMATICAL MODELS OF A PATIENT USING ARTIFICIAL INTELLIGENCE TECHNIQUES 2017
  • Drokin Ivan Sergeevich
  • Bukhvalov Oleg Leonidovich
  • Sorokin Sergej Yurevich
RU2720363C2


Authors

Vyshegorodtsev Kirill Evgenevich

Obolenskij Ivan Aleksandrovich

Golovnya Maksim Sergeevich

Dates

2022-06-29 Published

2021-09-24 Filed