METHOD AND SYSTEM FOR OBTAINING VECTOR REPRESENTATION OF ELECTRONIC TEXT DOCUMENT FOR CLASSIFICATION BY CATEGORIES OF CONFIDENTIAL INFORMATION
Russian patent published in 2022; IPC G06F16/35, G06F40/284, G06F40/30

Abstract RU 2775358 C1

FIELD: computer technology.

SUBSTANCE: the invention relates to computer technology. A computer-implemented method, performed by a processor, obtains a vector representation of an electronic text document in order to determine the category of confidential information the document contains. The method comprises the following steps: a model for placing m-skip-n-grams in clusters is formed, where forming the model involves determining the list of m-skip-n-grams to be used, converting each m-skip-n-gram in the list into a vector representation, and clustering the m-skip-n-grams; the text document is processed using the resulting model, during which the occurrences of m-skip-n-grams in the document are counted, the document's clusters are determined from those occurrences, the occurrences of m-skip-n-grams from each cluster are summed, and a vector representation of the document is formed; finally, the category of confidential information in the text document is determined.
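The document-processing steps can be illustrated with a short Python sketch. The abstract does not define m-skip-n-grams precisely, so this assumes the usual skip-gram reading: n tokens kept in their original order, with at most m tokens skipped in total between the first and the last. The `model` dictionary is a hypothetical stand-in for the trained clustering model (a mapping from each m-skip-n-gram in the list to a cluster id), and `skip_ngrams` and `document_vector` are illustrative names, not taken from the patent.

```python
from collections import Counter
from itertools import combinations


def skip_ngrams(tokens, n, m):
    """Return every n-token subsequence (in order) that skips at most m tokens
    in total between its first and last token (assumed m-skip-n-gram definition)."""
    grams = []
    for start in range(len(tokens)):
        # The last chosen index may be at most start + n + m - 1,
        # so the span covers n kept tokens plus at most m skipped ones.
        window = range(start + 1, min(start + n + m, len(tokens)))
        for rest in combinations(window, n - 1):
            grams.append(tuple(tokens[i] for i in (start, *rest)))
    return grams


def document_vector(tokens, model, num_clusters, n, m):
    """Count the document's m-skip-n-grams and sum the counts per cluster.

    `model` is a hypothetical stand-in for the trained model: a dict mapping
    each m-skip-n-gram from the list to a cluster id. Grams the model does
    not know are ignored.
    """
    counts = Counter(skip_ngrams(tokens, n, m))
    vector = [0] * num_clusters
    for gram, count in counts.items():
        cluster = model.get(gram)
        if cluster is not None:
            vector[cluster] += count
    return vector


if __name__ == "__main__":
    model = {("a", "b"): 0, ("a", "c"): 1, ("b", "c"): 1}
    print(document_vector(["a", "b", "c", "a", "b"], model, 2, n=2, m=1))  # prints [2, 2]
```

In the usage example the pair (a, b) occurs twice and falls in cluster 0, while (a, c) and (b, c) each occur once in cluster 1; grams outside the model, such as (b, a), are ignored. The resulting vector would then be passed to whatever classifier determines the confidentiality category, which the abstract does not specify.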

EFFECT: preserving the different meanings of words in a document by mapping each word to multiple clusters.
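The effect follows from clustering m-skip-n-grams rather than single words: a word that occurs in several m-skip-n-grams can end up in several clusters, one per context. The sketch below builds such a model under stated assumptions: each m-skip-n-gram is represented by the mean of its word vectors (the abstract does not say how the vectors are obtained), and the clustering is plain k-means, chosen only for illustration. The two-dimensional embeddings, the gram list, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical 2-D word vectors: axis 0 ~ finance, axis 1 ~ nature.
EMBEDDINGS = {
    "bank": [0.5, 0.5],
    "river": [0.0, 1.0],
    "water": [0.0, 0.9],
    "money": [1.0, 0.0],
    "loan": [0.9, 0.0],
}
# Hypothetical list of m-skip-n-grams (here plain bigrams) to be clustered.
GRAMS = [("river", "bank"), ("money", "bank"), ("bank", "water"), ("bank", "loan")]


def gram_vector(gram, embeddings):
    """Assumed representation: element-wise mean of the gram's word vectors."""
    dim = len(embeddings[gram[0]])
    return [sum(embeddings[w][i] for w in gram) / len(gram) for i in range(dim)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids so the
    result is deterministic (assumes len(points) >= k)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def build_model(grams, embeddings, k):
    """Vectorize each gram, cluster the vectors, and map gram -> cluster id."""
    labels = kmeans([gram_vector(g, embeddings) for g in grams], k)
    return dict(zip(grams, labels))


def word_clusters(model):
    """Collect, for each word, every cluster holding a gram that contains it."""
    clusters = defaultdict(set)
    for gram, cluster in model.items():
        for word in gram:
            clusters[word].add(cluster)
    return dict(clusters)


if __name__ == "__main__":
    model = build_model(GRAMS, EMBEDDINGS, k=2)
    print(word_clusters(model)["bank"])  # prints {0, 1}
```

Here "bank" lands in both clusters, once through its river context and once through its financial context, whereas assigning each word to a single cluster would keep only one of these senses.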

10 claims, 6 drawings, 1 table

Similar patents RU2775358C1

Title (year), patent number, and authors:

METHOD AND SYSTEM FOR OBTAINING A VECTOR REPRESENTATION OF AN ELECTRONIC DOCUMENT (2021), RU2775351C1
  • Vyshegorodtsev Kirill Evgenevich
  • Davidov Dmitrij Georgievich
  • Ryupichev Dmitrij Yurevich
  • Balashov Aleksandr Viktorovich

METHOD AND SYSTEM FOR DETECTING OBFUSCATED MALICIOUS COMMANDS IN SYSTEM CONSOLE OF OPERATING SYSTEM (2024), RU2838483C1
  • Vyshegorodtsev Kirill Evgenevich
  • Nagornov Ivan Grigorevich
  • Balashov Aleksandr Viktorovich
  • Saukov Pavel Aleksandrovich
  • Levkina Ulyana Sergeevna
  • Novikov Evgenij Aleksandrovich

THEMATIC MODELS WITH A PRIORI TONALITY PARAMETERS BASED ON DISTRIBUTED REPRESENTATIONS (2018), RU2719463C1
  • Tutubalina Elena Viktorovna
  • Nikolenko Sergey Igorevich

METHOD AND SYSTEM FOR RECOGNIZING INFORMATION CONSTITUTING TRADE SECRET (2024), RU2841161C1
  • Babak Nikita Grigorevich
  • Belorybkin Leonid Yurevich
  • Garbuzov Georgij Valerevich
  • Denisov Vitalij Igorevich
  • Terenin Aleksej Alekseevich
  • Shabrova Anastasiya Igorevna

METHOD AND SYSTEM FOR CLASSIFYING DATA FOR IDENTIFYING CONFIDENTIAL INFORMATION IN THE TEXT (2019), RU2755606C2
  • Terenin Aleksej Alekseevich
  • Kotova Margarita Aleksandrovna

RETRIEVING FIELDS USING NEURAL NETWORKS WITHOUT USING TEMPLATES (2019), RU2737720C1
  • Stanislav Semenov

AUTOMATIC DETERMINATION OF SET OF CATEGORIES FOR DOCUMENT CLASSIFICATION (2018), RU2701995C2
  • Nikita Orlov
  • Konstantin Anisimovich

AUTOMATED LEGAL ADVICE SYSTEM CONTROL METHOD (2019), RU2718978C1
  • Prikhodko Olga Viktorovna
  • Khyurri Ruslan Vladimirovich
  • Prikhodko Olga Viktorovna

METHOD AND SYSTEM FOR OBTAINING VECTOR PRESENTATIONS OF DATA IN TABLE TAKING INTO ACCOUNT STRUCTURE OF TABLE AND ITS CONTENT (2024), RU2839037C1
  • Volkov Maksim Aleksandrovich

AI TRANSACTION ADMINISTRATION SYSTEM (2020), RU2777958C2
  • Fehling, Ronny
  • Short, Samantha
  • De Goursac, Axel
  • Dubois, Raphael
  • Erlebach, Joerg
  • Von Funck, Karin

RU 2 775 358 C1

Authors

Vyshegorodtsev Kirill Evgenevich

Obolenskij Ivan Aleksandrovich

Golovnya Maksim Sergeevich

Dates

Published: 2022-06-29

Filed: 2021-09-24