FIELD: data protection.
SUBSTANCE: the invention relates to depersonalizing confidential data in text documents while preserving the data structure. The method comprises the following steps: receiving a document containing text data; segmenting and tokenizing the text data; vectorizing the tokens; determining, using a machine learning model, whether each token belongs to a category of confidential data; depersonalizing the data of tokens that contain confidential data while preserving the data structure; forming a list of replacements; and replacing the original confidential data in the text document with the depersonalized data according to the list of replacements, wherein during replacement the depersonalized data is formatted according to the positions at which formatting changes occur within the text.
EFFECT: preservation of the stylistic, semantic, lexical and morphological structure of data in text documents during depersonalization; increased accuracy of depersonalization through identification of confidential data in text documents by a machine learning model.
10 cl, 6 dwg, 6 tbl
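The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not describe the vectorization or the machine learning model, so a hypothetical name lexicon and a digit rule stand in for the classifier, and the same-class character substitution is only one assumed way of preserving structure. All function names (`tokenize`, `is_confidential`, `depersonalize`, `build_replacements`, `apply_replacements`, `reapply_formatting`) and the `(text, style)` run representation are illustrative assumptions.

```python
import hashlib
import re

TOKEN_RE = re.compile(r"\w+|[^\w\s]")
# Hypothetical lexicon standing in for the trained model (assumption).
NAME_LEXICON = {"Ivan", "Petrov"}


def tokenize(text):
    """Return (token, start, end) triples so replacements can be positional."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def is_confidential(token):
    """Stand-in classifier; the source uses an ML model over token vectors."""
    return token in NAME_LEXICON or (token.isdigit() and len(token) >= 4)


def depersonalize(token, salt="demo"):
    """Swap each character for one of the same class (digit, upper, lower).

    Length and character classes are kept, which is the assumed meaning of
    'preserving the data structure'. Output letters are ASCII only.
    """
    digest = hashlib.sha256((salt + token).encode()).digest()
    out = []
    for i, ch in enumerate(token):
        b = digest[i % len(digest)]
        if ch.isdigit():
            out.append(str(b % 10))
        elif ch.isupper():
            out.append(chr(ord("A") + b % 26))
        elif ch.islower():
            out.append(chr(ord("a") + b % 26))
        else:
            out.append(ch)
    return "".join(out)


def build_replacements(text):
    """Form the list of replacements as (start, end, new_text) triples."""
    return [(s, e, depersonalize(t))
            for t, s, e in tokenize(text) if is_confidential(t)]


def apply_replacements(text, replacements):
    """Apply the replacements; lengths match, so offsets stay valid."""
    chars = list(text)
    for start, end, new in replacements:
        chars[start:end] = new
    return "".join(chars)


def reapply_formatting(runs, new_text):
    """Re-split new_text at the positions where the formatting changed."""
    out, pos = [], 0
    for run_text, style in runs:
        out.append((new_text[pos:pos + len(run_text)], style))
        pos += len(run_text)
    return out


# Usage: a document modeled as formatting runs of (text, style).
runs = [("Client ", "plain"), ("Ivan", "bold"), (" Petrov, card 1234", "plain")]
text = "".join(run_text for run_text, _ in runs)
replacements = build_replacements(text)
new_runs = reapply_formatting(runs, apply_replacements(text, replacements))
```

Because each replacement has the same length as the original token, the formatting boundaries can be reused unchanged. A replacement of different length would require shifting every later boundary by the length difference.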
Title | Year | Author | Number |
---|---|---|---|
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA | 2022 | | RU2802549C1 |
METHOD AND SYSTEM FOR GENERATING TEXT | 2023 | | RU2817524C1 |
METHOD AND SYSTEM FOR DIGITAL ASSISTANT TEXT GENERATION | 2022 | | RU2796208C1 |
METHOD AND SYSTEM FOR CLASSIFYING DATA FOR IDENTIFYING CONFIDENTIAL INFORMATION IN THE TEXT | 2019 | | RU2755606C2 |
METHOD AND SYSTEM FOR PARAPHRASING TEXT | 2023 | | RU2814808C1 |
METHOD AND SYSTEM FOR EXTRACTING NAMED ENTITIES | 2021 | | RU2823914C2 |
TEXT CLASSIFICATION METHOD AND SYSTEM | 2022 | | RU2818693C2 |
CLASSIFICATION OF DOCUMENTS BY LEVELS OF CONFIDENTIALITY | 2019 | | RU2732850C1 |
NAMED ENTITIES FROM THE TEXT AUTOMATIC EXTRACTION | 2014 | | RU2665239C2 |
METHOD OF IDENTIFYING PERSONAL DATA OF OPEN SOURCES OF UNSTRUCTURED INFORMATION | 2013 | | RU2549515C2 |
Authors
Dates
2023-10-04—Published
2022-12-09—Filed