FIELD: data protection.
SUBSTANCE: the invention relates to the reversible depersonalization of confidential data in text documents while preserving the data structure. The method comprises the following steps: receiving a document containing text data; segmenting and tokenizing the text data; vectorizing the tokens; determining, using a machine learning model, whether each token belongs to a category of confidential data; depersonalizing the data corresponding to tokens that contain confidential data while preserving the data structure; forming a substitution table and a list of replacements; and replacing the original confidential data in the text document with the depersonalized data according to the list of replacements, wherein the depersonalized data is formatted according to the positions at which the formatting of parts of the text changes.
EFFECT: preservation of the stylistic, semantic, lexical and morphological structure of data in text documents during depersonalization; increased accuracy of depersonalization through identification of confidential data in text documents by a machine learning model.
3 cl, 6 dwg, 6 tbl
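The claimed steps can be illustrated with a minimal Python sketch. It is not the patented implementation: the machine learning model is replaced by a hypothetical rule-based stand-in (`is_confidential`), the vectorization step is omitted, and document formatting is not handled. The names `KNOWN_NAMES`, `SUBSTITUTE_NAMES`, `Replacement`, `depersonalize` and `restore` are assumptions introduced only for this example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

TOKEN_RE = re.compile(r"\w+|[^\w\s]")  # words/numbers, or single punctuation marks

# Hypothetical stand-ins: a real system would use a trained classifier.
KNOWN_NAMES = {"ivan", "petrov"}
SUBSTITUTE_NAMES = ["Alex", "Smirnov", "Maria", "Orlov"]


@dataclass
class Replacement:
    start: int        # position of the substitute in the depersonalized text
    end: int
    original: str
    substitute: str


def is_confidential(token: str) -> bool:
    """Stand-in for the model: flags known names and digit runs of 4 or more."""
    return token.lower() in KNOWN_NAMES or (token.isdigit() and len(token) >= 4)


def make_substitute(token: str, table: dict[str, str]) -> str:
    """Pick a structure-preserving substitute, reusing earlier choices."""
    if token in table:
        return table[token]
    if token.isdigit():
        # Same length, still all digits, but a different value.
        sub = "".join(str((int(d) + 3) % 10) for d in token)
    else:
        sub = SUBSTITUTE_NAMES[len(table) % len(SUBSTITUTE_NAMES)]
        if token.isupper():
            sub = sub.upper()  # keep an all-caps token all-caps
    table[token] = sub
    return sub


def depersonalize(text: str) -> tuple[str, dict[str, str], list[Replacement]]:
    """Return the masked text, the substitution table and the replacement list."""
    table: dict[str, str] = {}
    replacements: list[Replacement] = []
    parts: list[str] = []
    cursor = 0   # read position in the source text
    offset = 0   # write position in the masked text
    for match in TOKEN_RE.finditer(text):
        token = match.group()
        if not is_confidential(token):
            continue
        sub = make_substitute(token, table)
        parts.append(text[cursor:match.start()])
        offset += match.start() - cursor
        replacements.append(Replacement(offset, offset + len(sub), token, sub))
        parts.append(sub)
        offset += len(sub)
        cursor = match.end()
    parts.append(text[cursor:])
    return "".join(parts), table, replacements


def restore(masked: str, replacements: list[Replacement]) -> str:
    """Undo the masking; working from the end keeps earlier positions valid."""
    for r in reversed(replacements):
        masked = masked[:r.start] + r.original + masked[r.end:]
    return masked


source = "Ivan Petrov, account 40817810, called Ivan."
masked, table, replacements = depersonalize(source)
print(masked)                         # Alex Smirnov, account 73140143, called Alex.
print(restore(masked, replacements))  # Ivan Petrov, account 40817810, called Ivan.
```

In this sketch the substitution table keeps repeated values consistent (both occurrences of "Ivan" map to the same substitute), while the position-based replacement list is what makes the operation reversible even if two originals happened to share a substitute.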
Title | Year | Author | Number |
---|---|---|---|
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA | 2022 | | RU2804747C1 |
METHOD AND SYSTEM FOR PARAPHRASING TEXT | 2023 | | RU2814808C1 |
METHOD AND SYSTEM FOR GENERATING TEXT | 2023 | | RU2817524C1 |
METHOD AND SYSTEM FOR DIGITAL ASSISTANT TEXT GENERATION | 2022 | | RU2796208C1 |
TEXT CLASSIFICATION METHOD AND SYSTEM | 2022 | | RU2818693C2 |
METHOD AND SYSTEM FOR CLASSIFYING DATA FOR IDENTIFYING CONFIDENTIAL INFORMATION IN THE TEXT | 2019 | | RU2755606C2 |
AUTOMATED LEGAL ADVICE SYSTEM CONTROL METHOD | 2019 | | RU2718978C1 |
METHOD OF IDENTIFYING PERSONAL DATA OF OPEN SOURCES OF UNSTRUCTURED INFORMATION | 2013 | | RU2549515C2 |
CLASSIFICATION OF DOCUMENTS BY LEVELS OF CONFIDENTIALITY | 2019 | | RU2732850C1 |
METHOD AND SYSTEM FOR EXTRACTING NAMED ENTITIES | 2021 | | RU2823914C2 |
Dates
2023-08-30—Published
2022-12-20—Filed