FIELD: physics.
SUBSTANCE: the invention relates to data recognition. The method of recognizing data constituting a trade secret in text documents comprises the steps of: receiving a document containing text data; segmenting the text data into sentences; tokenising the text data, adding a service token before each sentence; vectorising the tokens; processing the vector representations of the tokens with a machine learning model based on a neural network that is trained on data sets containing trade secrets and uses a self-attention mechanism, which takes into account the contextual relationships between the words and sentences of the text file; and assigning a trade secret mark to a document found to contain a trade secret.
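The claimed sequence of steps can be illustrated with a minimal, stdlib-only Python sketch. It is not the patented implementation: the regex sentence splitter, the service token name `[SENT]`, the toy deterministic embeddings with sinusoidal positional encoding, the single-head self-attention with Q = K = V (no learned projections), the averaging over service-token positions and the `weights` vector are all assumptions made for illustration. In the described method the model would instead be a neural network trained on data sets containing trade secrets.

```python
import math
import re

SENT_TOKEN = "[SENT]"  # hypothetical name for the per-sentence service token


def segment_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentences: list[str]) -> list[str]:
    """Lower-cased word tokens, with SENT_TOKEN placed before every sentence."""
    tokens: list[str] = []
    for sentence in sentences:
        tokens.append(SENT_TOKEN)
        tokens.extend(re.findall(r"\w+", sentence.lower()))
    return tokens


def vectorize(tokens: list[str], dim: int = 4) -> list[list[float]]:
    """Toy deterministic embeddings plus sinusoidal positional encoding."""
    vectors = []
    for pos, tok in enumerate(tokens):
        seed = sum(ord(c) for c in tok)  # stand-in for a learned embedding lookup
        emb = [math.sin(seed * (i + 1)) for i in range(dim)]
        # Positional encoding so identical tokens at different positions differ.
        pe = [math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
              else math.cos(pos / 10000 ** ((i - 1) / dim)) for i in range(dim)]
        vectors.append([e + p for e, p in zip(emb, pe)])
    return vectors


def self_attention(x: list[list[float]]) -> list[list[float]]:
    """Single-head scaled dot-product self-attention with Q = K = V = x."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        attn = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(attn, x)) for j in range(d)])
    return out


def mark_document(text: str, weights: list[float], threshold: float = 0.5) -> bool:
    """Return True if the document should receive the trade secret mark."""
    tokens = tokenize(segment_sentences(text))
    if not tokens:
        return False
    context = self_attention(vectorize(tokens, dim=len(weights)))
    # Take the contextual vectors at service-token positions as sentence vectors.
    sentence_vecs = [c for t, c in zip(tokens, context) if t == SENT_TOKEN]
    mean = [sum(col) / len(sentence_vecs) for col in zip(*sentence_vecs)]
    logit = sum(w * m for w, m in zip(weights, mean))
    return 1 / (1 + math.exp(-logit)) >= threshold
```

With `weights = [0.0] * 4` the sigmoid score is exactly 0.5 for any text, so the mark then depends only on `threshold`. Meaningful decisions require embeddings and weights learned from labelled documents, which this sketch deliberately leaves out.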
EFFECT: high accuracy in recognizing data containing trade secrets, and the possibility of safe data transfer owing to automatic detection of documents containing trade secrets and real-time application of policies that restrict the transfer of such documents.
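The real-time restriction mentioned above can be pictured as a policy check performed each time a transfer is requested. The sketch below is purely illustrative: `Document`, `TransferPolicy`, `request_transfer` and the recipient-domain allow-list rule are hypothetical and not taken from the patent, which does not specify the form of its policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    name: str
    trade_secret: bool  # mark assigned by the recognition step


@dataclass(frozen=True)
class TransferPolicy:
    # Hypothetical rule: marked documents may only go to these recipient domains.
    allowed_domains: frozenset[str]

    def permits(self, doc: Document, recipient: str) -> bool:
        if not doc.trade_secret:
            return True  # unmarked documents are not restricted
        domain = recipient.rsplit("@", 1)[-1].lower()
        return domain in self.allowed_domains


def request_transfer(doc: Document, recipient: str, policy: TransferPolicy) -> str:
    """Decide on a transfer at the moment it is requested."""
    return "allowed" if policy.permits(doc, recipient) else "blocked"


if __name__ == "__main__":
    policy = TransferPolicy(frozenset({"corp.example"}))
    secret = Document("pricing.docx", trade_secret=True)
    print(request_transfer(secret, "ann@corp.example", policy))  # allowed
    print(request_transfer(secret, "eve@mail.example", policy))  # blocked
```

Keeping the decision in a pure `permits` method separates the policy from the transport, so the same check could sit in a mail gateway or a file-upload hook.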
6 cl, 3 dwg, 1 tbl
Title | Year | Author | Number |
---|---|---|---|
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA | 2022 | | RU2804747C1 |
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA | 2022 | | RU2802549C1 |
TEXT CLASSIFICATION METHOD AND SYSTEM | 2022 | | RU2818693C2 |
METHOD AND SYSTEM FOR OBTAINING VECTOR PRESENTATIONS OF DATA IN TABLE TAKING INTO ACCOUNT STRUCTURE OF TABLE AND ITS CONTENT | 2024 | | RU2839037C1 |
SYSTEM AND METHOD FOR AUGMENTATION OF THE TRAINING SAMPLE FOR MACHINE LEARNING ALGORITHMS | 2020 | | RU2758683C2 |
METHOD AND SYSTEM FOR DETECTING CONFIDENTIAL DATA | 2023 | | RU2838508C2 |
METHOD AND SYSTEM FOR GENERATING TEXT | 2023 | | RU2817524C1 |
METHOD AND SYSTEM FOR DIGITAL ASSISTANT TEXT GENERATION | 2022 | | RU2796208C1 |
METHOD AND SYSTEM FOR EXTRACTING NAMED ENTITIES | 2021 | | RU2823914C2 |
METHOD AND SYSTEM FOR PARAPHRASING TEXT | 2023 | | RU2814808C1 |
Dates
Published: 2025-06-03
Filed: 2024-03-05