SYSTEM AND METHOD FOR AUGMENTATION OF THE TRAINING SAMPLE FOR MACHINE LEARNING ALGORITHMS

Russian patent published in 2021 - IPC G06F40/10 G06N20/00

Abstract RU 2758683 C2

FIELD: computing technology.

SUBSTANCE: disclosed is a system for augmenting the training sample for machine learning algorithms, comprising: at least one processor; at least one memory device; an input data processing module configured to receive the text data forming the initial training sample and to normalise said data, wherein the text is divided into sentences and cleared of extraneous characters; a data vectorisation module configured to convert the normalised sentences into vector form, wherein, in the course of said conversion, each received sentence is split into minimal meaningful parts, namely words and punctuation marks, said parts are tokenised, a vector representation is formed for each token, and an averaged vector representation of the normalised sentence is formed; a text data enrichment module containing a set of text data collected from open sources, together with metadata for vectorising said data and constructing a search index; a text index module configured to form a text index based on the vector representations of the text data; and a training sample augmentation module configured to supplement and/or adjust the initial text sample by selecting relevant vector representations of tokens in the text data enrichment module, the selection being based on a measure of token proximity in the vector space.
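
The sequence of modules described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the 16-dimensional hash-based token vectors (a stand-in for trained embeddings), the regular expressions used for cleaning, and the choice of cosine similarity as the proximity measure are all assumptions made for illustration, since the abstract names none of them. For brevity the sketch also compares averaged sentence vectors, whereas the abstract speaks of proximity between token vectors.

```python
import hashlib
import math
import re

DIM = 16  # assumed toy dimensionality; a real system would use trained embeddings


def normalise(text):
    """Split text into sentences and drop characters outside a small allowed set."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    cleaned = [re.sub(r"[^\w\s.,!?-]", "", s).strip() for s in sentences]
    return [s for s in cleaned if s]


def tokenise(sentence):
    """Split a sentence into lower-cased words and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def token_vector(token):
    """Hypothetical embedding: a deterministic pseudo-random vector derived from a hash."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 255.0 - 0.5 for b in digest[:DIM]]


def sentence_vector(sentence):
    """Average the token vectors of a sentence, component by component."""
    vectors = [token_vector(t) for t in tokenise(sentence)]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity, used here as an assumed proximity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(corpus_sentences):
    """Toy 'text index': a list of (sentence, vector) pairs searched linearly."""
    return [(s, sentence_vector(s)) for s in corpus_sentences]


def augment(sample, index, top_k=1):
    """Extend the sample with the top_k nearest indexed sentences for each item."""
    augmented = list(sample)
    for sentence in sample:
        query = sentence_vector(sentence)
        ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
        for candidate, _ in ranked[:top_k]:
            if candidate not in augmented:
                augmented.append(candidate)
    return augmented


if __name__ == "__main__":
    sample = normalise("The bank approved the loan. The client was happy!")
    corpus = normalise("A loan was approved by the bank. It rained all day.")
    print(augment(sample, build_index(corpus), top_k=1))
```

A linear scan over the index is enough for a toy corpus; for text collected from open sources at scale, an approximate nearest-neighbour index would be the more likely choice, though the abstract does not say which structure is used.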

EFFECT: ensured selection of text data for augmentation of the training sample based on the characteristics of the text of the input training sample.

22 cl, 3 dwg

Similar patents RU2758683C2

Title Year Author Number
TEXT CLASSIFICATION METHOD AND SYSTEM 2022
  • Konodyuk Nikita Evgenevich
  • Tikhonova Mariya Ivanovna
RU2818693C2
METHOD AND SYSTEM FOR GENERATING TEXT 2023
  • Tikhonova Mariya Ivanovna
RU2817524C1
METHOD AND SYSTEM FOR DIGITAL ASSISTANT TEXT GENERATION 2022
  • Tikhonova Mariya Ivanovna
RU2796208C1
METHOD AND SYSTEM FOR PARAPHRASING TEXT 2023
  • Fenogenova Alena Sergeevna
  • Tikhonova Mariya Ivanovna
RU2814808C1
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA 2022
  • Babak Nikita Grigorevich
  • Belorybkin Leonid Yurevich
  • Terenin Aleksej Alekseevich
  • Shabrova Anastasiya Igorevna
RU2804747C1
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA 2022
  • Babak Nikita Grigorevich
  • Belorybkin Leonid Yurevich
  • Terenin Aleksej Alekseevich
  • Shabrova Anastasiya Igorevna
RU2802549C1
SYSTEM AND METHOD FOR AUTOMATED ASSESSMENT OF INTENTIONS AND EMOTIONS OF USERS OF DIALOGUE SYSTEM 2020
  • Fenogenova Alena Sergeevna
  • Shavrina Tatyana Olegovna
RU2762702C2
AUTOMATED LEGAL ADVICE SYSTEM CONTROL METHOD 2019
  • Prikhodko Olga Viktorovna
  • Khyurri Ruslan Vladimirovich
RU2718978C1
METHOD OF CREATING MODEL FOR ANALYSING DIALOGUES BASED ON ARTIFICIAL INTELLIGENCE FOR PROCESSING USER REQUESTS AND SYSTEM USING SUCH MODEL 2019
  • Antyukhov Denis Olegovich
  • Pugachev Leonid Petrovich
RU2730449C2
METHOD OF TRAINED RECURRENT NEURAL NETWORK DEBUGGING 2019
  • Zharov Yaroslav Maksimovich
  • Korzhenkov Denis Mikhajlovich
RU2715024C1

RU 2 758 683 C2

Authors

Shavrina Tatyana Olegovna

Dates

2021-11-01 Published

2020-04-28 Filed