TEXT CLASSIFICATION METHOD AND SYSTEM
Russian patent published in 2024 - IPC G06F40/279, G06F16/35, G06N20/00

Abstract RU 2818693 C2

FIELD: physics.

SUBSTANCE: the group of inventions relates to computer engineering and can be used for additional training of a language model to solve a text classification task. The method of classifying text with a language model is carried out by at least one computing device and comprises the following steps: obtaining an input data set corresponding to the required classification task, in the format used for additional training of the language model; formatting the input data set by supplementing it with symbols, each of which corresponds to an abstract pseudoword; tokenizing and vectorizing the input data set, wherein the symbols corresponding to abstract pseudowords are replaced with their trained vector representations; processing the obtained data to obtain a vector of logits, which reflects the probability distribution over classes corresponding to the words in the vocabulary of the language model; selecting the target logit components, which correspond to the tokens of the target classes of the classification task being solved; determining the logit component that reflects the highest probability of belonging to a target class; and generating a response in text form corresponding to the selected component.
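
The steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the placeholder symbol `<P>`, the function names, the toy whitespace tokenizer, the vectors, the token ids and the logit values are all assumptions introduced here, and the language model itself is replaced by a hard-coded logit vector. The sketch only shows how placeholder symbols could be swapped for trained vectors and how the class-token logits could be selected and compared. Because softmax is monotonic, the largest selected logit also marks the most probable target class.

```python
# Hypothetical sketch: every name, token id and number below is an assumption
# made for illustration; none of them comes from the patent.

PSEUDO = "<P>"  # assumed placeholder symbol standing for one abstract pseudoword


def format_input(text, n_pseudo=2):
    """Supplement the input text with placeholder symbols for pseudowords."""
    return " ".join([PSEUDO] * n_pseudo + [text])


def vectorize(formatted, word_vectors, pseudo_vectors):
    """Toy whitespace tokenization and vectorization.

    The i-th placeholder is replaced with the i-th trained pseudoword vector;
    ordinary words are looked up in the word vector table.
    """
    vectors, next_pseudo = [], 0
    for token in formatted.split():
        if token == PSEUDO:
            vectors.append(pseudo_vectors[next_pseudo])
            next_pseudo += 1
        else:
            vectors.append(word_vectors[token])
    return vectors


def classify(logits, class_token_ids):
    """Keep only the logits of class tokens and return the best class label."""
    target = {label: logits[token_id] for label, token_id in class_token_ids.items()}
    return max(target, key=target.get)


# Stand-ins for ordinary word embeddings and trained pseudoword embeddings.
word_vectors = {"great": [0.9, 0.1], "movie": [0.2, 0.3]}
pseudo_vectors = [[0.5, -0.4], [0.1, 0.7]]

formatted = format_input("great movie")
vectors = vectorize(formatted, word_vectors, pseudo_vectors)

# Stand-in for the model output: one logit per vocabulary entry.
# Token 4 has the largest logit overall but is not a class token, so it is ignored.
logits = [0.3, 2.1, -1.0, 0.8, 3.5]
class_token_ids = {"positive": 1, "negative": 3}  # assumed class-to-token mapping

label = classify(logits, class_token_ids)
print(label)
```

Restricting the comparison to the class tokens means the answer is always one of the target classes, even when some unrelated vocabulary entry receives a higher logit.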

EFFECT: enabling automatic generation of hints for additional training of the language model.

7 cl, 5 dwg

Similar patents RU2818693C2

Title Year Author Number
METHOD AND SYSTEM FOR DIGITAL ASSISTANT TEXT GENERATION 2022
  • Tikhonova Mariya Ivanovna
RU2796208C1
METHOD AND SYSTEM FOR GENERATING TEXT 2023
  • Tikhonova Mariya Ivanovna
RU2817524C1
METHOD AND SYSTEM FOR PARAPHRASING TEXT 2023
  • Fenogenova Alena Sergeevna
  • Tikhonova Mariya Ivanovna
RU2814808C1
SYSTEM AND METHOD FOR AUGMENTATION OF THE TRAINING SAMPLE FOR MACHINE LEARNING ALGORITHMS 2020
  • Shavrina Tatyana Olegovna
RU2758683C2
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA 2022
  • Babak Nikita Grigorevich
  • Belorybkin Leonid Yurevich
  • Terenin Aleksej Alekseevich
  • Shabrova Anastasiya Igorevna
RU2804747C1
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA 2022
  • Babak Nikita Grigorevich
  • Belorybkin Leonid Yurevich
  • Terenin Aleksej Alekseevich
  • Shabrova Anastasiya Igorevna
RU2802549C1
METHOD AND SYSTEM FOR RETRIEVING NAMED ENTITIES 2020
  • Emelyanov Anton Aleksandrovich
RU2760637C1
AUTOMATED LEGAL ADVICE SYSTEM CONTROL METHOD 2019
  • Prikhodko Olga Viktorovna
  • Khyurri Ruslan Vladimirovich
RU2718978C1
SYSTEM AND METHOD FOR AUTOMATED ASSESSMENT OF INTENTIONS AND EMOTIONS OF USERS OF DIALOGUE SYSTEM 2020
  • Fenogenova Alena Sergeevna
  • Shavrina Tatyana Olegovna
RU2762702C2
METHODS AND SYSTEMS FOR IDENTIFYING FIELDS IN A DOCUMENT 2020
  • Semenov Stanislav Vladimirovich
  • Lanin Mikhail Olegovich
RU2760471C1

RU 2 818 693 C2

Authors

Konodyuk Nikita Evgenevich

Tikhonova Mariya Ivanovna

Dates

2024-05-03 Published

2022-05-27 Filed