METHOD AND SERVER FOR PROCESSING TEXT SEQUENCE IN MACHINE PROCESSING TASK

Russian patent published in 2022 - IPC G06F40/20

Abstract RU 2775820 C2

FIELD: information processing.

SUBSTANCE: the invention relates to methods and a server for processing a text sequence in a machine processing task. In the method, the server receives a token dictionary that stores a set of tokens from a predefined text corpus, where a token from the set is either a single symbol or a merged set of tokens. The server receives a merge table indicating possible merges of token pairs from the set of tokens, where the token resulting from a possible merge is associated with the frequency of occurrence of that token in the predefined text corpus. The server receives a text sequence representing at least one word. For a word from the text sequence, the server uses the token dictionary to split the word into an initial token sequence representing the individual symbols of the word, and then iteratively merges tokens from the initial token sequence to form a final token sequence for the word. At the current merge iteration, the server uses the merge table to determine a set of possible merges of pairs of neighboring tokens in the current token sequence. The server excludes at least one merge from this set based on an exclusion probability, forming a reduced set of possible merges that is smaller than the full set. The server then uses the reduced set to form a new token sequence by performing at least one merge from the reduced set in the current token sequence; the new token sequence serves as the current token sequence at the next merge iteration.
At a later merge iteration, once no possible merges remain in the current token sequence, the server takes that current token sequence as the final token sequence to be used in the machine processing task.
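
The merge loop described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function name `segment_word`, the dictionary layout of `merge_table` (a token pair mapped to its corpus frequency), and the parameter `exclusion_prob` are hypothetical names chosen for illustration. The sketch makes three simplifying assumptions that the abstract does not state: each candidate merge is excluded independently with the same probability (so an iteration may happen to exclude none), exactly one merge (the most frequent surviving one) is applied per iteration, and the loop also stops if every candidate is excluded in an iteration.

```python
import random


def segment_word(word, merge_table, exclusion_prob=0.1, rng=None):
    """Split `word` into tokens by iterative merging with random exclusion.

    merge_table: dict mapping (left_token, right_token) -> corpus frequency
                 of the merged token (hypothetical layout for this sketch).
    exclusion_prob: probability of excluding each candidate merge.
    """
    rng = rng or random.Random()
    # Initial token sequence: the individual symbols of the word.
    tokens = list(word)
    while True:
        # Possible merges of neighboring token pairs, as (position, frequency).
        possible = [
            (i, merge_table[(tokens[i], tokens[i + 1])])
            for i in range(len(tokens) - 1)
            if (tokens[i], tokens[i + 1]) in merge_table
        ]
        if not possible:
            # No possible merges remain: this is the final token sequence.
            return tokens
        # Reduced set: each candidate survives with probability 1 - exclusion_prob.
        reduced = [m for m in possible if rng.random() >= exclusion_prob]
        if not reduced:
            # Assumption of this sketch: stop when every candidate was excluded.
            return tokens
        # Apply the most frequent surviving merge; ties go to the leftmost pair.
        pos, _ = max(reduced, key=lambda m: (m[1], -m[0]))
        tokens[pos:pos + 2] = [tokens[pos] + tokens[pos + 1]]


# Toy merge table with made-up frequencies, for illustration only.
toy_table = {("l", "o"): 10, ("lo", "w"): 8, ("e", "r"): 6}

print(segment_word("lower", toy_table, exclusion_prob=0.0))  # ['low', 'er']
print(segment_word("lower", toy_table, exclusion_prob=1.0))  # ['l', 'o', 'w', 'e', 'r']
print(segment_word("lower", toy_table, exclusion_prob=0.3))  # varies between runs
```

With `exclusion_prob=0.0` the sketch reduces to a deterministic greedy merge, and with `exclusion_prob=1.0` the word stays split into single symbols. Intermediate values yield different segmentations of the same word across calls, which is the source of the several segmentation variants mentioned in the EFFECT paragraph.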

EFFECT: more efficient preparation of training data, achieved by obtaining several segmentation variants for a word.

30 cl, 4 dwg

Similar patents RU2775820C2

Title Year Author Number
METHOD AND SERVER FOR PERFORMING PROBLEM-ORIENTED TRANSLATION 2021
  • Emelyanenko Dmitry Viktorovich
  • Ryabinin Maksim Konstantinovich
RU2820953C2
METHOD AND SERVER FOR PERFORMING CONTEXT-SENSITIVE TRANSLATION 2021
  • Golovanov Ilya Aleksandrovich
  • Ivanov Georgy Viktorovich
  • Noskov Aleksey Anatolevich
RU2812301C2
METHODS AND ELECTRONIC DEVICES FOR PACKAGING REQUESTS INTENDED FOR PROCESSING BY PROCESSING UNIT 2021
  • Emelyanenko Dmitry Viktorovich
RU2810916C2
METHOD AND SYSTEM FOR EXTRACTING NAMED ENTITIES 2021
  • Vodolazskij Daniil Ivanovich
  • Gladkikh Prokhor Vladimirovich
  • Sorokin Semen Aleksandrovich
  • Cherkasov Roman Vladislavovich
  • Gazizov Kuat
RU2823914C2
METHOD AND SERVER FOR TRAINING MACHINE LEARNING ALGORITHM IN TRANSLATION 2020
  • Dvorkovich Anton Aleksandrovich
  • Kovarsky Boris Andreevich
RU2770569C2
METHOD AND SERVER FOR TEACHING A NEURAL NETWORK TO FORM A TEXT OUTPUT SEQUENCE 2020
  • Petrov Aleksey Sergeevich
  • Gubanov Sergey Dmitrievich
  • Gaydaenko Sergey Aleksandrovich
RU2798362C2
METHOD AND SERVER FOR TRAINING MACHINE LEARNING ALGORITHM FOR TRANSLATION 2020
  • Dvorkovich Anton Aleksandrovich
  • Komarov Ivan Sergeevich
RU2789796C2
METHOD AND DEVICE FOR VEHICLE CONTROL 2021
  • Charkin Kanstantin
  • Lobanov Aleksei
RU2767826C1
METHOD AND APPARATUS FOR TRAINING MACHINE LEARNING ALGORITHM (MLA) FOR CREATING CONTENT RECOMMENDATIONS IN A RECOMMENDATION SYSTEM AND A METHOD AND APPARATUS FOR CREATING RECOMMENDED CONTENT USING A MACHINE LEARNING ALGORITHM 2016
  • Lifar Igor Igorevich
  • Lamburt Viktor Grigorevich
RU2731659C2
TEXT CLASSIFICATION METHOD AND SYSTEM 2022
  • Konodyuk Nikita Evgenevich
  • Tikhonova Mariya Ivanovna
RU2818693C2

RU 2 775 820 C2

Authors

Yemelyanenko Dmitry Viktorovich

Provilkov Ivan Sergeevich

Voyta Elena Aleksandrovna

Dates

2022-07-11 Published

2020-04-24 Filed