
(19)

(11)

2 775 820

(13)

(51)

IPC

G06F40/20(2020-01-01)

(21) (22)

Application

2020114693, 2020-04-24

(24)

Start date

2020-04-24

(22)

Actual filing date

2020-04-24

(45)

Published

2022-07-11

(72)

Inventor

Yemelyanenko Dmitry Viktorovich, Provilkov Ivan Sergeevich, Voyta Elena Aleksandrovna

(73)

Holder

Obshchestvo S Ogranichennoi Otvetstvennostiu

METHOD AND SERVER FOR PROCESSING TEXT SEQUENCE IN MACHINE PROCESSING TASK

Russian patent published in 2022 - IPC G06F40/20

Abstract RU 2775820 C2

FIELD: information processing.

SUBSTANCE: the invention relates to methods and a server for processing a text sequence in a machine processing task. In the method, the server receives a token dictionary storing a set of tokens from a predefined text corpus, where a token from the set is either a single symbol or a merged set of tokens; receives a merge table indicating possible merges of token pairs from the set of tokens, where the token resulting from a possible merge is associated with the frequency of occurrence of that token in the predefined text corpus; and receives a text sequence containing at least one word.

For a word from the text sequence, the server uses the token dictionary to split the word into an initial token sequence representing the individual symbols of the word, and then iteratively merges tokens from the initial token sequence to form a final token sequence for the word. At the current merge iteration, the server uses the merge table to determine a set of possible merges of pairs of neighboring tokens in the current token sequence; excludes at least one merge from that set based on an exclusion probability, thereby forming a reduced set of possible merges that is smaller than the set of possible merges; and uses the reduced set to form a new token sequence by performing at least one merge from the reduced set in the current token sequence. The new token sequence is used as the current token sequence at the next merge iteration.

At a later merge iteration, if the current token sequence for that iteration contains no possible merges, the server takes it as the final token sequence to be used in the machine processing task.
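The iterative merging described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `segment_word`, the parameter `drop_prob`, and the toy table `EXAMPLE_MERGES` are hypothetical. The sketch also makes three simplifying assumptions that the abstract does not state: each possible merge is excluded independently with probability `drop_prob` (so an iteration may exclude none), exactly one merge (the most frequent surviving one) is performed per iteration, and merging stops as soon as no merge survives exclusion.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical toy merge table: token pair -> frequency of the merged token.
EXAMPLE_MERGES: Dict[Tuple[str, str], int] = {
    ("l", "o"): 10,
    ("lo", "w"): 8,
    ("e", "r"): 6,
    ("low", "er"): 2,
}


def segment_word(
    word: str,
    merge_table: Dict[Tuple[str, str], int],
    drop_prob: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Split a word into symbols, then merge neighboring tokens iteratively,
    randomly excluding possible merges at every iteration."""
    rng = rng or random.Random()
    tokens = list(word)  # initial token sequence: individual symbols

    while True:
        # Possible merges of neighboring pairs in the current sequence,
        # stored as (frequency, position of the left token).
        candidates = [
            (merge_table[pair], i)
            for i, pair in enumerate(zip(tokens, tokens[1:]))
            if pair in merge_table
        ]
        if not candidates:
            return tokens  # no possible merges: final token sequence

        # Reduced set: each candidate is excluded with probability drop_prob
        # (assumption: independent exclusion per candidate).
        kept = [c for c in candidates if rng.random() >= drop_prob]
        if not kept:
            return tokens  # assumption: stop when every merge was excluded

        # Perform the most frequent surviving merge (leftmost on ties).
        _, i = max(kept, key=lambda c: (c[0], -c[1]))
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]


if __name__ == "__main__":
    # With drop_prob=0.0 the result is deterministic; with a positive value,
    # repeated calls can yield different segmentations of the same word.
    print(segment_word("lower", EXAMPLE_MERGES, drop_prob=0.0))
    print(segment_word("lower", EXAMPLE_MERGES, drop_prob=0.5))
```

With `drop_prob=0.0` the sketch reduces to deterministic merging, while larger values produce several segmentation variants of the same word, which is the effect the abstract attributes to the method.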

EFFECT: more efficient preparation of training data, achieved by obtaining several segmentation variants for a word.

30 cl, 4 dwg

Similar patents RU2775820C2

Title	Year	Author	Number
METHOD AND SERVER FOR PERFORMING PROBLEM-ORIENTED TRANSLATION	2021	Emelyanenko Dmitry Viktorovich Ryabinin Maksim Konstantinovich	RU2820953C2
METHOD AND SERVER FOR PERFORMING CONTEXT-SENSITIVE TRANSLATION	2021	Golovanov Ilya Aleksandrovich Ivanov Georgy Viktorovich Noskov Aleksey Anatolevich	RU2812301C2
METHODS AND ELECTRONIC DEVICES FOR PACKAGING REQUESTS INTENDED FOR PROCESSING BY PROCESSING UNIT	2021	Emelyanenko Dmitry Viktorovich	RU2810916C2
METHOD AND SYSTEM FOR EXTRACTING NAMED ENTITIES	2021	Vodolazskij Daniil Ivanovich Gladkikh Prokhor Vladimirovich Sorokin Semen Aleksandrovich Cherkasov Roman Vladislavovich Gazizov Kuat	RU2823914C2
METHOD AND SERVER FOR TRAINING MACHINE LEARNING ALGORITHM IN TRANSLATION	2020	Dvorkovich Anton Aleksandrovich Kovarsky Boris Andreevich	RU2770569C2
METHOD AND SERVER FOR TEACHING A NEURAL NETWORK TO FORM A TEXT OUTPUT SEQUENCE	2020	Petrov Aleksey Sergeevich Gubanov Sergey Dmitrievich Gaydaenko Sergey Aleksandrovich	RU2798362C2
METHOD AND SERVER FOR TRAINING MACHINE LEARNING ALGORITHM FOR TRANSLATION	2020	Dvorkovich Anton Aleksandrovich Komarov Ivan Sergeevich	RU2789796C2
METHOD AND DEVICE FOR VEHICLE CONTROL	2021	Charkin Kanstantin Lobanov Aleksei	RU2767826C1
METHOD AND APPARATUS FOR TRAINING MACHINE LEARNING ALGORITHM (MLA) FOR CREATING CONTENT RECOMMENDATIONS IN A RECOMMENDATION SYSTEM AND A METHOD AND APPARATUS FOR CREATING RECOMMENDED CONTENT USING A MACHINE LEARNING ALGORITHM	2016	Lifar Igor Igorevich Lamburt Viktor Grigorevich	RU2731659C2
METHODS AND SERVERS FOR TRAINING MODEL TO DETECT SPEAKER CHANGE	2024	Gritskevich Evgenii Marianovich	RU2841235C1
