FIELD: physics.
SUBSTANCE: the group of inventions relates to computer systems and can be used to construct and process a natural language model. The method comprises the following steps: obtaining a plurality of strings, where each string of the plurality of strings comprises a plurality of characters; for each string of the plurality of strings, generating, by a processing device, a first sequence of vectors based on at least a maximum word length for each character in the string; transmitting, to a machine learning module, the first sequence of vectors for each string of the plurality of strings; obtaining from the machine learning module the probability of occurrence of each string of the plurality of strings; adding a string to the natural language model based on the value of the probability of occurrence obtained from the machine learning module; and using the resulting model in natural language processing tasks.
EFFECT: the technical result is improved prediction of the probability of occurrence of a linguistic unit.
20 cl, 5 dwg
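
The steps listed in the abstract can be illustrated with a minimal Python sketch. It treats each row (line) as a character string and makes several assumptions that are not stated in the abstract: the "maximum word length" feature for a character is taken to be the length of the longest word from a hypothetical `vocabulary` that starts at that character; each vector has a single component; and `toy_scorer` is a hand-written logistic stand-in for the machine learning module, not the patented model. The names `encode`, `toy_scorer`, `build_language_model`, and the `threshold` parameter are all hypothetical.

```python
import math
from typing import Callable, Dict, Iterable, List, Set

Vector = List[float]


def max_word_length_at(text: str, pos: int, vocabulary: Set[str], longest: int) -> int:
    """Length of the longest vocabulary word starting at text[pos], or 0 if none."""
    for length in range(min(longest, len(text) - pos), 0, -1):
        if text[pos:pos + length] in vocabulary:
            return length
    return 0


def encode(text: str, vocabulary: Set[str]) -> List[Vector]:
    """Build one vector per character; here each vector holds only the
    assumed max-word-length feature for that character position."""
    longest = max((len(word) for word in vocabulary), default=0)
    return [
        [float(max_word_length_at(text, pos, vocabulary, longest))]
        for pos in range(len(text))
    ]


def toy_scorer(vectors: List[Vector]) -> float:
    """Stand-in for the machine learning module (an assumption): a logistic
    function of the mean feature value, returning a value in (0, 1)."""
    if not vectors:
        return 0.0
    mean = sum(vector[0] for vector in vectors) / len(vectors)
    return 1.0 / (1.0 + math.exp(-4.0 * (mean - 0.5)))


def build_language_model(
    strings: Iterable[str],
    vocabulary: Set[str],
    scorer: Callable[[List[Vector]], float],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Keep each string whose predicted probability reaches the threshold."""
    model: Dict[str, float] = {}
    for text in strings:
        probability = scorer(encode(text, vocabulary))
        if probability >= threshold:
            model[text] = probability
    return model


vocabulary = {"the", "cat", "sat"}
model = build_language_model(["thecat", "xqzv"], vocabulary, toy_scorer)
print(list(model))  # ['thecat']
```

In this sketch the language model is just a dictionary from accepted strings to their scores. A real system would presumably replace `toy_scorer` with a trained sequence model and use richer per-character vectors.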
Title | Year | Author | Number |
---|---|---|---|
TEXT RECOGNITION USING ARTIFICIAL INTELLIGENCE | 2017 | | RU2691214C1 |
OPTICAL CHARACTER RECOGNITION BY MEANS OF COMBINATION OF NEURAL NETWORK MODELS | 2020 | | RU2768211C1 |
HANDWRITING RECOGNITION USING NEURAL NETWORKS | 2020 | | RU2757713C1 |
IDENTIFICATION OF BLOCKS OF RELATED WORDS IN DOCUMENTS OF COMPLEX STRUCTURE | 2019 | | RU2765884C2 |
RETRIEVING FIELDS USING NEURAL NETWORKS WITHOUT USING TEMPLATES | 2019 | | RU2737720C1 |
AUTOMATIC DETERMINATION OF SET OF CATEGORIES FOR DOCUMENT CLASSIFICATION | 2018 | | RU2701995C2 |
DETECTING TEXT FIELDS USING NEURAL NETWORKS | 2018 | | RU2699687C1 |
TEACHING LANGUAGE MODELS USING TEXT CORPUSES CONTAINING REALISTIC ERRORS OF OPTICAL CHARACTER RECOGNITION (OCR) | 2019 | | RU2721187C1 |
METHOD FOR CONTROLLING A DIALOGUE AND NATURAL LANGUAGE RECOGNITION SYSTEM IN A PLATFORM OF VIRTUAL ASSISTANTS | 2020 | | RU2759090C1 |
TEXT CLASSIFICATION METHOD AND SYSTEM | 2022 | | RU2818693C2 |
Dates
2020-01-24—Published
2018-06-27—Filed