FIELD: text correction.
SUBSTANCE: the invention relates to a system and a method for correcting spelling errors. The method includes a preparatory stage of tokenization, lowercasing and punctuation removal, and a final stage of restoring letter case and punctuation. It is characterized in that the tokenized sentence is passed to the correction generation module, which uses the language dictionary and the weighted Levenshtein metric to find correction candidates that form a ranked space of word hypotheses. The sentence is then analyzed by the language model unit: the tokenized sentence and the space of word hypotheses are supplied as input, and a matrix of the credibility of the words and corrections in the sentence is obtained. The same unit assesses the credibility of the corrections, producing a space of segment hypotheses with assessments of the credibility of the segments in context. The correction generation module is then used again to assess the gains of the segment hypotheses, yielding a conclusion about their credibility. Finally, the correction decision-making module traverses the space of assessed hypotheses from left to right along the sentence, deciding whether to make edits or to keep the state unchanged, and the corrected text is output.
EFFECT: increased efficiency of correcting spelling errors by implementing an assessment of the credibility of corrections and making decisions about correcting errors.
4 cl, 5 dwg
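The candidate-generation step named in the abstract (a dictionary lookup ranked by a weighted Levenshtein metric) can be illustrated with a minimal sketch. It is not the patented implementation: the substitution weights, the `max_distance` threshold, and the names `CHEAP_SUBSTITUTIONS`, `weighted_levenshtein` and `generate_candidates` are all hypothetical, chosen only to show how lower edit costs for commonly confused letters change the ranking of candidates.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed weights (not from the patent): confusable letter pairs cost less
# to substitute than arbitrary pairs.
CHEAP_SUBSTITUTIONS: Dict[Tuple[str, str], float] = {
    ("a", "o"): 0.5, ("o", "a"): 0.5,
    ("e", "i"): 0.5, ("i", "e"): 0.5,
}


def substitution_cost(a: str, b: str) -> float:
    """Cost of replacing character a with character b."""
    if a == b:
        return 0.0
    return CHEAP_SUBSTITUTIONS.get((a, b), 1.0)


def weighted_levenshtein(source: str, target: str,
                         insert_cost: float = 1.0,
                         delete_cost: float = 1.0) -> float:
    """Edit distance with per-operation weights, computed row by row."""
    # previous[j] is the cost of turning the empty prefix into target[:j].
    previous = [j * insert_cost for j in range(len(target) + 1)]
    for i, s_char in enumerate(source, start=1):
        current = [i * delete_cost]
        for j, t_char in enumerate(target, start=1):
            current.append(min(
                previous[j] + delete_cost,        # delete s_char
                current[j - 1] + insert_cost,     # insert t_char
                previous[j - 1] + substitution_cost(s_char, t_char),
            ))
        previous = current
    return previous[-1]


def generate_candidates(token: str, dictionary: Iterable[str],
                        max_distance: float = 1.5) -> List[Tuple[str, float]]:
    """Return dictionary words within max_distance, closest first."""
    scored = [(word, weighted_levenshtein(token, word)) for word in dictionary]
    kept = [(word, dist) for word, dist in scored if dist <= max_distance]
    # Sort by distance, then alphabetically so ties are deterministic.
    return sorted(kept, key=lambda pair: (pair[1], pair[0]))
```

Calling `generate_candidates("cat", ["cot", "cart", "dog", "cat"])` returns the ranked hypotheses for one token. In the method described above, such a ranked list would then be passed to the language model unit for credibility assessment in context, a step this sketch does not cover.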
Title | Year | Author | Number |
---|---|---|---|
TEXT CLASSIFICATION METHOD AND SYSTEM | 2022 | | RU2818693C2 |
METHOD AND SYSTEM FOR PARAPHRASING TEXT | 2023 | | RU2814808C1 |
METHOD AND SYSTEM FOR GENERATING TEXT | 2023 | | RU2817524C1 |
METHOD AND SYSTEM FOR DIGITAL ASSISTANT TEXT GENERATION | 2022 | | RU2796208C1 |
METHOD AND SYSTEM FOR ARRANGING DIALOGUE WITH USER IN USER-FRIENDLY CHANNEL | 2018 | | RU2688758C1 |
SYSTEM AND METHOD FOR AUTOMATED ASSESSMENT OF INTENTIONS AND EMOTIONS OF USERS OF DIALOGUE SYSTEM | 2020 | | RU2762702C2 |
SYSTEM AND METHODOLOGY OF AUTOMATIC LANGUAGE LEARNING ON BASIS OF SYNTACTIC MODELS FREQUENCY | 2015 | | RU2632656C2 |
METHOD AND SYSTEM FOR RETRIEVING NAMED ENTITIES | 2020 | | RU2760637C1 |
METHOD AND SYSTEM FOR GENERATION OF ARTICLES IN NATURAL LANGUAGE DICTIONARY | 2014 | | RU2639280C2 |
METHODS AND SYSTEMS FOR IDENTIFYING FIELDS IN A DOCUMENT | 2020 | | RU2760471C1 |
Authors
Dates
2021-08-12—Published
2020-05-21—Filed