FIELD: computer science.
SUBSTANCE: a method for detecting obscene words in text. The technical result is achieved in that, during the training phase, the method comprises: obtaining a first word, the first word corresponding to a given obscene word; generating a first set of misspelled words, the first set comprising a plurality of misspelled variants of the first word; forming training pairs, the training pairs comprising a set of positive training pairs in which the first word is paired with each misspelled variant of the first word; and training a machine learning algorithm, the training comprising: determining, for each training pair, a set of features representing a property of that training pair; and generating an output function based on the set of features, wherein the output function is configured, in use, to assign an obscenity score to a word, the obscenity score indicating the probability that the word is obscene.
EFFECT: increased accuracy of word classification.
28 cl, 6 dwg
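
The training phase summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation: the single-edit misspelling generator, the two per-pair characteristics (features), the logistic-regression learner, the harmless placeholder word "darn" and the hand-picked negative pairs are all assumptions made for illustration. The abstract itself names only positive pairs and does not specify the features or the learning algorithm.

```python
import difflib
import math
import string


def generate_misspellings(word):
    """Return single-edit variants of `word` (an assumed misspelling model)."""
    variants = set()
    for i in range(len(word)):
        variants.add(word[:i] + word[i + 1:])                # drop one character
        variants.add(word[:i] + word[i] * 2 + word[i + 1:])  # double one character
        for c in string.ascii_lowercase:
            variants.add(word[:i] + c + word[i + 1:])        # substitute one character
    for i in range(len(word) - 1):
        # swap two neighbouring characters
        variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    variants.discard(word)
    variants.discard("")
    return sorted(variants)


def pair_features(word, candidate):
    """Hypothetical features of a (word, candidate) pair."""
    ratio = difflib.SequenceMatcher(None, word, candidate).ratio()
    length_gap = abs(len(word) - len(candidate)) / max(len(word), len(candidate))
    return [ratio, length_gap]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(examples, epochs=200, lr=0.5):
    """Fit logistic regression by stochastic gradient descent; bias is the last weight."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
            g = p - y  # gradient of the log-loss with respect to the logit
            for j in range(n):
                w[j] -= lr * g * x[j]
            w[-1] -= lr * g
    return w


def obscenity_score(w, known_word, candidate):
    """Output function: probability-like score that `candidate` is a variant of `known_word`."""
    x = pair_features(known_word, candidate)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


first_word = "darn"  # harmless placeholder standing in for an obscene word
positive_pairs = [(first_word, m) for m in generate_misspellings(first_word)]
# Assumption: negative pairs are not described in the abstract; they are added
# here only so that the classifier has two classes to separate.
negative_pairs = [(first_word, w) for w in
                  ["table", "window", "river", "music", "orange", "paper"]]

examples = [(pair_features(a, b), 1) for a, b in positive_pairs]
examples += [(pair_features(a, b), 0) for a, b in negative_pairs]
weights = train_logistic(examples)

for candidate in ["dran", "table"]:
    print(candidate, round(obscenity_score(weights, first_word, candidate), 3))
```

In this sketch a candidate is scored against one known word only; a practical system would presumably compare each incoming word with a whole dictionary of obscene words and use richer features.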
Title | Year | Author | Number |
---|---|---|---|
METHOD AND SERVER FOR TRAINING MACHINE LEARNING ALGORITHM IN TRANSLATION | 2020 | | RU2770569C2 |
METHOD AND SERVER FOR PROCESSING TEXT SEQUENCE IN MACHINE PROCESSING TASK | 2020 | | RU2775820C2 |
METHOD AND SYSTEM FOR SPEECH SYNTHESIS FROM TEXT | 2017 | | RU2692051C1 |
METHOD AND SYSTEM FOR GENERATING AN OBJECT CARD | 2018 | | RU2739554C1 |
METHOD AND SYSTEM FOR GENERATING FEATURE FOR RANGING DOCUMENT | 2018 | | RU2733481C2 |
METHOD AND SERVER FOR TRAINING MACHINE LEARNING ALGORITHM IN OBJECT RANKING | 2020 | | RU2782502C1 |
METHOD AND SERVER FOR PRESENTING RECOMMENDED CONTENT ITEM TO USER | 2017 | | RU2699574C2 |
METHOD AND SERVER FOR DETERMINING TRAINING SET FOR MACHINE LEARNING ALGORITHM (MLA) TRAINING | 2020 | | RU2817726C2 |
METHOD AND SYSTEM FOR TRAINING MACHINE LEARNING ALGORITHM TO PREDICT VISIBILITY ASSESSMENT | 2022 | | RU2814079C1 |
METHOD AND SERVER FOR REPEATED TRAINING OF MACHINE LEARNING ALGORITHM | 2019 | | RU2743932C2 |
Authors
Dates
2023-09-15—Published
2020-12-22—Filed