FIELD: information technology.
SUBSTANCE: a method for detecting spam in a message sent via e-mail is disclosed, wherein:
a) a message processing means receives the message sent via e-mail, where the subject in the message header contains text comprising more than three words;
b) the message processing means determines the text parameters of the message subject, which are at least one of: the language in which the subject text is written, the number of words, the number of articles, the number of punctuation characters, the number of pronouns, or the number of prepositions in the subject text;
c) a coefficient determining means determines the coefficients k and n for constructing k-skip-n-grams of word combinations, based on the text parameters of the message subject and according to rules that define the coefficients;
d) the coefficient determining means generates a set of k-skip-n-grams of word combinations from the text of the message subject using the determined values of k and n;
e) a vector construction means constructs, for each k-skip-n-gram in the generated set, a vector used to calculate the degree of cosine similarity;
f) the vector construction means calculates, for each constructed vector, the degree of cosine similarity with known vectors from a vector database;
g) a spam detection means determines the topic category of the message based on the plurality of calculated degrees of cosine similarity with the known vectors;
h) the spam detection means calculates the current value of the spam coefficient based on the plurality of calculated degrees of cosine similarity of all constructed vectors; and
i) the spam detection means detects spam in the received message when the spam coefficient exceeds a specified threshold value.
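The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract does not disclose the rules for choosing k and n, how vectors are built, what the vector database holds, or the formula for the spam coefficient, so those parts are hypothetical. In the sketch, k and n are passed in directly. Each k-skip-n-gram (n words in order with at most k skipped words in total) becomes a bag-of-words count vector (`to_vector`). `known` is a hypothetical dict from topic category to reference vectors, and `spam_categories` marks which categories count as spam. The topic is chosen by similarity-weighted voting, and the spam coefficient is taken as the mean best-match similarity to spam-category vectors. `classify_subject`, `threshold`, and these scoring choices are illustrative names and assumptions only.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def k_skip_n_grams(words, k, n):
    """All n-word subsequences (in order) whose total number of skipped words is at most k."""
    grams = []
    for i in range(len(words)):
        # The last chosen word may sit at most n - 1 + k positions after words[i].
        window = range(i + 1, min(i + n + k, len(words)))
        for rest in combinations(window, n - 1):
            grams.append(tuple(words[j] for j in (i, *rest)))
    return grams


def to_vector(gram):
    """Hypothetical vectorisation: a sparse bag-of-words count vector."""
    return Counter(gram)


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def classify_subject(subject, known, spam_categories, k=1, n=2, threshold=0.5):
    """Return (topic_category, spam_coefficient, is_spam) for a subject line.

    `known` maps a category name to a list of reference vectors (hypothetical database).
    """
    words = subject.lower().split()
    if len(words) <= 3:
        # The method applies only to subjects of more than three words.
        return None, 0.0, False
    vectors = [to_vector(g) for g in k_skip_n_grams(words, k, n)]
    if not vectors:
        return None, 0.0, False
    votes = defaultdict(float)
    spam_scores = []
    for vec in vectors:
        # Find the single best-matching known vector for this k-skip-n-gram.
        best_cat, best_sim = None, 0.0
        for cat, refs in known.items():
            for ref in refs:
                sim = cosine(vec, ref)
                if sim > best_sim:
                    best_cat, best_sim = cat, sim
        if best_cat is not None:
            votes[best_cat] += best_sim
        # Assumed scoring: a match only contributes to spam if its category is a spam one.
        spam_scores.append(best_sim if best_cat in spam_categories else 0.0)
    topic = max(votes, key=votes.get) if votes else None
    spam_coefficient = sum(spam_scores) / len(spam_scores)
    return topic, spam_coefficient, spam_coefficient > threshold


if __name__ == "__main__":
    known = {
        "pharmacy": [Counter({"cheap": 1, "pills": 1})],
        "work": [Counter({"meeting": 1, "tomorrow": 1})],
    }
    print(classify_subject("Buy cheap pills online now", known, {"pharmacy"}, threshold=0.4))
```

With the toy database in the demo, the subject "Buy cheap pills online now" yields seven 1-skip-bigrams. It is assigned the topic "pharmacy" with a spam coefficient of about 0.5, so it is flagged at a threshold of 0.4. A per-n-gram best match is used so that a single strongly matching word pair is not diluted by averaging over every reference vector.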
EFFECT: detection of spam in messages sent via e-mail.
2 cl, 5 dwg
Dates
2017-10-24—Published
2016-06-24—Filed