CLASSIFICATION OF DOCUMENTS USING MULTILEVEL TEXT SIGNATURE
Russian patent published in 2017 - IPC G06F17/27

Abstract RU 2632408 C2

FIELD: information technology.

SUBSTANCE: a text signature of a target document is determined, with the signature length bounded by predetermined lower and upper bounds. First, a plurality of text tokens is selected: a preliminary set of text tokens is selected, the count of the preliminary token set is determined, and, when this count exceeds a predetermined threshold, the preliminary set is truncated to form the selected token set, so that the count of the selected set does not exceed the threshold. A signature fragment size is then determined according to the upper and lower bounds and the count of the selected set. A plurality of signature fragments is determined, each according to a hash of a separate token of the selected set; each fragment consists of a sequence of characters whose length equals the fragment size. Finally, the fragments are concatenated to form the text signature.
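The SUBSTANCE paragraph describes a concrete pipeline: select tokens, truncate to a threshold, derive a fragment size from the bounds, hash each token into a fragment, and concatenate. The abstract does not say how tokens are chosen, which hash is used, or how the fragment size is computed, so the Python sketch below fills those gaps with assumptions: the preliminary set is the distinct lowercase words in first-occurrence order, truncation keeps the first `threshold` of them, the fragment size is the largest `s` with `count * s <= upper`, and each fragment is the first `s` hex characters of a SHAKE-256 digest of the token. All function names and default values (`lower=32`, `upper=64`, `threshold=16`) are hypothetical and not taken from the patent.

```python
import hashlib
import re


def select_tokens(text: str, threshold: int) -> list[str]:
    """Hypothetical token selection: distinct lowercase words, first-occurrence order."""
    preliminary = list(dict.fromkeys(re.findall(r"\w+", text.lower())))
    # Truncate the preliminary set so the selected count never exceeds the threshold.
    if len(preliminary) > threshold:
        preliminary = preliminary[:threshold]
    return preliminary


def fragment_size(lower: int, upper: int, count: int) -> int:
    """Assumed rule: the largest size s such that count * s <= upper."""
    size = upper // count
    if size < 1 or count * size < lower:
        raise ValueError("signature bounds cannot be met for this token count")
    return size


def token_fragment(token: str, size: int) -> str:
    """Fragment of exactly `size` hex characters derived from the token's hash."""
    # SHAKE-256 is an extendable-output hash: request enough bytes, then trim.
    digest = hashlib.shake_256(token.encode("utf-8")).hexdigest((size + 1) // 2)
    return digest[:size]


def text_signature(text: str, lower: int = 32, upper: int = 64, threshold: int = 16) -> str:
    """Concatenate one fixed-size fragment per selected token."""
    tokens = select_tokens(text, threshold)
    if not tokens:
        return ""  # no tokens, no signature
    size = fragment_size(lower, upper, len(tokens))
    return "".join(token_fragment(token, size) for token in tokens)


if __name__ == "__main__":
    sig = text_signature("the quick brown fox jumps over the lazy dog")
    print(sig)
    print(len(sig))  # 8 distinct tokens * fragment size 8 = 64
```

With these defaults the selected count is at most 16, so `count * s` is always at least 64 - 16 + 1 = 49 and the lower bound of 32 is never violated. SHAKE-256 is used here only because it is an extendable-output hash in Python's standard `hashlib`, which lets a fragment of any length be cut from a single digest; the patent abstract does not name a hash function.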

EFFECT: faster computation and lower memory requirements when determining a text signature, without reducing the accuracy of comparing documents by their signatures.

22 claims, 18 drawings, 3 tables

Similar patents RU2632408C2

RU2601193C2 (2012): SYSTEMS AND METHODS FOR SPAM DETECTION USING CHARACTER HISTOGRAMS
  Authors: Dikyu Danel; Lupsesku Z. Luchan
RU2601190C2 (2012): SYSTEM AND METHODS FOR SPAM DETECTION USING FREQUENCY SPECTRA OF CHARACTER STRINGS
  Authors: Dikyu Danel; Lupsesku Z. Luchan
RU2744671C2 (2017): SYSTEM AND METHODS FOR DETECTING NETWORK FRAUD
  Author: Damian Alin-Octavian
RU2607229C2 (2012): SYSTEMS AND METHODS OF DYNAMIC INDICATORS AGGREGATION TO DETECT NETWORK FRAUD
  Authors: Tibejka N. Marius; Damyan O. Alin; Visan L. Razvan
RU2595496C2 (2014): METHOD AND SYSTEM FOR CREATING A LIST OF ELECTRONIC MESSAGES
  Authors: Shmarovoz Georgij Valentinovich; Kozlov Aleksandr Viktorovich; Demyanenko Anna Aleksandrovna; Latysheva Yuliya Nikolaevna; Ganin Egor Vladimirovich
RU2580424C1 (2014): METHOD OF DETECTING INSIGNIFICANT LEXICAL ITEMS IN TEXT MESSAGES AND COMPUTER
  Authors: Ganin Egor Vladimirovich; Kholodkov Anton Igorevich
RU2595618C2 (2014): METHOD AND SYSTEM FOR REFORMATTING ELECTRONIC MESSAGE BASED ON CATEGORY THEREOF
  Authors: Shmarovoz Georgij Valentinovich; Kozlov Aleksandr Viktorovich; Demyanenko Anna Aleksandrovna; Latysheva Yuliya Nikolaevna; Ganin Egor Vladimirovich
RU2595619C2 (2014): METHOD AND SYSTEM FOR REFORMATTING ELECTRONIC MESSAGE BASED ON CATEGORY THEREOF
  Authors: Shmarovoz Georgij Valentinovich; Kozlov Aleksandr Viktorovich; Demyanenko Anna Aleksandrovna; Latysheva Yuliya Nikolaevna; Ganin Egor Vladimirovich
RU2595617C2 (2014): METHOD AND SYSTEM FOR CREATING LIST OF ELECTRONIC MESSAGES
  Authors: Shmarovoz Georgij Valentinovich; Kozlov Aleksandr Viktorovich; Demyanenko Anna Aleksandrovna; Latysheva Yuliya Nikolaevna; Ganin Egor Vladimirovich
RU2348071C2 (2003): TEXT SEGMENTATION METHODS AND SYSTEMS
  Author: Vejssman Adam Dzh.

RU 2 632 408 C2

Authors

Toma Adrian

Tibejka Marius Nikolae

Dates

Published: 2017-10-04

Filed: 2014-02-04