METHOD FOR AUTOMATED LANGUAGE DETECTION AND (OR) TEXT DOCUMENT CODING

Russian patent published in 2013 - IPC G06F17/00

Abstract RU 2500024 C2

FIELD: information technologies.

SUBSTANCE: in the method for automated detection of the language and (or) encoding of a text document, byte sequences are identified and their frequency statistics are computed. From these statistics, a profile is built for each language and (or) each encoding, a search engine is built to extract the sought-for byte sequences from the byte stream of an inspected document, and the search engine and the language and (or) encoding profiles are saved to memory. For each inspected document, the search engine finds the byte sequences in the document's electronic version, and the frequency statistics of the found sequences are computed as the profile of that document. This profile is then compared with the language and (or) encoding profiles to determine how well each language and (or) encoding matches the inspected document.
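
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes byte trigrams as the "byte sequences", relative frequencies as the "profile", a plain set lookup in place of the patent's search engine, and cosine similarity as the comparison measure. None of these choices is stated in the abstract, and all function names and training samples are hypothetical.

```python
from collections import Counter
from math import sqrt

N = 3  # assumed n-gram length; the abstract does not fix one


def count_ngrams(data: bytes, n: int = N) -> Counter:
    """Count every byte n-gram in a byte string."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def build_profile(samples, n: int = N, top_k: int = 300) -> dict:
    """Build a language/encoding profile: relative frequencies of the top_k n-grams."""
    counts = Counter()
    for sample in samples:
        counts.update(count_ngrams(sample, n))
    top = counts.most_common(top_k)
    total = sum(c for _, c in top)
    return {g: c / total for g, c in top} if total else {}


def build_vocabulary(profiles: dict) -> set:
    """Stand-in for the 'search engine': the set of all sought-for byte sequences."""
    vocab = set()
    for profile in profiles.values():
        vocab.update(profile)
    return vocab


def document_profile(data: bytes, vocabulary: set, n: int = N) -> dict:
    """Profile of an inspected document, restricted to the sought-for sequences."""
    counts = Counter(g for g in count_ngrams(data, n).elements() if g in vocabulary)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(v * b.get(g, 0.0) for g, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect(data: bytes, profiles: dict, vocabulary: set):
    """Return the best-matching (language, encoding) label and all scores."""
    doc = document_profile(data, vocabulary)
    scores = {label: cosine(doc, p) for label, p in profiles.items()}
    return max(scores, key=scores.get), scores


# Tiny hypothetical training texts; real profiles would need far more data.
EN = "this is a simple example of english text. the quick brown fox jumps over the lazy dog."
RU = "съешь ещё этих мягких французских булок, да выпей чаю. привет, как дела? это простой пример текста."

PROFILES = {
    ("en", "utf-8"): build_profile([EN.encode("utf-8")]),
    ("ru", "utf-8"): build_profile([RU.encode("utf-8")]),
    ("ru", "cp1251"): build_profile([RU.encode("cp1251")]),
}
VOCABULARY = build_vocabulary(PROFILES)
```

A call such as `detect("привет, как дела".encode("cp1251"), PROFILES, VOCABULARY)` returns the best label together with the per-profile scores, so a caller can also apply a minimum-score threshold before accepting the result.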

EFFECT: expands the range of available technical means by enabling automatic detection of the language and encoding of text in any text document from previously collected statistics.

3 claims

Similar patents RU2500024C2

Title Year Author Number
METHOD OF CLASSIFYING DOCUMENTS BY CATEGORIES 2012
  • Lapshin Vladimir Anatol'Evich
  • Pshekhotskaja Ekaterina Aleksandrovna
  • Perov Dmitrij Vsevolodovich
RU2491622C1
METHOD AND SYSTEM OF SEMANTIC PROCESSING TEXT DOCUMENTS 2016
  • Mitelkov Dmitrij Vladimirovich
  • Novikov Andrej Yurevich
  • Satin Boris Borisovich
RU2630427C2
METHOD OF CONSTRUCTING SEMANTIC MODEL OF DOCUMENT 2011
  • Turdakov Denis Jur'Evich
  • Nedumov Jaroslav Rostislavovich
  • Sysoev Andrej Anatol'Evich
RU2487403C1
METHOD FOR AUTOMATED IDENTIFICATION OF LANGUAGE OR LINGUISTIC GROUP OF TEXT 2015
  • Kalegin Sergej Nikolaevich
RU2607989C1
METHOD FOR AUTOMATIC SEMANTIC INDEXING OF NATURAL LANGUAGE TEXT 2012
  • Kharlamov Aleksandr Aleksandrovich
RU2518946C1
CONTEXT-BASED METHOD OF ASSESSING MANIFESTATION DEGREE OF NOTION IN TEXT FOR SEARCH SYSTEMS 2007
  • Zlygostev Aleksej Sergeevich
RU2348072C1
METHOD FOR AUTOMATIC TEXT PROCESSING IN NATURAL LANGUAGE THROUGH SEMANTIC INDEXATION, METHOD FOR AUTOMATIC PROCESSING COLLECTION OF TEXTS IN NATURAL LANGUAGE THROUGH SEMANTIC INDEXATION AND COMPUTER READABLE MEDIA 2008
  • Khoroshevskij Vladimir Fedorovich
  • Klintsov Viktor Petrovich
RU2399959C2
METHOD FOR AUTOMATED ANALYSIS OF TEXT DOCUMENTS 2011
  • Lapshin Vladimir Anatol'Evich
  • Pshekhotskaja Ekaterina Aleksandrovna
  • Perov Dmitrij Vsevolodovich
RU2474870C1
METHOD TO DETECT TEXT OBJECTS 2012
  • Lapshin Vladimir Anatol'Evich
  • Pshekhotskaja Ekaterina Aleksandrovna
  • Perov Dmitrij Vsevolodovich
RU2498401C2
SYSTEM AND METHOD FOR SEMANTIC SEARCH 2013
  • Zuev Konstantin Alekseevich
  • Daniehljan Tat'Jana Vladimirovna
  • Rakhmatulina Ehl'Mira Monirovna
RU2563148C2

RU 2 500 024 C2

Authors

Lapshin Vladimir Anatol'Evich

Pshekhotskaja Ekaterina Aleksandrovna

Perov Dmitrij Vsevolodovich

Dates

2013-11-27 Published

2011-12-27 Filed