FIELD: information technology.
SUBSTANCE: in the method for automated detection of the language and/or encoding of a text document, byte sequences are identified and the frequency statistics of the identified byte sequences are counted. Using these statistics, a profile of each language and/or each encoding is built; a search engine is built to extract the sought byte sequences from the byte stream of an inspected document; and the search engine and the language and/or encoding profiles are saved in memory. The byte sequences are then found in the electronic version of each inspected document by means of the search engine, and the frequency statistics of the found byte sequences are counted as the profile of the inspected document. The calculated profile of the inspected document is compared with the language and/or encoding profiles to determine how well each language and/or encoding matches the inspected document.
EFFECT: a broader range of technical means for automatically detecting the language and encoding of text in arbitrary text documents from previously collected statistics.
3 cl
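The steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the fixed sequence length of two bytes, the cosine similarity measure, the use of a plain set lookup in place of the "search engine", and all function names and sample texts are assumptions chosen for illustration only.

```python
from collections import Counter
import math

def ngrams(data: bytes, n: int = 2):
    """Yield every byte sequence of length n in the byte stream."""
    return (data[i:i + n] for i in range(len(data) - n + 1))

def build_profile(samples, n=2, top_k=300):
    """Count byte-sequence frequencies over the training samples and
    keep the top_k most frequent ones as relative frequencies."""
    counts = Counter()
    for sample in samples:
        counts.update(ngrams(sample, n))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.most_common(top_k)}

def build_search_set(profiles):
    """Stand-in for the 'search engine' (assumption): the set of all
    byte sequences that occur in at least one stored profile."""
    wanted = set()
    for profile in profiles.values():
        wanted.update(profile)
    return wanted

def document_profile(data, wanted, n=2):
    """Frequency statistics of the sought byte sequences found in a document."""
    counts = Counter(g for g in ngrams(data, n) if g in wanted)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}

def cosine(a, b):
    """Cosine similarity between two sparse frequency profiles (assumed metric)."""
    dot = sum(v * b.get(g, 0.0) for g, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def detect(data, profiles, wanted, n=2):
    """Compare the document profile with every stored profile."""
    doc = document_profile(data, wanted, n)
    scores = {label: cosine(doc, p) for label, p in profiles.items()}
    return max(scores, key=scores.get), scores

# Toy training data (hypothetical); real profiles would need large corpora.
ru_text = "это простой пример русского текста для построения профиля языка и кодировки"
en_text = "this is a simple example of english text used to build a language profile"

profiles = {
    ("ru", "utf-8"): build_profile([ru_text.encode("utf-8")]),
    ("ru", "cp1251"): build_profile([ru_text.encode("cp1251")]),
    ("en", "ascii"): build_profile([en_text.encode("ascii")]),
}
wanted = build_search_set(profiles)

label, scores = detect("пример текста на русском языке".encode("cp1251"), profiles, wanted)
print(label)
```

Keying each profile by a (language, encoding) pair lets a single comparison pass answer both questions at once, which is one way to read the "language and/or encoding" wording. A production system would more likely use a multi-pattern matcher over variable-length sequences than a fixed-length set lookup.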
Title | Year | Author | Number |
---|---|---|---|
METHOD OF CLASSIFYING DOCUMENTS BY CATEGORIES | 2012 | | RU2491622C1 |
METHOD AND SYSTEM OF SEMANTIC PROCESSING TEXT DOCUMENTS | 2016 | | RU2630427C2 |
METHOD OF CONSTRUCTING SEMANTIC MODEL OF DOCUMENT | 2011 | | RU2487403C1 |
METHOD FOR AUTOMATED IDENTIFICATION OF LANGUAGE OR LINGUISTIC GROUP OF TEXT | 2015 | | RU2607989C1 |
METHOD FOR AUTOMATIC SEMANTIC INDEXING OF NATURAL LANGUAGE TEXT | 2012 | | RU2518946C1 |
CONTEXT-BASED METHOD OF ASSESSING MANIFESTATION DEGREE OF NOTION IN TEXT FOR SEARCH SYSTEMS | 2007 | | RU2348072C1 |
METHOD FOR AUTOMATIC TEXT PROCESSING IN NATURAL LANGUAGE THROUGH SEMANTIC INDEXATION, METHOD FOR AUTOMATIC PROCESSING COLLECTION OF TEXTS IN NATURAL LANGUAGE THROUGH SEMANTIC INDEXATION AND COMPUTER READABLE MEDIA | 2008 | | RU2399959C2 |
METHOD FOR AUTOMATED ANALYSIS OF TEXT DOCUMENTS | 2011 | | RU2474870C1 |
METHOD TO DETECT TEXT OBJECTS | 2012 | | RU2498401C2 |
SYSTEM AND METHOD FOR SEMANTIC SEARCH | 2013 | | RU2563148C2 |
Dates
2013-11-27—Published
2011-12-27—Filed