METHOD FOR TEXTUAL INFORMATION RECOGNITION AND ITS INTEGRITY EVALUATION IN INTERNET ELECTRONIC DOCUMENTS
Russian patent published in 2015 - IPC G06F17/27 G06K9/36

Abstract RU 2550543 C1

FIELD: information technology.

SUBSTANCE: in the method for textual information recognition and its integrity evaluation in Internet electronic documents, an electronic document is split into areas presumed to contain text paragraphs and lines. Splitting continues until the areas containing the largest continuous, logically complete text are obtained. Redundant and surplus information is deleted. The validity of the symbol encoding is analysed by checking whether letters belong to the alphabet and whether text words belong to the vocabulary of the given language. Statistical characteristics of word classes and their forms are calculated. From the obtained values of these characteristics, a working vocabulary attribute vector is generated, which is converted into a principal components vector using component analysis procedures and classified using previously trained classifiers. Textual information integrity is evaluated by a voting method of decision making.
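
The encoding validity check and the voting step described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the alphabet, the vocabulary, and the function names (`alphabet_ratio`, `vocabulary_ratio`, `majority_vote`) are hypothetical, and simple majority voting is assumed because the abstract does not say which voting rule is used.

```python
# Illustrative sketch only; all names and data below are assumptions,
# not taken from the patent text.

# Hypothetical toy alphabet and vocabulary for one language.
ALPHABET = set("abcdefghijklmnopqrstuvwxyz")
VOCABULARY = {"the", "text", "is", "valid", "document"}


def alphabet_ratio(text: str) -> float:
    """Share of letter characters in `text` that belong to ALPHABET."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in ALPHABET for ch in letters) / len(letters)


def vocabulary_ratio(text: str) -> float:
    """Share of whitespace-separated words found in VOCABULARY."""
    words = [w.strip(".,;:!?\"'()") for w in text.lower().split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in VOCABULARY for w in words) / len(words)


def majority_vote(decisions: list[bool]) -> bool:
    """Return True only if strictly more than half of the votes are True."""
    return sum(decisions) * 2 > len(decisions)


if __name__ == "__main__":
    sample = "The text is valid."
    print(alphabet_ratio(sample), vocabulary_ratio(sample))
    # Votes would come from separately trained classifiers; here they are
    # hard-coded stand-ins.
    print(majority_vote([True, True, False]))
```

A text decoded with the wrong encoding typically produces letters outside the expected alphabet, so its alphabet ratio drops toward zero. In the method as described, the decisions passed to the voting step would come from the trained classifiers applied to the principal components vector rather than from hard-coded values.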

EFFECT: higher throughput of a system for content-based processing of electronic documents and an increase in the number of data sources analysed.

5 dwg

Similar patents RU2550543C1

Title Year Author Number
METHOD OF DETERMINING PROFILE OF MOBILE DEVICE USER ON MOBILE DEVICE ITSELF AND DEMOGRAPHIC PROFILING SYSTEM 2016
  • Yoo Jaebong
  • Kryzhanovskiy Konstantin Alexandrovich
  • Podoynitsina Lyubov Vladimirovna
  • Romanenko Alexander Alexandrovich
  • Polubotko Dmitry Valerievich
  • Kazantsev Alexey Yurievich
  • Moiseenko Andrey Konstantinovich
  • Maslennikov Mstislav Vladimirovich
RU2647661C1
METHOD FOR ORDERING DATA SUBMITTED IN ALPHANUMERIC INFORMATION BLOCKS 2000
  • Pripachkin Ju.I.
  • Smentsarev G.V.
RU2210809C2
METHOD AND SYSTEM FOR CLASSIFYING AND FILTERING PROHIBITED CONTENT IN A NETWORK 2020
  • Prudkovskij Nikolaj Sergeevich
RU2738335C1
USE OF AUTOENCODERS FOR LEARNING TEXT CLASSIFIERS IN NATURAL LANGUAGE 2017
  • Anisimovich Konstantin Vladimirovich
  • Indenbom Evgenij Mikhajlovich
  • Ivashnev Ivan Ivanovich
RU2678716C1
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA 2022
  • Babak Nikita Grigorevich
  • Belorybkin Leonid Yurevich
  • Terenin Aleksej Alekseevich
  • Shabrova Anastasiya Igorevna
RU2804747C1
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA 2022
  • Babak Nikita Grigorevich
  • Belorybkin Leonid Yurevich
  • Terenin Aleksej Alekseevich
  • Shabrova Anastasiya Igorevna
RU2802549C1
METHOD AND SYSTEM FOR GENERATING AN OBJECT CARD 2018
  • Akulov Yaroslav Viktorovich
RU2739554C1
DEVICES AND METHODS, WHICH BUILD THE HIERARCHIALLY ORDINARY DATA STRUCTURE, CONTAINING NONPARAMETERIZED SYMBOLS FOR DOCUMENTS IMAGES CONVERSION TO ELECTRONIC DOCUMENTS 2013
  • Chulinin Yurij Georgievich
RU2625533C1
METHOD FOR AUTOMATIC CLASSIFICATION OF FORMALIZED ELECTRONIC GRAPHIC AND TEXT DOCUMENTS IN THE ELECTRONIC DOCUMENT CIRCULATION SYSTEM WITH AUTOMATIC FORMATION OF ELECTRONIC CASES 2020
  • Korolev Igor Dmitrievich
  • Filippov Maksim Yurevich
  • Nazintsev Vadim Sergeevich
RU2759887C1
METHODS AND DEVICES THAT CONVERT IMAGES OF DOCUMENTS TO ELECTRONIC DOCUMENTS USING TRIE-DATA STRUCTURES CONTAINING UNPARAMETERIZED SYMBOLS FOR DEFINITION OF WORD AND MORPHEMES ON DOCUMENT IMAGE 2013
  • Chulinin Yurij Georgievich
RU2631168C2

RU 2 550 543 C1

Authors

Molchanov Artem Nikolaevich

Skurnovich Aleksej Valentinovich

Stel'Makh Ehduard Petrovich

Molchanov Il'Ja Nikolaevich

Dates

2015-05-10 Published

2013-12-11 Filed