EXTRACTING INFORMATION FROM STRUCTURED DOCUMENTS CONTAINING TEXT IN NATURAL LANGUAGE

Russian patent published in 2017 - IPC G06F17/20 G06F17/27 G06F17/28

Abstract RU 2607976 C1

FIELD: data processing.

SUBSTANCE: the invention relates to a method, a computer-readable storage medium, and a system for extracting data from a structured document. In the method, a computing device receives a table containing natural-language text; identifies the table header and the cells that form the rows and columns; performs semantic-syntactic analysis of the text to obtain multiple semantic structures; interprets these semantic structures using a first set of production rules, which include logical expressions defined over structural templates, to obtain a data object represented by the table; analyzes the table header to determine the ontology classes associated with the corresponding columns; and modifies the data object using a second set of production rules, which are tied to the ontology classes associated with the columns of the table.

EFFECT: the technical result is higher accuracy in forming the data object of a structured document, achieved through additional analysis of the table and modification of the resulting data object based on that analysis.
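
The two-pass flow summarized in the abstract can be illustrated with a toy sketch. It is not the patented implementation. Everything named here is an assumption made for illustration: the `HEADER_CLASSES` lookup stands in for header analysis against an ontology, `analyze` stands in for semantic-syntactic analysis, and the `Rule` type is a simplified production rule made of a condition and an action.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical header-to-class lookup; a real system would consult an ontology.
HEADER_CLASSES = {"name": "Person", "employer": "Organization", "hired": "Date"}


@dataclass
class Rule:
    """Simplified production rule: a condition and an action on a structure."""
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]


@dataclass
class DataObject:
    """Data object represented by the table: one record per body row."""
    rows: list = field(default_factory=list)


def set_class(cls):
    """Build an action that assigns an ontology class to a structure."""
    def action(structure):
        structure["class"] = cls
    return action


def analyze(text):
    """Placeholder for semantic-syntactic analysis of one cell."""
    return {"text": text.strip(), "class": None}


def apply_rules(structure, rules):
    for rule in rules:
        if rule.condition(structure):
            rule.action(structure)


def extract(table, first_rules, rules_by_class):
    # Identify the header row and the body cells.
    header, *body = table
    header = [h.strip().lower() for h in header]
    # Header analysis: one ontology class (or None) per column.
    column_classes = [HEADER_CLASSES.get(h) for h in header]

    obj = DataObject()
    # Pass 1: interpret each cell with the first, column-agnostic rule set.
    for row in body:
        record = {}
        for name, cell in zip(header, row):
            structure = analyze(cell)
            apply_rules(structure, first_rules)
            record[name] = structure
        obj.rows.append(record)

    # Pass 2: modify the data object with rules tied to each column's class.
    for record in obj.rows:
        for name, cls in zip(header, column_classes):
            apply_rules(record[name], rules_by_class.get(cls, []))
    return obj


# First set: a naive guess that any capitalized text names a person.
FIRST_RULES = [Rule(lambda s: s["text"][:1].isupper(), set_class("Person"))]

# Second set: rules keyed by the ontology class of the column.
RULES_BY_CLASS = {
    "Organization": [Rule(lambda s: bool(s["text"]), set_class("Organization"))],
    "Date": [Rule(lambda s: re.fullmatch(r"\d{4}-\d{2}-\d{2}", s["text"]) is not None,
                  set_class("Date"))],
}
```

With a table such as `[["Name", "Employer", "Hired"], ["Ivan Petrov", "Jordan", "2015-08-19"]]`, the first pass labels "Jordan" as a person because it is capitalized, and the second pass corrects it because the "Employer" column maps to the organization class. This is the kind of header-driven correction the stated effect refers to.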

18 cl, 19 dwg

Similar patents RU2607976C1

Title Year Author Number
METHOD AND SYSTEM FOR MACHINE EXTRACTION AND INTERPRETATION OF TEXT INFORMATION 2015
  • Starostin Anatoly Sergeevich
  • Smurov Ivan Mikhailovich
  • Stepanova Maria Evgenyevna
RU2592396C1
METHOD AND SYSTEM FOR STORING AND SEARCHING INFORMATION EXTRACTED FROM TEXT DOCUMENTS 2015
  • Matskevich Stepan Evgenievich
RU2605077C2
SYSTEM AND METHOD OF CREATING AND USING USER ONTOLOGY-BASED PATTERNS FOR PROCESSING USER TEXT IN NATURAL LANGUAGE 2015
  • Bulgakov Ilia Aleksandrovich
  • Yakovlev Egor Nikolaevich
  • Starostin Anatoly Sergeevich
RU2596599C2
METHOD OF EXTRACTING FACTS FROM TEXTS ON NATURAL LANGUAGE 2016
  • Starostin Anatolij Sergeevich
  • Smurov Ivan Mikhajlovich
  • Dzhumaev Stanislav Sergeevich
RU2637992C1
EXTRACTION OF INFORMATION FROM SANITARY BLOCKS OF DOCUMENTS USING MICROMODELS ON BASIS OF ONTOLOGY 2017
  • Danielyan Tatyana Vladimirovna
  • Mikhajlov Maksim Borisovich
RU2662688C1
SYSTEM AND METHOD OF CREATING AND USING USER SEMANTIC DICTIONARIES FOR PROCESSING USER TEXT IN NATURAL LANGUAGE 2015
  • Yakovlev Egor Nikolaevich
  • Starostin Anatoly Sergeevich
RU2584457C1
USING VERIFIED BY USER DATA FOR TRAINING MODELS OF CONFIDENCE 2016
  • Matskevich Stepan Evgenevich
  • Belov Andrej Aleksandrovich
RU2646380C1
RECOVERY OF TEXT ANNOTATIONS RELATED TO INFORMATION OBJECTS 2017
  • Bulgakov Ilya Aleksandrovich
  • Indenbom Evgenij Mikhajlovich
RU2665261C1
DEFINITION OF CONFIDENCE DEGREES RELATED TO ATTRIBUTE VALUES OF INFORMATION OBJECTS 2016
  • Belov Andrej Aleksandrovich
  • Matskevich Stepan Evgenevich
RU2640297C2
METHOD AND SYSTEM FOR TEXT SYNTHESIS BASED ON INFORMATION EXTRACTED AS RDF-GRAPH USING TEMPLATES 2015
  • Starostin Anatoly Sergeevich
  • Kuklin Dmitrii Alekseevich
RU2610241C2

RU 2 607 976 C1

Authors

Danielyan Tatiana Vladimirovna

Bulgakov Ilya Aleksandrovich

Dates

2017-01-11 Published

2015-08-19 Filed