FIELD: data processing.
SUBSTANCE: the invention relates to a method, a computer-readable data medium, and a system for extracting data from a structured document. The method involves: receiving, by a computing device, a table containing natural-language text; identifying the table header and multiple cells forming rows and columns; performing semantic-syntactic analysis of the natural-language text to obtain multiple semantic structures; interpreting the semantic structures using a first set of production rules to obtain a data object represented by the table, where the production rules of this set include logical expressions defined over structural templates; analyzing the table header to determine multiple ontology classes associated with the corresponding columns of the table; and modifying the data object represented by the table using a second set of production rules, where the production rules of this set are associated with the ontology classes assigned to the columns of the table.
EFFECT: the technical result is higher accuracy in forming the data object of a structured document, owing to additional analysis of the table and modification of the resulting data object based on that analysis.
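
The sequence of steps in the method can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (HEADER_TO_CLASS, FIRST_RULES, CLASS_RULES, extract, and the helper functions) is hypothetical, the "semantic-syntactic analysis" is replaced by plain whitespace tokenization, and the ontology is an assumed three-entry lookup. The sketch only shows the ordering: a first rule set builds the data object from the cell structures, then header analysis assigns a class to each column, then a second, class-specific rule set modifies the object.

```python
# Illustrative sketch only; every name below is hypothetical.

# Assumed toy ontology: normalized header text -> ontology class.
HEADER_TO_CLASS = {"name": "Person", "employer": "Organization", "salary": "Money"}


def split_table(table):
    """Identify the header row and the body rows (cells by row and column)."""
    header, *rows = table
    return header, rows


def analyze_cell(text):
    """Stand-in for semantic-syntactic analysis: a toy 'semantic structure'."""
    stripped = text.strip()
    return {"text": stripped, "tokens": stripped.split()}


def is_capitalized_phrase(structure):
    """Toy structural template: every token starts with an uppercase letter."""
    tokens = structure["tokens"]
    return bool(tokens) and all(tok[:1].isupper() for tok in tokens)


# First rule set: (template predicate, attributes to set) pairs.
FIRST_RULES = [(is_capitalized_phrase, {"kind": "name"})]


def apply_first_rules(header, structure_rows):
    """Build the data object: one record per row, one attribute per column."""
    data_object = []
    for row in structure_rows:
        record = {}
        for column, structure in zip(header, row):
            attribute = {"value": structure["text"], "kind": "text", "cls": None}
            for template, updates in FIRST_RULES:
                if template(structure):
                    attribute.update(updates)
            record[column] = attribute
        data_object.append(record)
    return data_object


def classify_columns(header):
    """Header analysis: map each column to an ontology class (or None)."""
    return {column: HEADER_TO_CLASS.get(column.strip().lower()) for column in header}


def normalize_money(attribute):
    """Class-specific rule: keep only ASCII digits and store an integer."""
    digits = "".join(ch for ch in attribute["value"] if ch in "0123456789")
    if digits:
        attribute["value"] = int(digits)


# Second rule set: rules keyed by the ontology class of the column.
CLASS_RULES = {"Money": [normalize_money]}


def apply_second_rules(data_object, column_classes):
    """Modify the data object using the rules tied to each column's class."""
    for record in data_object:
        for column, attribute in record.items():
            cls = column_classes.get(column)
            attribute["cls"] = cls
            for rule in CLASS_RULES.get(cls, []):
                rule(attribute)
    return data_object


def extract(table):
    """Run the steps in the order the method lists them."""
    header, rows = split_table(table)
    structure_rows = [[analyze_cell(cell) for cell in row] for row in rows]
    data_object = apply_first_rules(header, structure_rows)
    return apply_second_rules(data_object, classify_columns(header))
```

Calling extract([["Name", "Employer", "Salary"], ["Ann Lee", "Acme Corp", "$52,000"]]) returns one record in which the Salary attribute has been converted to an integer by the class-specific rule, while the other columns keep their text values and gain a class label. Keeping the second rule set keyed by class, rather than by column position, is one plausible way to reflect the idea that header analysis drives the later correction of the data object.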
18 cl, 19 dwg
Authors
Dates
2017-01-11—Published
2015-08-19—Filed