FIELD: information technology.
SUBSTANCE: method for automatic semantic indexing of natural language text comprises segmenting the text into elementary first level units (words) and sentences; forming second level units (standardised word forms); calculating the frequency of occurrence of each first level unit for adjacent first level units and merging the sequence of words into third level units (stable word combinations); identifying in each sentence a semantically significant entity and an attribute thereof (fourth level units); identifying in each sentence semantically significant relationships between semantically significant entities and between semantically significant entities and attributes; determining the frequency of occurrence of second level and third level units; forming, for each semantically significant relationship, a plurality of triads (fifth level units); on the plurality of the formed triads, separately indexing all semantically significant entities linked by semantically significant relationships with their frequency of occurrence, all attributes with their frequency of occurrence and all formed triads.
EFFECT: high accuracy of indexing natural language texts.
6 cl, 2 dwg, 23 tbl
Authors
Dates
2014-06-10—Published
2012-11-27—Filed