OPTIMISATION OF FACT EXTRACTION USING MULTI-STAGE APPROACH

Russian patent published in 2012 - IPC G06F17/21, G06F17/30

Abstract RU 2451999 C2

FIELD: information technology.

SUBSTANCE: facts are extracted from electronic documents by recognising factual descriptions, using a fact-word table that is matched against the words of the electronic documents. The words of those factual descriptions may be tagged with the appropriate part of speech. More detailed analysis is then performed on those factual descriptions rather than on the entire electronic document, and particularly on the text in the neighbourhood of the fact-word matches. The analysis may involve identifying the linguistic constituents of each phrase and determining the role of each phrase as either subject or object. Exclusion rules, based in part on the linguistic constituents, may be applied to eliminate phrases unlikely to be part of facts. Scoring rules may be applied to the remaining phrases, and for those phrases having a score in excess of a threshold, the corresponding sentence part, whole sentence, paragraph, or other document portion may be presented as representing one or more facts.

EFFECT: more accurate search results.

20 claims, 6 drawings
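
The multi-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the fact-word table, the toy part-of-speech lexicon, the neighbourhood window, the single exclusion rule, the scoring weights and the threshold are all hypothetical values chosen for illustration, and the "subject before the fact word, object after it" heuristic is an assumption standing in for real constituent analysis.

```python
import re
from dataclasses import dataclass

# Hypothetical fact-word table: words that suggest a factual description.
FACT_WORDS = {"acquired", "founded", "invented"}
# Toy part-of-speech lexicon (assumption); unknown words default to NOUN.
POS = {"the": "DET", "a": "DET", "it": "PRON", "they": "PRON",
       "in": "ADP", "by": "ADP"}
WINDOW = 3       # tokens examined on each side of a fact-word match (assumed)
THRESHOLD = 2.0  # a phrase must score above this to be reported (assumed)


@dataclass
class Phrase:
    tokens: list  # (word, tag) pairs
    role: str     # "subject" if before the fact word, "object" if after


def tag(word):
    """Cheap tagging stage: fact words count as verbs, the rest use the lexicon."""
    lower = word.lower()
    return "VERB" if lower in FACT_WORDS else POS.get(lower, "NOUN")


def candidate_phrases(sentence):
    """Match fact words, tag the sentence, and cut phrases from the
    neighbourhood of each match only."""
    tagged = [(w, tag(w)) for w in re.findall(r"[A-Za-z0-9']+", sentence)]
    phrases = []
    for i, (word, _) in enumerate(tagged):
        if word.lower() in FACT_WORDS:
            left = tagged[max(0, i - WINDOW):i]
            right = tagged[i + 1:i + 1 + WINDOW]
            if left:
                phrases.append(Phrase(left, "subject"))
            if right:
                phrases.append(Phrase(right, "object"))
    return phrases


def excluded(phrase):
    """Illustrative exclusion rule: drop phrases that contain no noun."""
    return not any(t == "NOUN" for _, t in phrase.tokens)


def score(phrase):
    """Illustrative scoring rule: 1 per noun, plus 0.5 if it is capitalised."""
    total = 0.0
    for word, t in phrase.tokens:
        if t == "NOUN":
            total += 1.5 if word[0].isupper() else 1.0
    return total


def extract_facts(document):
    """Return the sentences that contain at least one phrase scoring above
    THRESHOLD after the exclusion rule has been applied."""
    facts = []
    for sentence in re.split(r"(?<=[.!?])\s+", document.strip()):
        kept = [p for p in candidate_phrases(sentence) if not excluded(p)]
        if any(score(p) > THRESHOLD for p in kept):
            facts.append(sentence)
    return facts


doc = "Acme acquired Widget Corp in 2010. It acquired it. The weather was nice."
print(extract_facts(doc))  # ['Acme acquired Widget Corp in 2010.']
```

The third sentence is discarded by the inexpensive table lookup, so the later stages never run on it. The second sentence matches a fact word but is removed by the exclusion rule, because both neighbouring phrases are bare pronouns. This mirrors the abstract's point that detailed analysis is applied only to candidate factual descriptions rather than to the entire document.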

Similar patents RU2451999C2

Title Year Author Number
METHOD FOR SYNTHESIS OF SELF-TEACHING SYSTEM FOR EXTRACTING KNOWLEDGE FROM TEXT DOCUMENTS FOR SEARCH ENGINES 2002
  • Nasypnyj Vladimir Vladimirovich
  • Nasypnaja Galina Anatol'Evna
RU2273879C2
SYSTEM AND METHOD FOR SEMANTIC SEARCH 2013
  • Zuev Konstantin Alekseevich
  • Daniehljan Tat'Jana Vladimirovna
  • Rakhmatulina Ehl'Mira Monirovna
RU2563148C2
METHOD OF CLUSTERING OF SEARCH RESULTS DEPENDING ON SEMANTICS 2014
  • Andreev Sergey Gennadievich
RU2564629C1
METHOD FOR AUTOMATED ANALYSIS OF TEXT AND SELECTION OF RELEVANT RECOMMENDATIONS TO IMPROVE READABILITY THEREOF 2021
  • Burov Anatolii Vladimirovich
  • Iliakhov Maksim Olegovich
RU2769427C1
EXPANDING OF INFORMATION SEARCH POSSIBILITY 2015
  • Danielyan Tatyana Vladimirovna
  • Indenbom Evgenij Mikhajlovich
RU2618375C2
METHOD FOR AUTOMATIC TEXT PROCESSING IN NATURAL LANGUAGE THROUGH SEMANTIC INDEXATION, METHOD FOR AUTOMATIC PROCESSING COLLECTION OF TEXTS IN NATURAL LANGUAGE THROUGH SEMANTIC INDEXATION AND COMPUTER READABLE MEDIA 2008
  • Khoroshevskij Vladimir Fedorovich
  • Klintsov Viktor Petrovich
RU2399959C2
COMPREHENSIVE AUTOMATIC PROCESSING OF TEXT INFORMATION 2014
  • Danielyan Tatyana Vladimirovna
  • Starostin Anatolij Sergeevich
  • Zuev Konstantin Alekseevich
  • Anisimovich Konstantin Vladimirovich
  • Selegej Vladimir Pavlovich
RU2662699C2
METHOD FOR AUTOMATIC SEMANTIC INDEXING OF NATURAL LANGUAGE TEXT 2012
  • Kharlamov Aleksandr Aleksandrovich
RU2518946C1
METHOD AND SYSTEM OF SEMANTIC PROCESSING TEXT DOCUMENTS 2016
  • Mitelkov Dmitrij Vladimirovich
  • Novikov Andrej Yurevich
  • Satin Boris Borisovich
RU2630427C2
METHOD OF SEARCHING FOR INFORMATION IN TEXT ARRAY 2008
  • Tsilikov Il'Ja Sergeevich
RU2392660C2

RU 2 451 999 C2

Authors

Azzam Salikha

Khamfriz Kevin Uill'Jam

Dates

2012-05-27 Published

2007-07-20 Filed