FIELD: computing.
SUBSTANCE: the invention relates to a method for extracting information from unstructured texts written in a natural language. In the method, a set of texts is tokenised into sentences, words and word sequences; rare words are deleted; the remaining words are corrected for typos and reduced to their initial form; from the words in initial form, a set of words of certain parts of speech that are used to describe the target information is selected; word sequences containing all words from the selected set are marked as containing the target information; the presence of the target information is then determined for every text document that contains a marked word sequence; and the number of text sources, the word occurrence threshold and the set of parts of speech are optimised until a specified quality of information extraction is achieved.
EFFECT: increased quality of information extraction from text data sources.
3 cl, 4 dwg, 1 tbl
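The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `LEXICON` table, the `extract` function and its parameters `target_lemmas`, `min_count` and `allowed_pos` are hypothetical names introduced here, the toy lexicon stands in for a real morphological analyser and typo corrector, and the final optimisation of corpus size, occurrence threshold and part-of-speech set is left out.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: surface form -> (initial form, part of speech).
# A real system would use a morphological analyser and a spell checker here.
LEXICON = {
    "pump": ("pump", "NOUN"), "pumps": ("pump", "NOUN"),
    "fail": ("fail", "VERB"), "fails": ("fail", "VERB"),
    "failed": ("fail", "VERB"),
    "the": ("the", "DET"), "again": ("again", "ADV"),
}


def split_sentences(text):
    """Split a text into sentences on ., ! and ? (a deliberately crude rule)."""
    return [s for s in re.split(r"[.!?]+\s*", text) if s]


def tokenize(sentence):
    """Lower-case a sentence and split it into alphabetic words."""
    return re.findall(r"[a-z]+", sentence.lower())


def to_initial_form(word):
    """Return (initial form, part of speech); unknown words get the tag 'X'."""
    return LEXICON.get(word, (word, "X"))


def extract(docs, target_lemmas, min_count=2, allowed_pos=("NOUN", "VERB")):
    """Return one boolean per document: does it contain the target information?

    A document is marked when at least one of its sentences contains every
    word in target_lemmas, after rare words (corpus frequency < min_count)
    and words outside allowed_pos have been dropped.
    """
    analysed = []
    counts = Counter()
    for doc in docs:
        sentences = [[to_initial_form(w) for w in tokenize(s)]
                     for s in split_sentences(doc)]
        analysed.append(sentences)
        counts.update(lemma for sent in sentences for lemma, _ in sent)

    required = set(target_lemmas)
    results = []
    for sentences in analysed:
        marked = False
        for sent in sentences:
            kept = {lemma for lemma, pos in sent
                    if counts[lemma] >= min_count and pos in allowed_pos}
            if required <= kept:  # the sentence contains all selected words
                marked = True
                break
        results.append(marked)
    return results


docs = [
    "The pump failed again. Operators restarted it.",
    "Pumps fail often.",
    "The weather was fine.",
]
flags = extract(docs, {"pump", "fail"})
```

In this sketch the selected word set is supplied by the caller, whereas the abstract describes selecting it from the words in initial form. Raising `min_count` or narrowing `allowed_pos` changes which documents are marked, which is the kind of parameter the abstract says is tuned to reach a specified extraction quality.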
Title | Year | Author | Number |
---|---|---|---|
WAY TO DEFINE AND CLASSIFY A CONCEPT BASED ON THE CONTEXT OF ITS USE | 2022 | | RU2795870C1 |
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA | 2022 | | RU2804747C1 |
METHOD AND SYSTEM FOR DEPERSONALIZATION OF CONFIDENTIAL DATA | 2022 | | RU2802549C1 |
METHOD AND SYSTEM FOR EXTRACTING NAMED ENTITIES | 2021 | | RU2823914C2 |
METHOD OF GENERATING AND USING RECURSIVE INDEX OF SEARCH ENGINES | 2011 | | RU2459242C1 |
METHOD AND SYSTEM FOR DETERMINING ACTIVITY OF ACCOUNTS IN COMPUTING ENVIRONMENT | 2023 | | RU2824919C1 |
EXPANDING OF INFORMATION SEARCH POSSIBILITY | 2015 | | RU2618375C2 |
SYSTEM FOR CREATING DOCUMENTS BASED ON TEXT ANALYSIS ON NATURAL LANGUAGE | 2016 | | RU2639655C1 |
METHOD FOR PREDICTING SPEECH IMPAIRMENTS DURING NEUROSURGICAL INTERVENTIONS ACCORDING TO INTRAOPERATIVE REGISTRATION OF CORTICOCORTICAL EVOKED POTENTIALS | 2022 | | RU2806013C1 |
METHOD AND SYSTEM FOR DIGITAL ASSISTANT TEXT GENERATION | 2022 | | RU2796208C1 |
Dates
2021-07-21—Published
2020-09-09—Filed