FIELD: information technology.
SUBSTANCE: the invention relates to a method of extracting time expressions from natural-language texts. The method involves: dividing a text into two non-overlapping subsets, unmarked text data for testing and unmarked text data for training; marking the unmarked test data to obtain a “golden” reference set; creating a list of regular expressions and a mechanism for marking text data by means of that list; marking the unmarked training data to produce text with grammatical markup and partially marked time expressions; training a machine learning algorithm on the marked text data; and marking the unmarked test data by means of the machine learning algorithm.
EFFECT: the technical result is that unmarked text data can be marked and then used in a machine learning algorithm for marking time expressions in natural-language text.
7 cl, 2 dwg
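The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the regular expressions, the toy corpus, the token features, and the choice of a simple perceptron are all assumptions made for illustration, and the grammatical markup step is omitted. The abstract does not say which machine learning algorithm is used.

```python
import re
from collections import defaultdict

# Hypothetical regular expressions for time expressions (not from the patent).
TIME_PATTERNS = [
    re.compile(r"\b\d{1,2}:\d{2}\b"),                       # e.g. 14:30
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),                   # e.g. 2014-06-18
    re.compile(r"\b(?:yesterday|today|tomorrow)\b", re.I),
    re.compile(r"\b(?:monday|tuesday|wednesday|thursday|friday)\b", re.I),
]

def tokenize(text):
    """Return (token, start, end) triples for whitespace-separated tokens."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def regex_label(text):
    """Mark each token 1 if it overlaps any regex match, else 0."""
    spans = [m.span() for p in TIME_PATTERNS for m in p.finditer(text)]
    return [(tok, int(any(s < end and start < e for s, e in spans)))
            for tok, start, end in tokenize(text)]

def features(tokens, i):
    """Illustrative token features: the word, a digit flag, the previous word."""
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return ["bias",
            "word=" + tokens[i].lower().strip(".,"),
            "has_digit=" + str(any(c.isdigit() for c in tokens[i])),
            "prev=" + prev]

def train(marked_sentences, epochs=5):
    """Train a binary perceptron on (token, label) sequences."""
    w = defaultdict(float)
    for _ in range(epochs):
        for sent in marked_sentences:
            tokens = [t for t, _ in sent]
            for i, (_, y) in enumerate(sent):
                feats = features(tokens, i)
                pred = int(sum(w[f] for f in feats) > 0)
                if pred != y:  # update weights only on mistakes
                    for f in feats:
                        w[f] += 1.0 if y == 1 else -1.0
    return w

def predict(w, text):
    """Mark each token of an unmarked text with the trained model."""
    tokens = [t for t, _, _ in tokenize(text)]
    return [(tok, int(sum(w.get(f, 0.0) for f in features(tokens, i)) > 0))
            for i, tok in enumerate(tokens)]

def accuracy(w, gold):
    """Share of test tokens where the model agrees with the hand-made marks."""
    pairs = [(p, g) for text, marks in gold.items()
             for (_, p), g in zip(predict(w, text), marks)]
    return sum(p == g for p, g in pairs) / len(pairs)

# Toy corpus, split into two non-overlapping subsets.
corpus = [
    "The meeting starts at 14:30 today",
    "She filed the report on 2014-06-18",
    "We will talk tomorrow",
    "The cat sat on the mat",
    "Call me on Friday at 09:15",
    "He arrived yesterday",
]
train_texts, test_texts = corpus[:4], corpus[4:]

# Hand-made reference marks for the test subset (the "golden" set).
gold = {
    "Call me on Friday at 09:15": [0, 0, 0, 1, 0, 1],
    "He arrived yesterday": [0, 0, 1],
}

# Training data is marked automatically by the regex list, then used to train.
weights = train([regex_label(t) for t in train_texts])

if __name__ == "__main__":
    for text in test_texts:
        print(predict(weights, text))
    print("token accuracy vs. golden set:", accuracy(weights, gold))
```

In this sketch the regular expressions only bootstrap the training marks. The trained model then marks the test subset, and its output is compared with the hand-made reference marks.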
Title | Year | Author | Number |
---|---|---|---|
NAMED ENTITIES FROM THE TEXT AUTOMATIC EXTRACTION | 2014 | | RU2665239C2 |
METHOD FOR ATTRIBUTION OF PARTIALLY STRUCTURED TEXTS FOR FORMATION OF NORMATIVE-REFERENCE INFORMATION | 2020 | | RU2750852C1 |
AUTOMATED LEGAL ADVICE SYSTEM CONTROL METHOD | 2019 | | RU2718978C1 |
TEXT SEGMENTATION | 2017 | | RU2666277C1 |
METHOD AND SYSTEM FOR CLASSIFYING DATA FOR IDENTIFYING CONFIDENTIAL INFORMATION IN THE TEXT | 2019 | | RU2755606C2 |
METHOD FOR PRELIMINARY PROCESSING OF TEXT | 2007 | | RU2386178C2 |
AI TRANSACTION ADMINISTRATION SYSTEM | 2020 | | RU2777958C2 |
METHOD FOR CONTROLLING A DIALOGUE AND NATURAL LANGUAGE RECOGNITION SYSTEM IN A PLATFORM OF VIRTUAL ASSISTANTS | 2020 | | RU2759090C1 |
METHOD FOR AUTOMATED EXTRACTION OF SEMANTIC COMPONENTS FROM COMPOUND SENTENCES OF NATURAL-LANGUAGE TEXTS IN MACHINE TRANSLATION SYSTEMS AND APPARATUS FOR IMPLEMENTATION THEREOF | 2021 | | RU2777693C1 |
METHOD FOR SEPARATING TEXTS AND ILLUSTRATIONS IN IMAGES OF DOCUMENTS USING A DESCRIPTOR OF DOCUMENT SPECTRUM AND TWO-LEVEL CLUSTERING | 2017 | | RU2656708C1 |
Dates
2016-08-27—Published
2014-06-18—Filed