METHOD OF MARKING AND VERIFYING TEXT DATA
Russian patent published in 2025 - IPC G06F18/00, G06F40/10

Abstract RU 2832840 C1

FIELD: data processing.

SUBSTANCE: the invention relates to a method for marking (annotating) and verifying text data. At the first stage, a deep learning language model is pre-trained on a prepared data corpus containing collections of texts on a wide range of topics. At the second stage, text data relevant to the task being solved are marked through a software interface by selecting text fragments of arbitrary length and assigning them to user-defined categories; the marked data serve as an additional training sample for the language model. At the third stage, the marked data are preprocessed. At the fourth stage, the language model is trained on the newly marked data, and the marked data are vectorized. At the fifth stage, categories are predicted for a set of unlabeled data using a classifier model coupled with the language model. Metrics reflecting the model's degree of uncertainty for each category are generated, and a sampling strategy assigns each object a degree of informativeness based on these metrics; the most informative objects are then selected for expert assessment. The uncertainty metrics are maximum entropy and minimum confidence, together with a "category duplication" metric, which reflects the degree of uncertainty when data belonging to one category are assigned to another category. This metric is computed as the model's average confidence for one category over the data marked with the other category. The second through fifth stages are then repeated until consensus is reached between the expert's assessment and the uncertainty metrics for all objects submitted for assessment and their predicted categories; the expert decides when consensus has been reached.

EFFECT: enables more accurate marking of text documents.

3 cl, 1 dwg
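The fifth stage of the abstract (uncertainty metrics plus selection of the most informative objects) can be illustrated with a minimal Python sketch. It assumes the classifier returns, for each object, a probability distribution over the user-defined categories. All function names and the sample data are hypothetical, and the "category duplication" function follows one reading of the abstract (the average confidence for category a over objects marked as category b); it is not the patent holders' implementation.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Probs = Sequence[float]  # one predicted distribution over the categories


def max_entropy_score(probs: Probs) -> float:
    """Shannon entropy of a prediction; higher means the model is less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def least_confidence_score(probs: Probs) -> float:
    """1 - top-category probability; ranks objects by minimum confidence."""
    return 1.0 - max(probs)


def category_duplication(
    probs_list: Sequence[Probs],
    labels: Sequence[str],
    categories: Sequence[str],
) -> Dict[Tuple[str, str], float]:
    """Map (a, b) -> average confidence the model gives category a on objects
    marked as category b, for a != b (assumed reading of the patent's metric)."""
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for probs, label in zip(probs_list, labels):
        b = categories.index(label)  # index of the expert-assigned category
        counts[b] += 1
        for a, p in enumerate(probs):
            if a != b:
                sums[(a, b)] += p
    return {
        (categories[a], categories[b]): total / counts[b]
        for (a, b), total in sums.items()
    }


def select_most_informative(
    probs_list: Sequence[Probs],
    k: int,
    metric: Callable[[Probs], float] = max_entropy_score,
) -> List[int]:
    """Indices of the k objects with the highest uncertainty score (sent to the expert)."""
    ranked = sorted(
        range(len(probs_list)),
        key=lambda i: metric(probs_list[i]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    categories = ["contract", "invoice", "letter"]  # hypothetical categories
    unlabeled = [
        [0.90, 0.05, 0.05],  # confident prediction
        [0.40, 0.35, 0.25],  # uncertain
        [0.34, 0.33, 0.33],  # almost uniform
    ]
    print(select_most_informative(unlabeled, k=2))  # [2, 1]

    # Predictions on already-marked objects, used to find confusable categories.
    labeled = [[0.80, 0.15, 0.05], [0.60, 0.30, 0.10], [0.30, 0.60, 0.10]]
    dup = category_duplication(labeled, ["contract", "contract", "invoice"], categories)
    print(max(dup, key=dup.get))  # ('contract', 'invoice')
```

Entropy and least confidence rank the two uncertain objects first here; the duplication map highlights which pair of categories the model tends to mix up, which is where an expert's attention would plausibly help most on the next iteration.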

Similar patents RU2832840C1

Title Year Author Number
AUTOMATED LEGAL ADVICE SYSTEM CONTROL METHOD 2019
  • Prikhodko Olga Viktorovna
  • Khyurri Ruslan Vladimirovich
RU2718978C1
ALLOCATION OF TIME EXPRESSIONS FOR TEXTS IN NATURAL LANGUAGE 2014
  • Romanenko Aleksandr Aleksandrovich
RU2595489C2
SYSTEM FOR AUTOMATIC DETERMINATION OF SUBJECT MATTER OF TEXT DOCUMENTS BASED ON EXPLICABLE ARTIFICIAL INTELLIGENCE METHODS 2023
  • Sochenkov Ilia Vladimirovich
  • Zhebel Vladimir Viktorovich
  • Zubarev Denis Vladimirovich
  • Deviatkin Dmitrii Alekseevich
  • Iadrintsev Vasilii Vladimirovich
RU2823436C1
NAMED ENTITIES FROM THE TEXT AUTOMATIC EXTRACTION 2014
  • Nekhaj Ilya Vladimirovich
RU2665239C2
METHOD FOR ATTRIBUTION OF PARTIALLY STRUCTURED TEXTS FOR FORMATION OF NORMATIVE-REFERENCE INFORMATION 2020
  • Fedosin Sergei Alekseevich
  • Plotnikova Natalia Pavlovna
  • Martynov Vladislav Aleksandrovich
  • Ryskin Konstantin Eduardovich
  • Kuznetsov Dmitrii Aleksandrovich
  • Deniskin Aleksandr Vladimirovich
  • Vechkanova Iuliia Sergeevna
  • Fediushkin Nikolai Alekseevich
  • Tsilikov Nikita Sergeevich
RU2750852C1
METHOD FOR CONTROLLING A DIALOGUE AND NATURAL LANGUAGE RECOGNITION SYSTEM IN A PLATFORM OF VIRTUAL ASSISTANTS 2020
  • Ashmanov Stanislav Igorevich
  • Sukhachev Pavel Sergeevich
  • Zorkij Fedor Kirillovich
RU2759090C1
TEXT SEGMENTATION 2017
  • Indenbom Evgenij Mikhajlovich
  • Kolotienko Sergej Sergeevich
RU2666277C1
TRAINING CLASSIFIERS USED TO EXTRACT INFORMATION FROM NATURAL LANGUAGE TEXTS 2018
  • Matskevich Stepan Evgenevich
  • Bulgakov Ilya Aleksandrovich
RU2691855C1
CLASSIFIER TRAINING USED FOR EXTRACTING INFORMATION FROM TEXTS IN NATURAL LANGUAGE 2018
  • Matskevich Stepan Evgenevich
  • Bulgakov Ilya Aleksandrovich
RU2681356C1
METHOD FOR OBTAINING LOW-DIMENSIONAL NUMERIC REPRESENTATIONS OF SEQUENCES OF EVENTS 2020
  • Babaev Dmitrij Leonidovich
  • Ovsov Nikita Pavlovich
  • Kireev Ivan Aleksandrovich
RU2741742C1

RU 2 832 840 C1

Authors

Pantin Aleksej Ivanovich

Korobejnikov Aleksej Andreevich

Dates

Published: 2025-01-09

Filed: 2023-12-26