FIELD: biotechnology.
SUBSTANCE: described is a method of identifying one or more gene fusions in a biological sample. The method includes operations of: obtaining first data representing a plurality of aligned reads; identifying a plurality of potential fusions in the obtained first data; filtering the plurality of potential fusions to determine a filtered set of potential fusions; and, for each particular potential fusion in the filtered set: generating, by one or more computers, input data for a machine learning model, the input data including extracted feature data representing the particular potential fusion; providing the generated input data as input to the machine learning model, which has been trained to generate output reflecting the probability that a potential fusion is a confirmed gene fusion; and determining, based on the output, whether the particular potential fusion corresponds to a confirmed gene fusion. Also disclosed is a system for identifying one or more gene fusions in a biological sample, comprising one or more computers and one or more storage devices storing instructions which, when executed by the one or more computers, cause the one or more computers to perform the operations of the described method. In addition, provided is a non-transitory machine-readable medium for identifying one or more gene fusions in a biological sample, storing software containing instructions executable by one or more computers which, when executed, cause the one or more computers to perform operations including the described method.
EFFECT: fast detection of gene fusions.
13 cl, 4 dwg
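The operations listed under SUBSTANCE (filter the potential fusions, extract features, score each one with a model, decide from the output) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `FusionCandidate` fields, the support-count filter, the `ToyModel` with hand-picked weights standing in for the trained machine learning model, the 0.5 decision threshold, and the toy read counts.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionCandidate:
    # Hypothetical fields; the abstract does not say which features are extracted.
    gene_a: str
    gene_b: str
    split_reads: int      # reads that cross the putative breakpoint
    spanning_pairs: int   # read pairs with one mate in each gene
    mean_mapq: float      # mean mapping quality of the supporting reads


def filter_candidates(candidates, min_support=3):
    """Keep candidates that join two different genes and have enough support."""
    return [
        c for c in candidates
        if c.gene_a != c.gene_b and c.split_reads + c.spanning_pairs >= min_support
    ]


def extract_features(c):
    """Turn one candidate into a numeric feature vector for the model."""
    return [float(c.split_reads), float(c.spanning_pairs), c.mean_mapq / 60.0]


class ToyModel:
    """Stand-in for the trained model: a logistic function with made-up weights."""
    weights = [0.4, 0.3, 2.0]
    bias = -5.0

    def predict_proba(self, features):
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))


def call_fusions(candidates, model, threshold=0.5):
    """Filter, score each remaining candidate, keep those at or above threshold."""
    calls = []
    for c in filter_candidates(candidates):
        p = model.predict_proba(extract_features(c))
        if p >= threshold:
            calls.append((c.gene_a, c.gene_b, p))
    return calls


# Toy input with invented read counts, for illustration only.
candidates = [
    FusionCandidate("BCR", "ABL1", split_reads=12, spanning_pairs=8, mean_mapq=58.0),
    FusionCandidate("GENE1", "GENE2", split_reads=1, spanning_pairs=0, mean_mapq=20.0),
    FusionCandidate("GENE3", "GENE3", split_reads=9, spanning_pairs=9, mean_mapq=60.0),
    FusionCandidate("GENE4", "GENE5", split_reads=2, spanning_pairs=1, mean_mapq=10.0),
]
calls = call_fusions(candidates, ToyModel())
for gene_a, gene_b, p in calls:
    print(f"{gene_a}-{gene_b}: probability {p:.3f}")
```

In this sketch the cheap filter discards weakly supported and same-gene candidates before any scoring, so the model is evaluated only on the filtered set; the last toy candidate passes the filter but is rejected by the model score.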
Dates
2024-05-02—Published
2020-12-04—Filed