FIELD: bioinformatics.
SUBSTANCE: a method of creating an annotated graph-based reference genome is described. The method includes: obtaining, by sequencing, one or more earlier versions of a current reference genome, wherein each earlier version contains a plurality of nodes, at least some of which contain information identifying the version of the reference genome and the location of the corresponding node within that version; aligning each of the obtained earlier versions with the current reference genome to create a graph-based reference genome, wherein the alignment is based at least in part on the location information from the nodes of the obtained earlier version; retrieving, from a corpus of a plurality of sources, each containing information about an allele and contextual information associated with that allele, an allele and its associated contextual information, wherein the corresponding source identifies (i) one of the obtained earlier versions of the reference genome and (ii) the location of the allele in the identified earlier version; mapping the retrieved allele and its associated contextual information to a node of the graph-based reference genome, based on the identified earlier version and the location of the retrieved allele within that version; generating a report summarizing all contextual information associated with a node of the graph-based reference genome; and providing the generated report to a user through a user interface. A system for generating an annotated graph-based reference genome is also described.
EFFECT: the invention makes it possible to collect and systematize literature concerning earlier versions of the reference genome in relation to the current, graph-based version of the reference genome.
11 cl, 3 dwg
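The mapping and reporting steps of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names (`Node`, `Source`, `AnnotatedGraphReference`), the field layout, and the version labels `v1` and `v2` are all hypothetical, and the sketch assumes the alignment step has already recorded, for each node, its location in each earlier reference version.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the graph-based reference genome (hypothetical layout)."""
    node_id: str
    # Set of (version, position) pairs: where this node sits in each
    # earlier reference version, as assumed to be produced by alignment.
    locations: set = field(default_factory=set)
    annotations: list = field(default_factory=list)


@dataclass
class Source:
    """A literature source describing an allele against an earlier version."""
    allele: str
    context: str
    version: str
    position: int


class AnnotatedGraphReference:
    def __init__(self, nodes):
        self.nodes = {n.node_id: n for n in nodes}
        # Lookup from (version, position) to the id of the node at that location.
        self.index = {loc: n.node_id for n in nodes for loc in n.locations}

    def map_source(self, source):
        """Attach the allele and its context to the matching node.

        Returns the node id, or None if no node carries that location.
        """
        node_id = self.index.get((source.version, source.position))
        if node_id is None:
            return None
        self.nodes[node_id].annotations.append(
            {"allele": source.allele,
             "context": source.context,
             "version": source.version}
        )
        return node_id

    def report(self, node_id):
        """Summarize all contextual information attached to one node."""
        node = self.nodes[node_id]
        lines = [f"Node {node_id}: {len(node.annotations)} annotation(s)"]
        for a in node.annotations:
            lines.append(f"- {a['allele']} ({a['version']}): {a['context']}")
        return "\n".join(lines)


# Illustrative data only: one node known at two locations in two versions.
graph = AnnotatedGraphReference([
    Node("n1", {("v1", 100), ("v2", 105)}),
    Node("n2", {("v1", 200)}),
])
graph.map_source(Source("A>G", "reported in study X", "v2", 105))
graph.map_source(Source("C>T", "reported in study Y", "v1", 100))
print(graph.report("n1"))
```

Because the index is keyed on the pair of version and position, sources written against different earlier versions land on the same node whenever alignment placed those locations together, which is what lets the report gather literature across versions.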
Title | Year | Author | Number |
---|---|---|---|
SPLICING SITES CLASSIFICATION BASED ON DEEP LEARNING | 2018 | | RU2780442C2 |
METHODS FOR TRAINING DEEP CONVOLUTIONAL NEURAL NETWORKS BASED ON DEEP LEARNING | 2018 | | RU2767337C2 |
METHODS AND COMPOSITIONS FOR DETECTING SOMATIC VARIANT | 2019 | | RU2813655C2 |
BIOINFORMATION SYSTEMS, DEVICES AND METHODS FOR SECONDARY AND/OR TERTIARY PROCESSING | 2017 | | RU2799750C2 |
BIOINFORMATIC SYSTEMS, DEVICES AND METHODS FOR PERFORMING SECONDARY AND/OR TERTIARY PROCESSING | 2017 | | RU2750706C2 |
SYSTEM AND METHOD OF INTERPRETING DATA AND PROVIDING RECOMMENDATIONS TO USER BASED ON GENETIC DATA THEREOF AND DATA ON COMPOSITION OF INTESTINAL MICROBIOTA | 2017 | | RU2699284C2 |
GENOMIC INFRASTRUCTURE FOR LOCAL AND CLOUD PROCESSING AND ANALYSIS OF DNA AND RNA | 2017 | | RU2804029C2 |
GENOMIC INFRASTRUCTURE FOR LOCAL AND CLOUD PROCESSING AND ANALYSIS OF DNA AND RNA | 2017 | | RU2761066C2 |
CALCULATION OF THE BURDEN OF TUMOUR MUTATIONS USING TUMOUR RNA SEQUENCING DATA AND CONTROLLED MACHINE LEARNING | 2020 | | RU2759205C1 |
A DEEP LEARNING FRAME FOR IDENTIFYING SEQUENCE PATTERNS THAT CAUSE SEQUENCE SPECIFIC ERRORS (SSE) | 2019 | | RU2745733C1 |
Dates
2023-12-07—Published
2019-05-20—Filed