FIELD: biotechnology; medicine.
SUBSTANCE: described is a method for sequencing genomic loci comprising two or more repeat sequences. The method is implemented on a computer having one or more processors and system memory, and is used to genotype one or more repeat sequences, each of which contains one or more repeat sub-sequences. The method includes: obtaining a sequence graph, the sequence graph being a graph data structure in which vertices represent nucleotide sequences and directed edges connect the vertices, and which contains two or more proper simple cycles, each proper simple cycle representing a repeat sub-sequence; aligning, using the one or more processors, sequence reads of the sample under analysis to a reference genome to determine the genomic coordinates of the sequence reads, and selecting a subset of the sequence reads; and aligning, using the one or more processors, the selected subset of sequence reads to the two or more repeat sequences represented by the sequence graph, which represents a genomic locus. Also disclosed is a corresponding system for sequencing genomic loci comprising two or more repeat sequences.
EFFECT: the invention enables genotyping of repeat sequences, including short tandem repeats (STR), which are medically significant.
10 cl, 7 dwg, 1 tbl
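The three steps named in the abstract (sequence graph, read selection by genomic coordinates, alignment of the selected reads to the graph) can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes that each cycle is modelled as a single vertex with an edge to itself, that read coordinates have already been produced by some reference aligner, and that alignment is reduced to exact matching of a read that starts and ends on vertex boundaries. The names `SequenceGraph`, `select_reads`, `spell` and `count_repeat_units`, the flank sequences, the CAG/CCG repeat units and the coordinates are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Vertices hold nucleotide sequences; directed edges connect vertices."""
    nodes: dict[str, str] = field(default_factory=dict)
    edges: set[tuple[str, str]] = field(default_factory=set)

    def add_node(self, name: str, seq: str) -> None:
        self.nodes[name] = seq

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.add((src, dst))

    def successors(self, name: str) -> list[str]:
        return sorted(dst for src, dst in self.edges if src == name)

    def repeat_nodes(self) -> list[str]:
        # Assumption: a cycle is modelled as a vertex with an edge to itself.
        return sorted(n for n in self.nodes if (n, n) in self.edges)


def select_reads(reads, chrom, start, end):
    """Keep reads whose reference coordinates overlap [start, end) on chrom.

    Each read is a (chrom, pos, seq) tuple; pos is assumed to come from a
    prior alignment to the reference genome.
    """
    return [r for r in reads
            if r[0] == chrom and r[1] < end and r[1] + len(r[2]) > start]


def spell(graph, seq, node, pos=0):
    """Return a vertex path that spells seq[pos:] exactly, or None."""
    label = graph.nodes[node]
    if not seq.startswith(label, pos):
        return None
    pos += len(label)
    if pos == len(seq):
        return [node]
    for nxt in graph.successors(node):
        rest = spell(graph, seq, nxt, pos)
        if rest is not None:
            return [node] + rest
    return None


def count_repeat_units(graph, start_node, read_seq):
    """Count how often the path of a read visits each repeat vertex."""
    path = spell(graph, read_seq, start_node)
    if path is None:
        return None
    visits = Counter(path)
    return {n: visits[n] for n in graph.repeat_nodes()}


# Hypothetical locus: left flank, (CAG)*, (CCG)*, right flank.
graph = SequenceGraph()
for name, seq in [("LF", "TTGA"), ("CAG", "CAG"), ("CCG", "CCG"), ("RF", "GGTC")]:
    graph.add_node(name, seq)
for src, dst in [("LF", "CAG"), ("CAG", "CAG"), ("CAG", "CCG"),
                 ("CCG", "CCG"), ("CCG", "RF")]:
    graph.add_edge(src, dst)

locus_read = "TTGA" + "CAG" * 3 + "CCG" * 2 + "GGTC"
reads = [("chr4", 100, locus_read), ("chr4", 5000, "ACGTACGT"), ("chr1", 100, locus_read)]

selected = select_reads(reads, "chr4", 90, 130)
counts = [count_repeat_units(graph, "LF", seq) for _, _, seq in selected]
print(counts)
```

A practical tool would additionally need to tolerate sequencing errors and reads that begin or end inside a vertex; the exact-match walk above is only meant to show how repeat counts fall out of the path a read takes through the graph.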
Title | Year | Author | Number |
---|---|---|---|
SEQUENCE GRAPH-BASED TOOL FOR DETERMINING VARIATION IN SHORT TANDEM REPEAT AREAS | 2020 | | RU2799654C2 |
SET OF PROBES FOR ANALYZING DNA SAMPLES AND METHODS FOR THEIR USE | 2016 | | RU2753883C2 |
WHOLE GENOME SEQUENCING DATA PROCESSING SYSTEM | 2023 | | RU2806429C1 |
ANIMALS OTHER THAN HUMANS CHARACTERIZED IN EXPANSION OF HEXANUCLEOTIDE REPEATS AT LOCUS C9orf72 | 2017 | | RU2760877C2 |
SUPPRESSING ERRORS IN SEQUENCED DNA FRAGMENTS BY USING EXCESSIVE READING WITH UNIQUE MOLECULAR INDICES (UMI) | 2016 | | RU2704286C2 |
METHODS AND SYSTEMS FOR OBTAINING SETS OF UNIQUE MOLECULAR INDICES WITH HETEROGENEOUS LENGTH OF MOLECULES AND CORRECTING ERRORS THEREIN | 2018 | | RU2766198C2 |
A DEEP LEARNING FRAME FOR IDENTIFYING SEQUENCE PATTERNS THAT CAUSE SEQUENCE SPECIFIC ERRORS (SSE) | 2019 | | RU2745733C1 |
WHOLE GENOME SEQUENCING DATA PROCESSING SYSTEM | 2023 | | RU2804535C1 |
BIOINFORMATIC SYSTEMS, DEVICES AND METHODS FOR PERFORMING SECONDARY AND/OR TERTIARY PROCESSING | 2017 | | RU2750706C2 |
BIOINFORMATION SYSTEMS, DEVICES AND METHODS FOR SECONDARY AND/OR TERTIARY PROCESSING | 2017 | | RU2799750C2 |
Dates
2024-08-28—Published
2020-03-06—Filed