FIELD: biotechnology.
SUBSTANCE: a computer-implemented method for predicting the likelihood of splice sites in pre-mRNA genomic sequences is described. The method includes: obtaining pre-mRNA genomic sequences by sequencing pre-mRNA transcripts; and training an atrous (dilated) convolutional neural network (hereinafter ACNN) on training examples of pre-mRNA nucleotide sequences, including at least 50,000 training examples of donor splice sites, at least 50,000 training examples of acceptor splice sites, and at least 100,000 training examples of non-splicing sites, such that the trained ACNN generates triplet scores for the likelihood that each nucleotide among the target nucleotides is a donor splice site, an acceptor splice site, or a non-splicing site. The training includes: inputting one-hot encoded training examples of nucleotide sequences, wherein each nucleotide sequence contains at least 401 nucleotides, comprising at least one target nucleotide and a context of at least 200 flanking nucleotides on each side, in the 5’ direction and in the 3’ direction from the target nucleotide; and adjusting, by backpropagation, the filter parameters of the ACNN to predict the likelihood scores that each target nucleotide in the nucleotide sequence is a donor splice site, an acceptor splice site, or a non-splicing site; wherein the trained ACNN receives as input a one-hot encoded pre-mRNA nucleotide sequence of at least 401 nucleotides, which includes at least one target nucleotide and a context of at least 200 flanking nucleotides on each side.
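The input preparation described above (a target nucleotide with at least 200 flanking nucleotides on each side, encoded so that exactly one channel is active per base, commonly called one-hot encoding) can be illustrated with a minimal sketch. The helper names, the A/C/G/T channel order, the treatment of U as T, and the use of all-zero rows for unknown bases and padding are assumptions made for illustration; the abstract does not specify them.

```python
# Illustrative sketch only; names and conventions below are assumptions,
# not taken from the patent.
FLANK = 200  # minimum context on each side of the target nucleotide

# Assumed channel order A, C, G, T; U is treated like T for RNA input.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
    "U": [0, 0, 0, 1],
}
UNKNOWN = [0, 0, 0, 0]  # assumption: unknown bases (e.g. N) and padding are all zeros


def one_hot_encode(sequence):
    """Encode a nucleotide string as a list of 4-element one-hot rows."""
    return [list(ONE_HOT.get(base, UNKNOWN)) for base in sequence.upper()]


def extract_window(sequence, target_index, flank=FLANK):
    """Return the target nucleotide plus `flank` bases of context on each side.

    Positions that fall outside the sequence are padded with 'N', so the
    result always has 2 * flank + 1 characters (401 for flank = 200).
    """
    if not 0 <= target_index < len(sequence):
        raise IndexError("target_index is outside the sequence")
    padded = "N" * flank + sequence + "N" * flank
    # In `padded` the target sits at target_index + flank, so the window
    # starts at target_index and spans 2 * flank + 1 positions.
    return padded[target_index:target_index + 2 * flank + 1]
```

With `flank = 200`, `one_hot_encode(extract_window(seq, i))` yields a 401 x 4 matrix whose middle row (index 200) corresponds to the target nucleotide `seq[i]`.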
A system for predicting the likelihood of splice sites in pre-mRNA genomic sequences is also described, including one or more processors coupled to memory, wherein computer instructions are loaded into the memory which, when executed on the processors, implement actions including: training an atrous (dilated) convolutional neural network (ACNN) on training examples of pre-mRNA nucleotide sequences, including at least 50,000 training examples of donor splice sites, at least 50,000 training examples of acceptor splice sites, and at least 100,000 training examples of non-splicing sites, such that the trained ACNN generates triplet scores for the likelihood that each nucleotide among the target nucleotides is a donor splice site, an acceptor splice site, or a non-splicing site.
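Two building blocks behind the three-way output described above can be sketched in a few lines: a one-dimensional convolution whose filter taps are spaced apart (a dilated, or atrous, convolution), and a softmax that turns three raw outputs into three likelihoods summing to one. This is a minimal sketch under stated assumptions, not the patented network: the function names, the single-channel input, and the class order in the usage note are hypothetical.

```python
import math


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D cross-correlation with kernel taps `dilation` positions apart.

    With dilation = 1 this is an ordinary convolution layer; larger values
    widen the span of input seen by each output without adding weights.
    """
    span = (len(kernel) - 1) * dilation + 1  # input positions covered by one output
    return [
        sum(w * signal[start + k * dilation] for k, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 1, 1], 2)` combines positions 0, 2 and 4 for its first output, and `softmax([2.0, 0.5, -1.0])` gives three scores that could be read, under an assumed class order, as donor, acceptor, and non-splicing likelihoods for one target nucleotide.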
EFFECT: the invention expands the range of tools for training deep convolutional neural networks.
41 cl, 59 dwg, 3 tbl
Title | Year | Author | Number |
---|---|---|---|
METHODS FOR TRAINING DEEP CONVOLUTIONAL NEURAL NETWORKS BASED ON DEEP LEARNING | 2018 | | RU2767337C2 |
A DEEP LEARNING FRAME FOR IDENTIFYING SEQUENCE PATTERNS THAT CAUSE SEQUENCE SPECIFIC ERRORS (SSE) | 2019 | | RU2745733C1 |
GENOMIC INFRASTRUCTURE FOR LOCAL AND CLOUD PROCESSING AND ANALYSIS OF DNA AND RNA | 2017 | | RU2761066C2 |
GENOMIC INFRASTRUCTURE FOR LOCAL AND CLOUD PROCESSING AND ANALYSIS OF DNA AND RNA | 2017 | | RU2804029C2 |
METHOD FOR QUANTIFYING THE STATISTICAL ANALYSIS OF ALTERNATIVE SPLICING IN RNA-SEC DATA | 2020 | | RU2752663C1 |
COMPUTER-IMPLEMENTED INTEGRAL METHOD FOR ASSESSING QUALITY OF TARGET SEQUENCING RESULTS | 2018 | | RU2717809C1 |
ANTISENSE OLIGONUCLEOTIDE DIRECTED REMOVAL OF PROTEOLYTIC CLEAVAGE SITES, HCHWA-D MUTATIONS AND INCREASED NUMBER OF TRINUCLEOTIDE REPEATS | 2014 | | RU2692634C2 |
CALCULATION OF THE BURDEN OF TUMOUR MUTATIONS USING TUMOUR RNA SEQUENCING DATA AND CONTROLLED MACHINE LEARNING | 2020 | | RU2759205C1 |
GENE EDITING OF DEEP INTRON MUTATIONS | 2016 | | RU2759335C2 |
EXON SKIP WITH PEPTIDONUCLEIC ACID DERIVATIVES | 2017 | | RU2786637C2 |
Dates
2022-09-23—Published
2018-10-15—Filed