FIELD: data processing; biotechnology.
SUBSTANCE: the group of inventions relates to compressing genome sequence data obtained by a sequencer, based on a reference sequence. For sequences of nucleotides or bases previously aligned to a reference sequence, it is determined whether each sequence is exactly matched, inexactly matched, or unmatched with the reference sequence, after which the sequences are encoded according to that determination. For each inexactly matched sequence, the determining step includes comparing the number of mismatches between the sequence and the reference sequence against a reference threshold value and, depending on the result of the comparison, encoding the inexactly matched sequence using a uniquely defined encoding process of the method. The group of inventions includes a computer-implemented method, a system, a machine-readable data storage device, and a hardware processor for compressing genome sequence data.
EFFECT: the group of inventions provides fast, lossless compression and decompression of data with a high compression ratio.
32 cl, 7 dwg
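The classification step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `ReadClass`, `count_mismatches`, and `classify_read` are hypothetical, only substitution mismatches are counted (no insertions or deletions), and whether the threshold comparison is inclusive is an assumption, since the abstract does not say.

```python
from enum import Enum
from typing import Optional


class ReadClass(Enum):
    """Hypothetical labels for the classes named in the abstract."""
    EXACT = "exactly matched"
    INEXACT_LOW = "inexactly matched, mismatches within threshold"
    INEXACT_HIGH = "inexactly matched, mismatches above threshold"
    UNMATCHED = "unmatched"


def count_mismatches(read: str, reference: str, pos: int) -> int:
    """Count positions where the read differs from the reference window.

    Assumption: substitutions only; indels are ignored in this sketch.
    """
    window = reference[pos:pos + len(read)]
    return sum(1 for a, b in zip(read, window) if a != b)


def classify_read(read: str, reference: str,
                  pos: Optional[int], threshold: int) -> ReadClass:
    """Classify an already aligned read; pos=None means no alignment."""
    if pos is None or pos < 0 or pos + len(read) > len(reference):
        return ReadClass.UNMATCHED
    mismatches = count_mismatches(read, reference, pos)
    if mismatches == 0:
        return ReadClass.EXACT
    # Assumption: "within threshold" is inclusive (<=).
    if mismatches <= threshold:
        return ReadClass.INEXACT_LOW
    return ReadClass.INEXACT_HIGH
```

In such a scheme, each class would then be routed to its own encoder, for example storing only a position for exact matches and a position plus a mismatch list for inexact ones; those encoders are outside the scope of this sketch.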
Title | Year | Author | Number |
---|---|---|---|
METHOD FOR COMPRESSING GENOME SEQUENCE DATA | 2020 | | RU2807474C1 |
FAST DETECTION OF GENE FUSIONS | 2020 | | RU2818363C1 |
SEQUENCE GRAPH-BASED TOOL FOR DETERMINING VARIATION IN SHORT TANDEM REPEAT AREAS | 2020 | | RU2799654C2 |
SEQUENCE GRAPH TOOL FOR DETERMINING VARIATIONS IN REGIONS OF SHORT TANDEM REPEATS | 2020 | | RU2825664C2 |
FLEXIBLE SEED EXTENSION FOR HASH TABLE-BASED GENOMIC MAPPING | 2020 | | RU2796915C1 |
METHOD FOR DETERMINING INDICATOR CORRELATED WITH PROBABILITY THAT TWO MUTATED SEQUENCE READINGS ARE FROM THE SAME SEQUENCE CONTAINING MUTATION | 2020 | | RU2799778C1 |
SECURE GENOMIC DATA TRANSMISSION | 2015 | | RU2753245C2 |
SYSTEM AND METHOD FOR SECONDARY ANALYSIS OF NUCLEOTIDE SEQUENCING DATA | 2017 | | RU2741807C2 |
DETECTION OF SOMATIC VARIATION OF NUMBER OF COPIES | 2017 | | RU2768718C2 |
TERMINAL AND METHOD OF COMMUNICATION | 2020 | | RU2802817C1 |
Dates
2024-03-22—Published
2020-09-11—Filed