Hardware-efficient Tool for Genome Mappability Score Estimation

Our project is devoted to developing an efficient and accurate tool for evaluating genome mappability. Mappability is a genome-wide function that shows whether a region of a genome can be identified unambiguously by a read of a given length. Its values typically range from 0 to 100. If the exact genome subsequence occurs in more than one location, the mappability at that location is set to zero; if the subsequence is unique, the mappability is close to 100.
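To make the definition concrete, here is a minimal brute-force sketch that scores every read start position by counting how often its subsequence of length k occurs. It assumes exact matches on the given strand only (no mismatches, no reverse complement), uses hash-based counting rather than the sorting approach described below, and the function name `kmer_uniqueness` is hypothetical.

```python
from collections import Counter


def kmer_uniqueness(genome: str, k: int) -> list[int]:
    """Score each read start position: 1 if its k-mer occurs exactly once, else 0.

    Assumption: exact matches on the given strand only; mismatches and the
    reverse complement are ignored in this sketch.
    """
    n = len(genome) - k + 1  # number of read start positions
    if n <= 0:
        return []
    counts = Counter(genome[i:i + k] for i in range(n))  # occurrences of every k-mer
    return [1 if counts[genome[i:i + k]] == 1 else 0 for i in range(n)]
```

Multiplying these scores by 100 gives the 0 to 100 scale used above. For `kmer_uniqueness("ACGTACGTTT", 4)`, the subsequence `ACGT` occurs at positions 0 and 4, so exactly those two positions score 0.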

Fast computation of a mappability track (MT) is a complex combinatorial task. State-of-the-art tools either exhibit hardware-greedy behavior (e.g., GMA) or are no longer supported (e.g., GEM-tool). Our solution aims to provide a user-friendly, fast, yet accurate instrument for MT calculation. It is based on fast sorting of all genome subsequences of a given length. In the sorted list, identical subsequences end up next to each other, which reveals whether each subsequence is unique and allows assigning it a score of 1 (unique) or 0 (not unique). A subsequent FFT-based convolution of these scores with a windowing function restores the MT. Multicore computation is also supported, and MT calculation for paired-end reads is implemented as an additional option. A preliminary accuracy estimation shows that the results of the developed algorithm differ by no more than 10% from those of high-precision but significantly more resource-demanding mapping-based tools.
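The sort-then-convolve pipeline can be sketched in a few lines of Python with NumPy. This is an illustrative sketch under stated assumptions, not the project's implementation: Python's built-in `sorted` stands in for the fast sort, only exact forward-strand matches are considered, and the windowing function is assumed to be a box of length k, so each base is scored by the share of unique reads among the reads of length k that overlap it. All function names are hypothetical.

```python
import numpy as np


def unique_kmer_flags(genome: str, k: int) -> np.ndarray:
    """Return 1 for each k-mer start position whose k-mer is unique, else 0.

    Start positions are sorted by the k-mer they begin, so identical
    k-mers become adjacent and one linear scan finds every repeat.
    """
    n = len(genome) - k + 1
    if n <= 0:
        return np.zeros(0, dtype=np.int8)
    order = sorted(range(n), key=lambda i: genome[i:i + k])
    flags = np.ones(n, dtype=np.int8)
    for a, b in zip(order, order[1:]):
        if genome[a:a + k] == genome[b:b + k]:  # sorted neighbours share a k-mer
            flags[a] = 0
            flags[b] = 0
    return flags


def fft_convolve(signal: np.ndarray, window: np.ndarray) -> np.ndarray:
    """Full linear convolution computed with zero-padded real FFTs."""
    n = len(signal) + len(window) - 1
    size = 1 << (n - 1).bit_length()  # smallest power of two >= n, avoids wrap-around
    spectrum = np.fft.rfft(signal, size) * np.fft.rfft(window, size)
    return np.fft.irfft(spectrum, size)[:n]


def mappability_track(genome: str, k: int) -> np.ndarray:
    """Per-base mappability in percent, one value per genome position.

    Assumption: base p is scored as the share of unique reads among the
    reads of length k that overlap it (start positions p-k+1 .. p).
    """
    if len(genome) < k:
        return np.zeros(len(genome))
    flags = unique_kmer_flags(genome, k).astype(float)
    window = np.ones(k)  # box window: one weight per overlapping read
    unique_reads = fft_convolve(flags, window)
    overlapping_reads = fft_convolve(np.ones_like(flags), window)  # fewer near the ends
    return np.round(100.0 * unique_reads / overlapping_reads, 6)
```

Dividing by the number of overlapping reads, rather than by k, keeps the terminal bases from being penalized just because fewer reads cover them. For `mappability_track("ACGTACGTTT", 4)`, the repeated `ACGT` pulls the first eight bases below 100, while the last two bases, covered only by unique reads, score 100.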

The developed approach may further be used to improve the accuracy of various bioinformatics tools that rely on genomic alignment results, such as ChIP-seq peak callers and the variant callers of genome resequencing tools.

 

Supervisor:
   Александр Предеус (Alexander Predeus)
Project period: Feb 2017 to Jun 2017