Applying cancer subpopulation analysis methods to metagenomic series

In many metagenomic studies, multiple similar metagenomic samples are available, forming time or spatial series. In principle, such series provide unprecedented opportunities to decompose mixtures of closely related bacterial strains.

Arguably the most basic problem one can formulate is: given metagenomic series data and a reference genome for a particular species, identify the number of related strains and their relative abundances across the samples. Surprisingly, very few tools exist for this kind of analysis. In 2015 Luo et al. developed the ConStrains tool, which has a number of shortcomings: in particular, it relies on a questionable computational model, and we were unable to reproduce the results reported in the paper.
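
To make the problem statement concrete, the sketch below shows one common way to model such data. It is only an illustration under stated assumptions, not the model used by ConStrains or by any of the tools discussed here: it assumes each strain either carries or lacks the alternative allele at each variable site, and that the expected alternative allele frequency in a sample is the abundance-weighted sum over strains. The function name and the toy matrices are hypothetical.

```python
def expected_alt_frequencies(genotypes, abundances):
    """Forward model of a simple linear strain mixture (illustrative assumption).

    genotypes[i][k]  -- 1 if strain k carries the alternative allele at site i, else 0
    abundances[k][t] -- relative abundance of strain k in sample t
                        (each sample's abundances sum to 1)
    Returns freq[i][t], the expected alternative allele frequency
    at site i in sample t.
    """
    n_strains = len(abundances)
    n_samples = len(abundances[0])
    return [
        [
            sum(site[k] * abundances[k][t] for k in range(n_strains))
            for t in range(n_samples)
        ]
        for site in genotypes
    ]


# Toy example: 3 variable sites, 2 strains, 3 samples in a series.
genotypes = [
    [1, 0],  # site carried only by strain 0
    [0, 1],  # site carried only by strain 1
    [1, 1],  # site shared by both strains
]
abundances = [
    [0.75, 0.5, 0.25],  # strain 0 decreases along the series
    [0.25, 0.5, 0.75],  # strain 1 increases along the series
]

frequencies = expected_alt_frequencies(genotypes, abundances)
for site_index, row in enumerate(frequencies):
    print(site_index, row)
```

Under this assumption, the analysis task is the inverse problem: given only noisy observed frequencies (derived from read counts), infer the number of strains, the genotype matrix and the abundance matrix.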

While the problem of strain detection has only recently been introduced in the analysis of metagenomic series, the closely related computational problem of detecting cancer subpopulations in a series of tumor samples has been studied extensively over the past five years. Multiple software tools have been developed around advanced statistical and/or algorithmic approaches (e.g. Clomial [Zare et al., 2014], PhyloWGS [Deshwar et al., 2015], PyClone [Roth et al., 2013]).

In this study we applied those tools to the analysis of related strains in metagenomic series. We adapted Clomial and PhyloWGS to metagenomic series data and tested them on simulated and real datasets. While we have not so far achieved good results with PhyloWGS, we observed that Clomial can be used successfully in the metagenomic setting, producing more accurate results than ConStrains.


   Sergey Nurk
Project duration: Feb 2017 to Jun 2017