Graph Representation of Mutations

It has been argued repeatedly that the flat VCF file format is not optimal for storing genetic variation data. An alternative is to store that data in a graph database. For example, the Global Alliance for Genomics and Health, which includes 217 members, chose to use a graph DB. Discussions and prototypes point to at least the following benefits of a graph-based representation of mutations: 1. Read mappers can take into account all known mutations and haplotypes; 2. Annotation of new VCF files becomes significantly faster; 3. Searches that are highly computationally intensive in other storage systems become practical, such as finding all VCF files that contain a specific set of mutations; and many others.
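The third benefit above, read here as finding every VCF file that carries a given set of mutations, can be illustrated with a minimal sketch. It is an in-memory Python stand-in, not the Hadoop-based graph DB the project would use, and every name in it (VariantGraph, add_call, samples_with_all, the sample file names, the variant tuples) is hypothetical. It models a bipartite graph of sample nodes and variant nodes and answers the query by intersecting the neighbor sets of the requested variants.

```python
from collections import defaultdict


class VariantGraph:
    """Toy bipartite graph: sample (VCF file) nodes linked to variant nodes.

    A variant is keyed by a hypothetical tuple (chrom, pos, ref, alt).
    """

    def __init__(self):
        # Edges stored in both directions so either side can be traversed.
        self.samples_by_variant = defaultdict(set)
        self.variants_by_sample = defaultdict(set)

    def add_call(self, sample, variant):
        """Add an edge between a sample node and a variant node."""
        self.samples_by_variant[variant].add(sample)
        self.variants_by_sample[sample].add(variant)

    def samples_with_all(self, variants):
        """Return the samples that carry every variant in `variants`."""
        neighbor_sets = [self.samples_by_variant.get(v, set()) for v in variants]
        if not neighbor_sets:
            return set()
        # Start from the smallest neighbor set to keep intersections cheap.
        neighbor_sets.sort(key=len)
        result = set(neighbor_sets[0])
        for neighbors in neighbor_sets[1:]:
            result &= neighbors
        return result


# Made-up example data, for illustration only.
graph = VariantGraph()
v1 = ("chr1", 12345, "A", "G")
v2 = ("chr2", 67890, "C", "T")
v3 = ("chr3", 11111, "G", "A")
graph.add_call("sample_A.vcf", v1)
graph.add_call("sample_A.vcf", v2)
graph.add_call("sample_B.vcf", v1)
graph.add_call("sample_C.vcf", v1)
graph.add_call("sample_C.vcf", v2)
graph.add_call("sample_C.vcf", v3)

carriers = graph.samples_with_all([v1, v2])
```

With a flat collection of VCF files the same question requires scanning every file, whereas here the cost depends only on how many samples are attached to the requested variant nodes.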

In this project, students will use a Hadoop-optimized graph DB to store and operate on variation data. The work is modular, so each module can be developed by a separate student.

   Андрей Ершов
   Максим Михеев
Project duration: Sep 2014 - Dec 2014