Representing Mutations as a Graph
There is ongoing discussion that the flat VCF file format is not optimal for storing genetic variation data. An alternative is to store this data in a graph database. For example, the Global Alliance for Genomics and Health (GA4GH), which includes 217 members, chose to use a graph DB. Multiple discussions and prototypes have pointed to at least the following benefits of a graph-based representation of mutations, among others:
1. Read mappers that take into account all known mutations and haplotypes;
2. Significantly faster annotation of new VCF files;
3. Queries that are computationally expensive with other storage approaches, such as finding all VCF files that contain a given set of mutations.
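To make the third benefit more concrete, here is a minimal in-memory sketch. It is not a graph DB, not a Hadoop component, and not the GA4GH data model. Each variant (keyed like a VCF record by CHROM, POS, REF, ALT) and each sample (VCF file) is a node, and every call is an edge between them. Finding the files that carry a given set of mutations then only touches the neighbourhoods of those variants and never scans every record. All names here (`Variant`, `VariantGraph`, `add_call`, `samples_with_all`) are hypothetical. A real variation graph used by graph-aware read mappers (the first benefit) would instead encode the reference sequence and alternative alleles as nodes and paths.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A single mutation keyed like a VCF record (hypothetical model)."""
    chrom: str
    pos: int
    ref: str
    alt: str


class VariantGraph:
    """Toy bipartite graph: variant nodes <-> sample (VCF file) nodes."""

    def __init__(self) -> None:
        self.samples_of = defaultdict(set)   # variant -> samples carrying it
        self.variants_of = defaultdict(set)  # sample  -> variants it carries

    def add_call(self, sample: str, variant: Variant) -> None:
        # One variant call = one undirected edge between a sample and a variant.
        self.samples_of[variant].add(sample)
        self.variants_of[sample].add(variant)

    def samples_with_all(self, variants) -> set:
        """Return the samples (VCF files) that contain every given variant."""
        variants = list(variants)
        if not variants:
            return set(self.variants_of)
        # Start from the rarest variant so the running intersection stays small.
        ordered = sorted(variants, key=lambda v: len(self.samples_of.get(v, ())))
        result = set(self.samples_of.get(ordered[0], ()))
        for v in ordered[1:]:
            result &= self.samples_of.get(v, set())
            if not result:
                break  # no sample can match any more
        return result


if __name__ == "__main__":
    g = VariantGraph()
    snv = Variant("chr1", 100, "A", "G")
    snv2 = Variant("chr1", 200, "C", "T")
    ins = Variant("chr2", 50, "G", "GA")
    g.add_call("s1.vcf", snv)
    g.add_call("s1.vcf", snv2)
    g.add_call("s2.vcf", snv)
    g.add_call("s2.vcf", ins)
    g.add_call("s3.vcf", snv)
    g.add_call("s3.vcf", snv2)
    g.add_call("s3.vcf", ins)
    print(sorted(g.samples_with_all([snv, snv2])))  # ['s1.vcf', 's3.vcf']
```

Starting the intersection from the rarest variant is a simple heuristic. A distributed graph store would have to make a similar choice about which neighbourhood to expand first.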
In this project, students will use a Hadoop-optimized graph database to store and query variation data. The work is modular, so each module can be developed by a separate student.