Pipeline for target sequencing analysis

Next-generation sequencing (NGS) is used worldwide in many applications, including scientific research, medical diagnostics, and forensic examination. The volume of data generated by NGS grows daily, and processing it manually is becoming impossible. Workflow automation addresses this problem.

In this project, a pipeline for automatic analysis of Ion Torrent target sequencing data was developed. Its main logic is written in Common Workflow Language (CWL), a specification for describing analysis workflows so that they are portable and scalable across a variety of software and hardware environments, such as Azure, Slurm, GA4GH TES, and Linux. All tools used in the workflow are wrapped in Docker containers to make them independent of the host environment. Besides the standard steps, the pipeline aligns reads to amplicon sequences and then performs primer trimming and quality assessment. Low-quality regions are recorded in the metadata of the final Variant Call Format (VCF) file. The tools for primer trimming and VCF modification are written in Python.
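The primer trimming step described above can be illustrated with a minimal Python sketch. This is not the project's actual tool: the `Amplicon` class, the `trim_primers` function, and the coordinate conventions are hypothetical, and the sketch assumes a gapless alignment of the read to the reference (no insertions, deletions, or soft clips). It clips the read bases that fall inside the primer regions of the amplicon the read was assigned to.

```python
from dataclasses import dataclass


@dataclass
class Amplicon:
    """Hypothetical amplicon description (0-based, end-exclusive coordinates)."""
    start: int           # first reference position of the forward primer
    end: int             # position just past the last base of the reverse primer
    fwd_primer_len: int  # length of the forward primer
    rev_primer_len: int  # length of the reverse primer


def trim_primers(seq, qual, read_start, amplicon):
    """Remove read bases that overlap the primers of `amplicon`.

    Assumes the read aligns to the reference without gaps, so read offset i
    corresponds to reference position read_start + i.
    Returns (trimmed_seq, trimmed_qual, new_start).
    """
    # Reference interval between the two primers (the amplicon insert).
    insert_start = amplicon.start + amplicon.fwd_primer_len
    insert_end = amplicon.end - amplicon.rev_primer_len

    read_end = read_start + len(seq)
    keep_from = max(insert_start, read_start)
    keep_to = min(insert_end, read_end)

    if keep_from >= keep_to:
        # The read lies entirely within primer sequence (or outside the insert).
        return "", "", read_start

    lo = keep_from - read_start
    hi = keep_to - read_start
    return seq[lo:hi], qual[lo:hi], keep_from


if __name__ == "__main__":
    amp = Amplicon(start=100, end=120, fwd_primer_len=5, rev_primer_len=5)
    print(trim_primers("AAAAACCCCCGGGGGTTTTT", "I" * 20, 100, amp))
```

Because each read is trimmed against the primers of its own amplicon, a position that is primer sequence in one amplicon can still be covered by untrimmed bases from a neighbouring, overlapping amplicon.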

The pipeline was tested on 22 samples from the Human Genome project. Its specificity (98.00%) and sensitivity (99.54%) were greater than or equal to those of the standard Torrent Suite pipeline (98.00% and 99.26%, respectively). Moreover, thanks to the primer trimming step, the workflow can identify variants that lie in regions where amplicons overlap.
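For reference, sensitivity and specificity follow the usual definitions, TP / (TP + FN) and TN / (TN + FP). The sketch below shows one way such figures could be computed from sets of variant positions. The function names and the toy data are illustrative assumptions, not the evaluation code or data used in the project.

```python
def confusion_counts(called, truth, assessed):
    """Count TP, FP, FN, TN over a set of assessed positions.

    `called` and `truth` are sets of variant positions and are assumed
    to be subsets of `assessed`.
    """
    tp = len(called & truth)             # variants called and present in truth
    fp = len(called - truth)             # variants called but absent from truth
    fn = len(truth - called)             # true variants that were missed
    tn = len(assessed - called - truth)  # positions correctly left uncalled
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Fraction of true variants that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-variant positions that were correctly not called."""
    return tn / (tn + fp)


if __name__ == "__main__":
    # Toy data, purely illustrative.
    assessed = set(range(100))
    truth = {3, 10, 42, 77}
    called = {3, 10, 42, 90}
    tp, fp, fn, tn = confusion_counts(called, truth, assessed)
    print(sensitivity(tp, fn), specificity(tn, fp))
```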

The pipeline is platform-independent, handles overlapping amplicons, performs quality assessment whose results are incorporated into the final report, and is no less accurate than the standard Ion Torrent pipeline. It can therefore be used to process Ion Torrent target sequencing data.

 

Supervisor:
Project duration: Feb 2017 to Jun 2017