Deep learning in protein anomalies

Convolutional neural networks (CNNs) are well known for their ability to exploit spatial and temporal structure in data. They can accurately infer protein function or predict the bioactivity of small molecules in ligand-protein interactions. Nevertheless, several technical issues limit their wider adoption in structural biology. Here we address some of the most challenging ones.

Complex spatial structure is an intrinsic property of protein molecules. Yet, from a technical point of view, voxelised 3D representations of entire protein molecules are extremely sparse and memory-intensive, which greatly limits possible CNN applications. We explored various dimensionality reduction and scaling methods to preserve as much spatial information as possible while dramatically reducing the memory footprint of the input. We also tested several data-channel designs to optimise convolution efficiency and learning. Using these representations, we analysed the encoded X-ray protein models available in the Protein Data Bank (PDB). Based on these data, we selected an optimal representation strategy and trained a convolutional autoencoder to demonstrate its effectiveness.
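To make the sparsity problem concrete, the following is a minimal sketch of naive dense voxelisation, not the representation selected in this project. The function name `voxelise`, the 64-voxel grid, the 1 Å resolution, the four-channel scheme and the synthetic coordinates are all assumptions chosen for illustration only.

```python
import numpy as np

def voxelise(coords, channels, n_channels, grid_size=64, resolution=1.0):
    """Map atoms onto a dense (n_channels, D, D, D) occupancy grid.

    coords:   (N, 3) float array of atom positions in angstroms.
    channels: (N,) int array giving a channel index per atom
              (hypothetical scheme, e.g. one channel per element type).
    """
    grid = np.zeros((n_channels, grid_size, grid_size, grid_size),
                    dtype=np.float32)
    # Centre the molecule in the box.
    centred = coords - coords.mean(axis=0)
    idx = np.floor(centred / resolution).astype(int) + grid_size // 2
    # Drop atoms that fall outside the box.
    inside = np.all((idx >= 0) & (idx < grid_size), axis=1)
    idx, ch = idx[inside], channels[inside]
    # Mark occupied voxels; atoms sharing a voxel collapse into one entry.
    grid[ch, idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

# Synthetic stand-in for a protein: 2000 random "atoms", not real PDB data.
rng = np.random.default_rng(0)
coords = rng.normal(scale=10.0, size=(2000, 3))
channels = rng.integers(0, 4, size=2000)

grid = voxelise(coords, channels, n_channels=4)
occupancy = np.count_nonzero(grid) / grid.size  # fraction of non-zero voxels
dense_mib = grid.nbytes / 2**20                 # memory of the dense tensor

print(f"shape={grid.shape}, dense size={dense_mib:.1f} MiB, "
      f"occupancy={occupancy:.4%}")
```

Even in this toy setting the dense tensor takes 4 MiB per molecule while fewer than 0.2% of its voxels are non-zero. That imbalance is what the dimensionality reduction and scaling methods described above aim to reduce.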

Student:
   Ivan Reveguk
Supervisor:
   Andrey Afanasyev
Project duration: Feb 2017 – Jun 2017
Files:
   reveguk_final_27052017.pdf