Finding structural motifs in Natural Products
Natural Products (NPs) have an unparalleled track record in pharmacology: most anticancer and antimicrobial agents are natural products or their derivatives. NPs are usually grouped into classes by biosynthetic origin, such as peptidic NPs (PNPs), polyketides, and alkaloids. The amide bond is a well-known structural motif of PNPs. To better understand the other classes of NPs, characteristic motifs should be identified for them as well. Such motifs can then be applied to automatic database annotation and to related problems such as PNP dereplication.
We have found structural motifs for three of the most interesting classes of NPs: Nonribosomal Peptides (NRPs), Ribosomally Synthesized and Post-translationally Modified Peptides (RiPPs), and polyketides. Each motif is well represented in one class and rarely occurs in the others. In addition, our analysis showed that nitrogen atoms are less common in polyketides than in the other two classes. We have developed a machine learning tool that separates these NP classes based on structural motifs and other features. The tool was validated on a mix of NRP, RiPP, and polyketide structures and achieved an F1 score above 93% for all three classes.
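For readers unfamiliar with the metric, the per-class F1 score reported above can be computed one-vs-rest from true and predicted class labels, as in the minimal Python sketch below. The function name `per_class_f1` and the toy labels are assumptions made for illustration only. They are not the tool's code or the validation data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, treating each label one-vs-rest.

    F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy labels invented for illustration (hypothetical, not the study's data).
y_true = ["NRP", "NRP", "RiPP", "RiPP", "PK", "PK"]
y_pred = ["NRP", "RiPP", "RiPP", "RiPP", "PK", "PK"]
scores = per_class_f1(y_true, y_pred)
for label, f1 in scores.items():
    print(f"{label}: F1 = {f1:.3f}")
```

In this toy example one NRP is mislabeled as a RiPP, so NRP loses recall and RiPP loses precision, while the polyketide class (PK) is unaffected. Reporting the score separately for each class, as the abstract does, keeps a weak class from being hidden by a strong overall average.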