A deep learning genome-mining strategy for biosynthetic gene cluster prediction

Hannigan, Geoffrey D.; Prihoda, David; Palicka, Andrej; Soukup, Jindrich; Klempir, Ondrej; Rampula, Lena; Durcak, Jindrich; Wurst, Michael; Kotowski, Jakub; Chang, Dan; Wang, Rurun; Piizzi, Grazia; Temesi, Gergely; Hazuda, Daria J.; Woelk, Christopher H.; Bitton, Danny A.

Published in

Oxford University Press, Nucleic Acids Research, 18(47), p. e110-e110, 2019

DOI: 10.1093/nar/gkz654

Journal article published in 2019 by Geoffrey D. Hannigan, David Prihoda, Andrej Palicka, Jindrich Soukup, Ondrej Klempir, Lena Rampula, Jindrich Durcak, Michael Wurst, Jakub Kotowski, Dan Chang, Rurun Wang, Grazia Piizzi, Gergely Temesi, Daria J. Hazuda, Christopher H. Woelk and other authors.

This paper is made freely available by the publisher.

Abstract

Natural products represent a rich reservoir of small-molecule drug candidates used as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The growing number of complete microbial genomes and similar resources has led to the development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represent a major addition to in silico BGC identification.
