Published in

Oxford University Press, Nucleic Acids Research, D1(51), p. D603-D610, 2022

DOI: 10.1093/nar/gkac1049

MIBiG 3.0: a community-driven effort to annotate experimentally validated biosynthetic gene clusters

Journal article published in 2022 by Jeffrey A. van Santen, Thomas Tørring, Liana Zaroubi, Justin J. J. van der Hooft, Daniel Udwary, Aruna Vigneshwari, Kristiina Vind, Sophie P. J. M. Vromans, Valentin Waschulin, Sam E. Williams, Jaclyn M. Winter, Thomas E. Witte, Huali Xie, Dong Yang, Jingwei Yu and other authors.
This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/.