Published in

Oxford University Press, Bioinformatics, 13(38), p. 3454-3461, 2022

DOI: 10.1093/bioinformatics/btac353

Links

Tools

Export citation

Search in Google Scholar

Conditional generative modeling forde novoprotein design with hierarchical functions

Journal article published in 2022 by Tim Kucera ORCID, Matteo Togninalli, Laetitia Meng-Papaxanthos ORCID
This paper was not found in any repository, but could be made available legally by the author.
This paper was not found in any repository, but could be made available legally by the author.

Full text: Unavailable

Green circle
Preprint: archiving allowed
Orange circle
Postprint: archiving restricted
Red circle
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Abstract Motivation Protein design has become increasingly important for medical and biotechnological applications. Because of the complex mechanisms underlying protein formation, the creation of a novel protein requires tedious and time-consuming computational or experimental protocols. At the same time, machine learning has enabled the solving of complex problems by leveraging large amounts of available data, more recently with great improvements on the domain of generative modeling. Yet, generative models have mainly been applied to specific sub-problems of protein design. Results Here, we approach the problem of general-purpose protein design conditioned on functional labels of the hierarchical Gene Ontology. Since a canonical way to evaluate generative models in this domain is missing, we devise an evaluation scheme of several biologically and statistically inspired metrics. We then develop the conditional generative adversarial network ProteoGAN and show that it outperforms several classic and more recent deep-learning baselines for protein sequence generation. We further give insights into the model by analyzing hyperparameters and ablation baselines. Lastly, we hypothesize that a functionally conditional model could generate proteins with novel functions by combining labels and provide first steps into this direction of research. Availability and implementation The code and data underlying this article are available on GitHub at https://github.com/timkucera/proteogan, and can be accessed with doi:10.5281/zenodo.6591379. Supplementary information Supplemental data are available at Bioinformatics online.