Protein design and variant prediction using autoregressive generative models

Shin, Jung-Eun; Riesselman, Adam J.; Kollasch, Aaron W.; McMahon, Conor; Simon, Elana; Sander, Chris; Manglik, Aashish; Kruse, Andrew C.; Marks, Debora S.

Published in

Nature Research, Nature Communications, 12(1), 2021

DOI: 10.1038/s41467-021-22732-w

Journal article published in 2021 by Jung-Eun Shin, Adam J. Riesselman, Aaron W. Kollasch, Conor McMahon, Elana Simon, Chris Sander, Aashish Manglik, Andrew C. Kruse, Debora S. Marks

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving forbidden

Published version: archiving allowed

Abstract

The ability to design functional sequences and predict effects of variation is central to protein engineering and biotherapeutics. State-of-the-art computational methods rely on models that leverage evolutionary information but are inadequate for important applications where multiple sequence alignments are not robust. Such applications include predicting the effects of indels, predicting variant effects in disordered proteins, and designing proteins such as antibodies, whose complementarity-determining regions are highly variable. We introduce a deep generative model adapted from natural language processing for prediction and design of diverse functional sequences without the need for alignments. The model achieves state-of-the-art prediction of missense and indel effects, and we successfully design and test a diverse 10⁵-nanobody library that shows better expression than a 1000-fold larger synthetic library. Our results demonstrate the power of the alignment-free autoregressive model in generalizing to regions of sequence space traditionally considered beyond the reach of prediction and design.
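As a rough illustration of what alignment-free autoregressive scoring means, the sketch below factorizes a sequence's probability as a product of conditionals, p(x) = ∏ p(x_i | x_<i), and scores a variant by its log-likelihood ratio against the wild type. It is not the paper's model: a toy first-order (bigram) conditional stands in for the deep network, and the names `fit_bigram`, `log_likelihood`, and `variant_score`, the start and end symbols, and the pseudocount are all assumptions made for this example.

```python
import math
from collections import defaultdict

# Illustrative assumptions: 20 standard amino acids plus start/end symbols.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
START, END = "^", "$"
N_SYMBOLS = len(AMINO_ACIDS) + 1  # symbols that can be emitted (residues + END)


def fit_bigram(seqs, pseudocount=1.0):
    """Estimate a toy conditional p(x_i | x_{i-1}) from UNALIGNED sequences.

    A bigram model is only a stand-in for a deep autoregressive network;
    a real model would condition on the whole prefix x_<i.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        prev = START
        for aa in seq:
            counts[prev][aa] += 1.0
            prev = aa
        counts[prev][END] += 1.0  # model where sequences stop

    def cond_prob(prev, symbol):
        total = sum(counts[prev].values()) + pseudocount * N_SYMBOLS
        return (counts[prev][symbol] + pseudocount) / total

    return cond_prob


def log_likelihood(seq, cond_prob):
    """Sum of log p(x_i | x_{i-1}), including the END symbol."""
    prev, total = START, 0.0
    for aa in seq:
        total += math.log(cond_prob(prev, aa))
        prev = aa
    return total + math.log(cond_prob(prev, END))


def variant_score(wild_type, variant, cond_prob):
    """Log-likelihood ratio; lengths may differ, so indels need no alignment."""
    return log_likelihood(variant, cond_prob) - log_likelihood(wild_type, cond_prob)


if __name__ == "__main__":
    model = fit_bigram(["ACDA", "ACDC", "ACDA"])  # made-up training sequences
    print("missense:", variant_score("ACDA", "ACDW", model))
    print("deletion:", variant_score("ACDA", "ACA", model))
```

Because each sequence is scored left to right on its own, substitutions, insertions, and deletions all go through the same function. In a toy model like this one, shorter sequences tend to receive higher likelihoods, so scores for indels should be read with that length effect in mind.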
