Oxford University Press, Nucleic Acids Research, W1(50), p. W598-W610, 2022
DOI: 10.1093/nar/gkac426
Full text: Download
Abstract In this study we show that protein language models can encode structural and functional information of GPCR sequences that can be used to predict their signaling and functional repertoire. We used the ESM1b protein embeddings as features and the binding information known from publicly available studies to develop PRECOGx, a machine learning predictor to explore GPCR interactions with G protein and β-arrestin, which we made available through a new webserver (https://precogx.bioinfolab.sns.it/). PRECOGx outperformed its predecessor (e.g. PRECOG) in predicting GPCR-transducer couplings, being also able to consider all GPCR classes. The webserver also provides new functionalities, such as the projection of input sequences on a low-dimensional space describing essential features of the human GPCRome, which is used as a reference to track GPCR variants. Additionally, it allows inspection of the sequence and structural determinants responsible for coupling via the analysis of the most important attention maps used by the models as well as through predicted intramolecular contacts. We demonstrate applications of PRECOGx by predicting the impact of disease variants (ClinVar) and alternative splice forms from healthy tissues (GTEX) of human GPCRs, revealing the power to dissect system biasing mechanisms in both health and disease.