Supervised learning with word embeddings derived from PubMed captures latent knowledge about protein kinases and cancer

Ravanmehr, Vida; Blau, Hannah; Cappelletti, Luca; Fontana, Tommaso; Carmody, Leigh; Coleman, Ben; George, Joshy; Reese, Justin; Joachimiak, Marcin; Bocci, Giovanni; Hansen, Peter; Bult, Carol; Rueter, Jens; Casiraghi, Elena; Valentini, Giorgio; Mungall, Christopher; Oprea, Tudor I.; Robinson, Peter N.

Published in

Oxford University Press, NAR Genomics and Bioinformatics, 3(4), 2021

DOI: 10.1093/nargab/lqab113

Journal article published in 2021.

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving allowed

Abstract

Inhibiting protein kinases (PKs) that cause cancers has been an important topic in cancer therapy for years. So far, almost 8% of >530 PKs have been targeted by FDA-approved medications, and around 150 protein kinase inhibitors (PKIs) have been tested in clinical trials. We present an approach based on natural language processing and machine learning to investigate the relations between PKs and cancers, predicting PKs whose inhibition would be efficacious to treat a certain cancer. Our approach represents PKs and cancers as semantically meaningful 100-dimensional vectors based on word and concept neighborhoods in PubMed abstracts. We use information about phase I-IV trials in ClinicalTrials.gov to construct a training set for random forest classification. Our results with historical data show that associations between PKs and specific cancers can be predicted years in advance with good accuracy. Our tool can be used to predict the relevance of inhibiting PKs for specific cancers and to support the design of well-focused clinical trials to discover novel PKIs for cancer therapy.
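
The training-set construction described in the abstract can be illustrated with a minimal Python sketch. Everything here beyond the 100-dimensional vectors and the use of clinical-trial links as labels is an assumption made for illustration: the names `toy_embedding`, `pair_features` and `build_training_set` are hypothetical, the pseudo-random vectors stand in for embeddings learned from PubMed abstracts, the element-wise difference is only one plausible way to encode a PK-cancer pair, and treating every pair without a trial as a negative example is a simplification. The PK and cancer names are placeholders, not real data.

```python
import random

DIM = 100  # embedding dimensionality stated in the abstract


def toy_embedding(name, dim=DIM):
    """Stand-in for a learned embedding (assumption): a deterministic
    pseudo-random vector seeded by the concept name."""
    rng = random.Random(name)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def pair_features(pk_vec, cancer_vec):
    """Encode a PK-cancer pair as the element-wise difference of the two
    vectors (an assumed encoding, chosen for illustration)."""
    return [p - c for p, c in zip(pk_vec, cancer_vec)]


def build_training_set(pks, cancers, trial_pairs, embed=toy_embedding):
    """Return (X, y): one feature vector per PK-cancer pair, labeled 1 if
    the pair appears in the set of trial-derived links, else 0."""
    X, y = [], []
    for pk in pks:
        for cancer in cancers:
            X.append(pair_features(embed(pk), embed(cancer)))
            y.append(1 if (pk, cancer) in trial_pairs else 0)
    return X, y


# Placeholder concepts and links (hypothetical, not taken from the paper).
pks = ["PK_A", "PK_B", "PK_C"]
cancers = ["cancer_X", "cancer_Y"]
trial_pairs = {("PK_A", "cancer_X"), ("PK_C", "cancer_Y")}

X, y = build_training_set(pks, cancers, trial_pairs)
print(len(X), len(X[0]), sum(y))
```

The resulting `X` and `y` have the shape expected by a standard random forest implementation, for example the `fit(X, y)` method of scikit-learn's `RandomForestClassifier`. Restricting the embeddings and the trial links to those available before a cutoff year would mimic the historical evaluation mentioned in the abstract.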
