PEDL: extracting protein–protein associations using deep language models and distant supervision

Weber, Leon; Thobe, Kirsten; Migueles Lozano, Oscar Arturo; Wolf, Jana; Leser, Ulf

Published in

Oxford University Press (OUP), Bioinformatics, Supplement_1(36), p. i490-i498, 2020

DOI: 10.1093/bioinformatics/btaa430

Journal article published in 2020 by Leon Weber, Kirsten Thobe, Oscar Arturo Migueles Lozano, Jana Wolf, Ulf Leser

This paper was not found in any repository, but could be made available legally by the author.

Full text: Unavailable

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving forbidden

Abstract

Motivation: A significant portion of molecular biology investigates signalling pathways and thus depends on an up-to-date and complete resource of functional protein–protein associations (PPAs) that constitute such pathways. Despite extensive curation efforts, major pathway databases are still notoriously incomplete. Relation extraction can help to gather such pathway information from biomedical publications. Current methods for extracting PPAs typically rely exclusively on rare manually labelled data which severely limits their performance.

Results: We propose PPA Extraction with Deep Language (PEDL), a method for predicting PPAs from text that combines deep language models and distant supervision. Due to the reliance on distant supervision, PEDL has access to an order of magnitude more training data than methods solely relying on manually labelled annotations. We introduce three different datasets for PPA prediction and evaluate PEDL for the two subtasks of predicting PPAs between two proteins, as well as identifying the text spans stating the PPA. We compared PEDL with a recently published state-of-the-art model and found that on average PEDL performs better in both tasks on all three datasets. An expert evaluation demonstrates that PEDL can be used to predict PPAs that are missing from major pathway databases and that it correctly identifies the text spans supporting the PPA.

Availability and implementation: PEDL is freely available at https://github.com/leonweber/pedl. The repository also includes scripts to generate the used datasets and to reproduce the experiments from this article.

Supplementary information: Supplementary data are available at Bioinformatics online.
