Building a PubMed knowledge graph

Xu, Jian; Kim, Sunkyu; Song, Min; Jeong, Minbyul; Kim, Donghyeon; Kang, Jaewoo; Rousseau, Justin F.; Li, Xin; Xu, Weijia; Torvik, Vetle I.; Bu, Yi; Chen, Chongyan; Ebeid, Islam Akef; Li, Daifeng; Ding, Ying

Published in

Nature Research, Scientific Data, 1(7), 2020

DOI: 10.1038/s41597-020-0543-2

Journal article published in 2020 by Jian Xu, Sunkyu Kim, Min Song, Minbyul Jeong, Donghyeon Kim, Jaewoo Kang, Justin F. Rousseau, Xin Li, Weijia Xu, Vetle I. Torvik, Yi Bu, Chongyan Chen, Islam Akef Ebeid, Daifeng Li, Ying Ding

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving forbidden

Published version: archiving allowed

Abstract

PubMed® is an essential resource for the medical domain, but useful concepts are either difficult to extract or are ambiguous, which has significantly hindered knowledge discovery. To address this issue, we constructed a PubMed knowledge graph (PKG) by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of Health (NIH) ExPORTER, collecting affiliation history and educational background of authors from ORCID®, and identifying fine-grained affiliation data from MapAffil. Through the integration of these credible multi-source data, we could create connections among the bio-entities, authors, articles, affiliations, and funding. Data validation revealed that the BioBERT deep learning method of bio-entity extraction significantly outperformed the state-of-the-art models based on the F1 score (by 0.51%), with the author name disambiguation (AND) achieving an F1 score of 98.09%. PKG can trigger broader innovations, not only enabling us to measure scholarly impact, knowledge usage, and knowledge transfer, but also assisting us in profiling authors and organizations based on their connections with bio-entities.
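The abstract reports its validation results as F1 scores without defining the metric. As a reminder, F1 is the harmonic mean of precision and recall. The sketch below is only an illustration of that standard definition: the function name `f1_score` and the counts are hypothetical and are not taken from the paper or its evaluation code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall, from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    if tp == 0:
        # No correct predictions: F1 is conventionally 0.
        return 0.0
    precision = tp / (tp + fp)  # share of predicted items that are correct
    recall = tp / (tp + fn)     # share of true items that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not results from the paper).
score = f1_score(tp=90, fp=10, fn=10)
print(f"F1 = {score:.4f}")
```

Because F1 combines both error types, a system cannot score well by extracting many spurious entities (low precision) or by missing many true ones (low recall).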
