Unifying cancer and normal RNA sequencing data from different sources

Wang, Qingguo; Armenia, Joshua; Zhang, Chao; Penson, Alexander V.; Reznik, Ed; Zhang, Liguo; Minet, Thais; Ochoa, Angelica; Gross, Benjamin E.; Iacobuzio-Donahue, Christine A.; Betel, Doron; Taylor, Barry S.; Gao, Jianjiong; Schultz, Nikolaus

Published in

Nature Research, Scientific Data, 1(5), 2018

DOI: 10.1038/sdata.2018.61


Journal article published in 2018 by Qingguo Wang, Joshua Armenia, Chao Zhang, Alexander V. Penson, Ed Reznik, Liguo Zhang, Thais Minet, Angelica Ochoa, Benjamin E. Gross, Christine A. Iacobuzio-Donahue, Doron Betel, Barry S. Taylor, Jianjiong Gao, Nikolaus Schultz

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving forbidden

Published version: archiving allowed

Abstract

Driven by the recent advances of next generation sequencing (NGS) technologies and an urgent need to decode complex human diseases, a multitude of large-scale studies were conducted recently that have resulted in an unprecedented volume of whole transcriptome sequencing (RNA-seq) data, such as the Genotype Tissue Expression project (GTEx) and The Cancer Genome Atlas (TCGA). While these data offer new opportunities to identify the mechanisms underlying disease, the comparison of data from different sources remains challenging, due to differences in sample and data processing. Here, we developed a pipeline that processes and unifies RNA-seq data from different studies, which includes uniform realignment, gene expression quantification, and batch effect removal. We find that uniform alignment and quantification is not sufficient when combining RNA-seq data from different sources and that the removal of other batch effects is essential to facilitate data comparison. We have processed data from GTEx and TCGA and successfully corrected for study-specific biases, enabling comparative analysis between TCGA and GTEx. The normalized datasets are available for download on figshare.
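To make the idea of a study-specific bias concrete, the sketch below shows a deliberately naive correction: for each gene, every study's mean log-expression is shifted to the overall mean. This is an illustration only and is not the batch-correction method used in the paper. The function name `center_by_study`, the sample labels, and the toy values are all hypothetical. The approach also assumes that the studies measure comparable biology; if study membership is confounded with the condition of interest (for example tumor versus normal), centering of this kind would erase real signal along with the bias.

```python
from statistics import mean

def center_by_study(expr, study):
    """Naive per-gene, per-study mean centering (illustrative assumption,
    not the paper's method).

    expr:  dict mapping sample id -> list of log-expression values, one per gene
    study: dict mapping sample id -> study label
    Returns a new dict in which each study's per-gene mean equals the
    overall per-gene mean.
    """
    samples = list(expr)
    n_genes = len(expr[samples[0]])

    # Overall mean of each gene across all samples.
    grand = [mean(expr[s][g] for s in samples) for g in range(n_genes)]

    # Group sample ids by study label.
    members = {}
    for s in samples:
        members.setdefault(study[s], []).append(s)

    corrected = {}
    for label, ids in members.items():
        # Mean of each gene within this study.
        study_mean = [mean(expr[s][g] for s in ids) for g in range(n_genes)]
        for s in ids:
            corrected[s] = [
                expr[s][g] - study_mean[g] + grand[g] for g in range(n_genes)
            ]
    return corrected

# Toy data: study "B" reads gene 0 three units higher than study "A".
expr = {
    "a1": [5.0, 3.0], "a2": [7.0, 3.0],
    "b1": [8.0, 3.0], "b2": [10.0, 3.0],
}
study = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
corrected = center_by_study(expr, study)
```

After the call, the within-study spread of gene 0 is preserved while the offset between the two studies is gone, and gene 1, which had no offset, is unchanged.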
