Leveraging heterogeneity across multiple datasets increases cell-mixture deconvolution accuracy and reduces biological and technical biases

Vallania, Francesco; Tam, Andrew; Lofgren, Shane; Schaffert, Steven; Azad, Tej D.; Bongen, Erika; Haynes, Winston; Alsup, Meia; Alonso, Michael; Davis, Mark; Engleman, Edgar; Khatri, Purvesh

Published in

Nature Research, Nature Communications, 1(9), 2018

DOI: 10.1038/s41467-018-07242-6


Journal article published in 2018 by Francesco Vallania, Andrew Tam, Shane Lofgren, Steven Schaffert, Tej D. Azad, Erika Bongen, Winston Haynes, Meia Alsup, Michael Alonso, Mark Davis, Edgar Engleman, Purvesh Khatri

This paper is made freely available by the publisher.


Abstract

In silico quantification of cell proportions from mixed-cell transcriptomics data (deconvolution) requires a reference expression matrix, called a basis matrix. We hypothesize that matrices created using only healthy samples from a single microarray platform would introduce biological and technical biases in deconvolution. We show the presence of such biases in two existing matrices, IRIS and LM22, irrespective of deconvolution method. Here, we present immunoStates, a basis matrix built using 6160 samples with different disease states across 42 microarray platforms. We find that immunoStates significantly reduces biological and technical biases. Importantly, we find that different methods have virtually no or minimal effect once the basis matrix is chosen. We further show that cellular proportion estimates using immunoStates are consistently more correlated with measured proportions than those using IRIS and LM22, across all methods. Our results demonstrate the need for, and importance of, incorporating biological and technical heterogeneity in a basis matrix to achieve consistently high accuracy.
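To make the role of a basis matrix concrete, the following is a minimal sketch of linear deconvolution under stated assumptions. It is not the method evaluated in the paper, and it does not use immunoStates, IRIS, or LM22. The function name `deconvolve`, the three cell types, and every expression value are hypothetical and chosen only for illustration. The sketch models a mixed sample as a non-negative weighted sum of the basis-matrix columns and recovers the weights by projected gradient descent on the least-squares error.

```python
def deconvolve(basis, mixture, n_iter=2000):
    """Estimate non-negative cell-type proportions (illustrative sketch).

    basis   -- list of rows, one per gene; each row holds one reference
               expression value per cell type (the "basis matrix")
    mixture -- list of expression values of the mixed sample, one per gene
    """
    n_genes = len(basis)
    n_types = len(basis[0])

    # The squared Frobenius norm of the basis bounds the largest eigenvalue
    # of B^T B from above, so its inverse is a safe gradient step size.
    step = 1.0 / sum(v * v for row in basis for v in row)

    x = [0.0] * n_types
    for _ in range(n_iter):
        # Residual of the linear model: B x - m
        residual = [
            sum(basis[g][k] * x[k] for k in range(n_types)) - mixture[g]
            for g in range(n_genes)
        ]
        # Gradient of 0.5 * ||B x - m||^2 with respect to x: B^T (B x - m)
        grad = [
            sum(basis[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step, then projection onto the non-negative orthant
        x = [max(0.0, x[k] - step * grad[k]) for k in range(n_types)]

    # Rescale the weights so they can be read as proportions
    total = sum(x)
    return [v / total for v in x] if total > 0 else x


# Hypothetical basis matrix: 4 genes (rows) x 3 cell types (columns).
CELL_TYPES = ["T cell", "B cell", "monocyte"]
BASIS = [
    [10.0, 1.0, 1.0],
    [1.0, 8.0, 1.0],
    [1.0, 1.0, 9.0],
    [2.0, 2.0, 2.0],
]

# Simulate a noise-free mixed sample from made-up true proportions.
TRUE_PROPORTIONS = [0.5, 0.3, 0.2]
MIXTURE = [sum(b * p for b, p in zip(row, TRUE_PROPORTIONS)) for row in BASIS]

estimated = deconvolve(BASIS, MIXTURE)
for name, proportion in zip(CELL_TYPES, estimated):
    print(f"{name}: {proportion:.3f}")
```

Because the simulated mixture is built from the same basis that is used to deconvolve it, the estimates should closely match the chosen proportions. The biases discussed in the abstract arise in the opposite situation, when the basis matrix does not reflect the platform or disease state of the sample being deconvolved, whichever solver is applied.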
