Published in

IOP Publishing, Physical Biology, 8(3), p. 035013, 2011

DOI: 10.1088/1478-3975/8/3/035013

Analysis and simulation of gene expression profiles in pure and mixed cell populations

Journal article published in 2011 by Daniel Hebenstreit, Sarah A. Teichmann
Distributing this paper is prohibited by the publisher

Full text: Unavailable

Preprint: archiving forbidden
Postprint: archiving forbidden
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

For analysis and interpretation of data obtained from experimental readouts of gene expression, such as microarrays and RNA-sequencing, log transformation is routinely applied. This is because expression data, like many biological parameters, are strongly skewed. We show here that gene expression levels in multicellular organisms often deviate from simple (log) normal distributions and instead exhibit shouldered or bimodal distributions. Based on a mathematical model and numerical simulations, we demonstrate that many observed distributions can be explained as mixtures of bimodal two-component lognormal models. This is because, after log transformation, the resulting distributions display reductions in the first peak rather than increasing overlaps over a wide range of parameter values. Comparing the theoretical results with biological datasets suggests that the distributions are generally bimodal for single cell types and are obscured by the mixture of different cell types present in tissue samples. Our analysis thus provides an initial explanation for the various types of expression level distributions that are found for different datasets. This will be important for the interpretation of next-generation sequencing data such as transcriptomics by mRNA-sequencing and ChIP-sequencing of epigenetic marks.
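
The effect described in the abstract can be illustrated with a minimal simulation sketch. It is not the authors' model: it assumes that each gene's level in a single cell type is drawn from a two-component lognormal mixture, and that a tissue sample reports the average level over equally weighted cell types. All function names and parameter values (`p_low`, `mu_low`, `mu_high`, `sigma`, the threshold of 0.5) are hypothetical choices made for illustration only.

```python
import math
import random


def draw_expression(rng, p_low=0.5, mu_low=-2.0, mu_high=3.0, sigma=1.0):
    """One gene's level in one cell type (assumed two-component lognormal)."""
    mu = mu_low if rng.random() < p_low else mu_high
    return rng.lognormvariate(mu, sigma)


def simulate_log_expression(n_genes, n_cell_types, seed=0):
    """Log of each gene's level averaged over equally weighted cell types."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n_genes):
        levels = [draw_expression(rng) for _ in range(n_cell_types)]
        logs.append(math.log(sum(levels) / n_cell_types))
    return logs


def low_peak_fraction(log_values, threshold=0.5):
    """Share of genes below the midpoint between the two assumed log-scale peaks."""
    return sum(v < threshold for v in log_values) / len(log_values)


# Pure population: one cell type. Mixed population: three cell types.
pure = simulate_log_expression(20000, n_cell_types=1)
mixed = simulate_log_expression(20000, n_cell_types=3)

print("low-peak fraction, pure :", low_peak_fraction(pure))
print("low-peak fraction, mixed:", low_peak_fraction(mixed))
```

Under these assumptions, a gene stays in the low peak of the mixed sample only if it is lowly expressed in every contributing cell type, so the first peak shrinks as cell types are added while the high peak moves little on the log scale.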