Normalization by distributional resampling of high throughput single-cell RNA-sequencing data

Brown, Jared; Ni, Zijian; Mohanty, Chitrasen; Bacher, Rhonda; Kendziorski, Christina

Published in

Oxford University Press, Bioinformatics, 37(22), p. 4123-4128, 2021

DOI: 10.1093/bioinformatics/btab450


Journal article published in 2021 by Jared Brown, Zijian Ni, Chitrasen Mohanty, Rhonda Bacher, Christina Kendziorski

This paper is made freely available by the publisher.


Abstract

Motivation: Normalization to remove technical or experimental artifacts is critical in the analysis of single-cell RNA-sequencing experiments, even those for which unique molecular identifiers are available. The majority of methods for normalizing single-cell RNA-sequencing data adjust average expression for library size (LS), allowing the variance and other properties of the gene-specific expression distribution to be non-constant in LS. This often results in reduced power and increased false discoveries in downstream analyses, a problem which is exacerbated by the high proportion of zeros present in most datasets.

Results: To address this, we present Dino, a normalization method based on a flexible negative-binomial mixture model of gene expression. As demonstrated in both simulated and case study datasets, by normalizing the entire gene expression distribution, Dino is robust to shallow sequencing, sample heterogeneity and varying zero proportions, leading to improved performance in downstream analyses in a number of settings.

Availability and implementation: The R package, Dino, is available on GitHub at https://github.com/JBrownBiostat/Dino. The Dino package is further archived and freely available on Zenodo at https://doi.org/10.5281/zenodo.4897558.

Supplementary information: Supplementary data are available at Bioinformatics online.
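The point made in the motivation, that scaling by library size aligns the mean but leaves other properties of the distribution dependent on LS, can be illustrated with a small calculation. The Python sketch below is not part of Dino and does not reproduce its method. It assumes a plain Poisson count model, which is a simplification of the negative-binomial mixture the authors describe, and the function name `scaled_poisson_summary`, the per-read rate and the target library size are all hypothetical values chosen for illustration.

```python
import math


def scaled_poisson_summary(rate_per_read, library_size, target_size):
    """Summarize one gene's count in one cell after library-size scaling.

    Assumed model (illustrative only): the raw count Y is
    Poisson(rate_per_read * library_size), and the scaled value is
    X = Y * target_size / library_size.
    """
    mu = rate_per_read * library_size   # mean of the raw count
    scale = target_size / library_size  # global scaling factor
    return {
        "mean": scale * mu,              # equals rate_per_read * target_size
        "variance": scale ** 2 * mu,     # Var(cY) = c^2 * Var(Y)
        "zero_fraction": math.exp(-mu),  # P(Y = 0); scaling keeps zeros at zero
    }


rate = 1e-4      # hypothetical per-read expression rate for one gene
target = 10_000  # hypothetical common library size to scale to

shallow = scaled_poisson_summary(rate, 2_000, target)
deep = scaled_poisson_summary(rate, 20_000, target)

for name, s in (("shallow", shallow), ("deep", deep)):
    print(
        f"{name}: mean={s['mean']:.2f} "
        f"variance={s['variance']:.2f} zeros={s['zero_fraction']:.2f}"
    )
```

Under these assumptions both cells end up with the same scaled mean, but the shallowly sequenced cell keeps a larger variance and a much higher proportion of zeros. This is the kind of residual LS dependence that normalizing the entire expression distribution is meant to remove.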
