Differential expression analysis for sequence count data

Anders, Simon; Huber, Wolfgang

Published in

BioMed Central, Genome Biology, 11(10), 2010

DOI: 10.1186/gb-2010-11-10-r106

Nature Precedings, 2010

DOI: 10.1038/npre.2010.4282.2

Nature Precedings

DOI: 10.1038/npre.2010.4282.1

Journal article published in 2010 by Simon Anders, Wolfgang Huber

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving allowed

Abstract

High-throughput DNA sequencing is a powerful and versatile new technology for obtaining comprehensive and quantitative data about RNA expression (RNA-Seq), protein-DNA binding (ChIP-Seq), and genetic variations between individuals. It addresses essentially all of the use cases that microarrays were applied to in the past, but produces more detailed and more comprehensive results. One of the basic statistical tasks is inference (testing, regression) on discrete count values (e.g., representing the number of times a certain type of mRNA was sampled by the sequencing machine). Challenges are posed by a large dynamic range, heteroskedasticity and small numbers of replicates. Hence, model-based approaches are needed to achieve statistical power. I will present an error model that uses the negative binomial distribution, with variance and mean linked by local regression, to model the null distribution of the count data. The method controls type-I error and provides good detection power. I will also discuss how to use the GLM framework to detect alternative transcript isoform usage. A free open-source R software package, DESeq, is available from the Bioconductor project.
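
The idea of a negative binomial error model whose variance is tied to the mean by a smooth fit can be illustrated with a minimal Python sketch. This is not the DESeq implementation (DESeq is an R package), and several things here are assumptions made for illustration: the function names `nb_log_pmf` and `smoothed_variance` are hypothetical, the toy per-gene means and variances are invented, and a simple Gaussian kernel average on the log scale stands in for the local regression mentioned in the abstract.

```python
import math


def nb_log_pmf(k, mean, var):
    """Log-probability of count k under a negative binomial with the given
    mean and variance (moment matching; requires var > mean)."""
    if var <= mean:
        raise ValueError("negative binomial needs var > mean")
    p = mean / var                   # success probability
    r = mean * mean / (var - mean)   # size parameter, may be non-integer
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def smoothed_variance(target_mean, means, variances, bandwidth=0.5):
    """Kernel-weighted average of per-gene variances near target_mean.

    Assumption: a local-constant Gaussian kernel smoother on log(mean),
    used here as a simpler stand-in for local regression.
    """
    log_t = math.log(target_mean)
    weights = [math.exp(-0.5 * ((math.log(m) - log_t) / bandwidth) ** 2)
               for m in means]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, variances)) / total


# Toy per-gene summaries, invented purely for illustration.
means = [5.0, 10.0, 20.0, 50.0, 100.0]
variances = [8.0, 22.0, 65.0, 300.0, 1100.0]

# Variance expected for a gene with mean 15, read off the smoothed trend.
var_hat = smoothed_variance(15.0, means, variances)

# Log-probability of observing 12 reads for that gene under the null model.
log_prob = nb_log_pmf(12, 15.0, var_hat)
```

Pooling information across genes through the mean-variance trend is what makes the variance usable when each gene has only a few replicates. A real analysis would additionally need normalization for sequencing depth and a proper test statistic, neither of which is shown here.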
