Rail-RNA: Scalable analysis of RNA-seq splicing and coverage

Nellore, Abhinav; Collado-Torres, Leonardo; Jaffe, Andrew E.; Alquicira-Hernández, José; Wilks, Christopher; Pritt, Jacob; Morton, James; Leek, Jeffrey T.; Langmead, Ben

Published in

Oxford University Press, Bioinformatics, 33(24), p. 4033-4040, 2016

DOI: 10.1093/bioinformatics/btw575

Journal article published in 2015 by Abhinav Nellore, Leonardo Collado-Torres, Andrew E. Jaffe, José Alquicira-Hernández, Christopher Wilks, Jacob Pritt, James Morton, Jeffrey T. Leek, Ben Langmead

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving restricted

Published version: archiving forbidden

Abstract

Motivation: RNA sequencing (RNA-seq) experiments now span hundreds to thousands of samples. Current spliced alignment software is designed to analyze each sample separately. Consequently, no information is gained from analyzing multiple samples together, and it requires extra work to obtain analysis products that incorporate data from across samples.

Results: We describe Rail-RNA, a cloud-enabled spliced aligner that analyzes many samples at once. Rail-RNA eliminates redundant work across samples, making it more efficient as samples are added. For many samples, Rail-RNA is more accurate than annotation-assisted aligners. We use Rail-RNA to align 667 RNA-seq samples from the GEUVADIS project on Amazon Web Services in under 16 h for US$0.91 per sample. Rail-RNA outputs alignments in SAM/BAM format, but it also outputs (i) base-level coverage bigWigs for each sample; (ii) coverage bigWigs encoding normalized mean and median coverages at each base across samples analyzed; and (iii) exon–exon splice junctions and indels (features) in columnar formats that juxtapose coverages in samples in which a given feature is found. Supplementary outputs are ready for use with downstream packages for reproducible statistical analysis. We use Rail-RNA to identify expressed regions in the GEUVADIS samples and show that both annotated and unannotated (novel) expressed regions exhibit consistent patterns of variation across populations and with respect to known confounding variables.

Availability and Implementation: Rail-RNA is open-source software available at http://rail.bio.

Supplementary information: Supplementary data are available at Bioinformatics online.
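The cross-sample coverage summaries in output (ii) can be made concrete with a small sketch. This is not Rail-RNA's code, and the abstract does not say how coverage is normalized. The example below assumes a simple scheme in which each sample is rescaled to a common total coverage before the per-base mean and median are taken. The function name `normalized_summaries`, the parameter `target_total`, and the toy data are all hypothetical.

```python
from statistics import mean, median


def normalized_summaries(coverage, target_total=1_000_000):
    """Return (mean, median) per-base coverage across samples.

    coverage: dict mapping sample name -> list of per-base read depths.
    ASSUMPTION: each sample is scaled so its depths sum to target_total;
    this is an illustrative normalization, not Rail-RNA's documented one.
    """
    lengths = {len(depths) for depths in coverage.values()}
    if len(lengths) != 1:
        raise ValueError("all samples must cover the same positions")

    scaled = []
    for depths in coverage.values():
        total = sum(depths)
        # A sample with no coverage contributes zeros rather than dividing by 0.
        factor = target_total / total if total else 0.0
        scaled.append([d * factor for d in depths])

    # Transpose: one tuple per base, holding that base's depth in every sample.
    columns = list(zip(*scaled))
    return [mean(col) for col in columns], [median(col) for col in columns]


# Toy input: three samples over the same four bases (made-up numbers).
toy = {
    "sampleA": [0, 2, 4, 4],
    "sampleB": [0, 10, 20, 20],
    "sampleC": [5, 0, 0, 5],
}
mean_cov, median_cov = normalized_summaries(toy, target_total=100)
```

In this toy input, sampleA and sampleB have the same coverage shape at different sequencing depths, so they become identical after scaling. The median then follows the two agreeing samples, while the mean is pulled toward the outlying sampleC.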
