Cloud-based interactive analytics for terabytes of genomic variants data

Pan, Cuiping; McInnes, Gregory; Deflaux, Nicole; Snyder, Michael; Bingham, Jonathan; Datta, Somalee; Tsao, Philip S.

Published in

Oxford University Press, Bioinformatics, 33(23), p. 3709-3715, 2017

DOI: 10.1093/bioinformatics/btx468

Journal article published in 2017 by Cuiping Pan, Gregory McInnes, Nicole Deflaux, Michael Snyder, Jonathan Bingham, Somalee Datta, Philip S. Tsao

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving restricted

Published version: archiving forbidden

Abstract

Motivation

Large-scale genomic sequencing is now widely used to address questions in diverse realms such as biological function, human diseases, evolution, ecosystems, and agriculture. Given the quantity and diversity of these data, a robust and scalable solution for data handling and analysis is needed.

Results

We present interactive analytics using a cloud-based columnar database built on Dremel to perform information compression, comprehensive quality controls, and biological information retrieval in large volumes of genomic data. We demonstrate that such Big Data computing paradigms can provide orders of magnitude faster turnaround for common genomic analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds. Using this method, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distribution, variant density across the genome, and pharmacogenomic information.

Availability and implementation

Our analysis framework is implemented in Google Cloud Platform and BigQuery. Code is available at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs.

Supplementary information

Supplementary data are available at Bioinformatics online.
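Illustration

Two of the quality-control metrics named in the abstract, call rate and allele frequency, can be illustrated with a small in-memory sketch. This is not the authors' BigQuery implementation (see the repository linked above for that). The variant keys, the toy genotype data, and the function names below are all hypothetical, and call rate is computed here per variant site using one common definition (called samples divided by total samples), which may differ from the paper's genomic call rate.

```python
from collections import Counter

# Hypothetical toy data, not taken from the paper: for each variant site,
# one diploid genotype call per sample. Each call is a pair of allele
# indices (0 = reference, 1 or higher = alternate); None means no call.
GENOTYPES = {
    "chr1:1000": [(0, 0), (0, 1), (1, 1), None],
    "chr1:2000": [(0, 0), (0, 0), None, None],
}


def call_rate(calls):
    """Fraction of samples with a non-missing genotype call at a site."""
    if not calls:
        return 0.0
    called = sum(1 for call in calls if call is not None)
    return called / len(calls)


def alt_allele_frequency(calls):
    """Alternate alleles divided by all called alleles (None if no calls)."""
    alleles = [allele for call in calls if call is not None for allele in call]
    if not alleles:
        return None
    return sum(1 for allele in alleles if allele != 0) / len(alleles)


def genotype_counts(calls):
    """Count hom_ref, het, hom_alt and no_call genotypes at a site."""
    counts = Counter()
    for call in calls:
        if call is None:
            counts["no_call"] += 1
        elif call[0] == call[1] == 0:
            counts["hom_ref"] += 1
        elif call[0] == call[1]:
            counts["hom_alt"] += 1
        else:
            counts["het"] += 1
    return counts


if __name__ == "__main__":
    for site, calls in GENOTYPES.items():
        print(site, call_rate(calls), alt_allele_frequency(calls),
              dict(genotype_counts(calls)))
```

In a columnar database such as BigQuery, the same aggregates would typically be expressed as grouped SQL queries over a variants table rather than as Python loops; the sketch only shows what the metrics mean.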
