Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls

Zook, Justin M.; Chapman, Brad; Wang, Jason; Mittelman, David; Hofmann, Oliver; Hide, Winston; Salit, Marc

Published in

Nature Research, Nature Biotechnology, 32(3), p. 246-251, 2014

DOI: 10.1038/nbt.2835

Journal article published in 2014 by Justin M. Zook, Brad Chapman, Jason Wang, David Mittelman, Oliver Hofmann, Winston Hide, Marc Salit

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving restricted

Published version: archiving forbidden

Abstract

Clinical adoption of human genome sequencing requires methods that output genotypes with known accuracy at millions or billions of positions across a genome. Because of substantial discordance among calls made by existing sequencing methods and algorithms, there is a need for a highly accurate set of genotypes across a genome that can be used as a benchmark. Here we present methods to make high-confidence, single-nucleotide polymorphism (SNP), indel and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium. We minimize bias toward any method by integrating and arbitrating between 14 data sets from five sequencing technologies, seven read mappers and three variant callers. We identify regions for which no confident genotype call could be made, and classify them into different categories based on reasons for uncertainty. Our genotype calls are publicly available on the Genome Comparison and Analytic Testing website to enable real-time benchmarking of any method.
