Published in

Nature Research, Nature Communications, 1(5), 2014

DOI: 10.1038/ncomms4934

Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel

Journal article published in 2014 by Gil A. McVean, Guillermo del Angel, Marcin von Grotthuss, Steven A. McCarroll, Jonathan L. Marchini, S. B. Gabriel, Simon Myers, Shane McCarthy, Iain Mathieson, M. J. Daly, Andy Rimmer, M. O. Carneiro, Dionysia K. Xifara, R. E. Handsaker and other authors.
This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving forbidden
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First, the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000GP haplotype reference set for use by the human genetics community. Using a set of validation genotypes at SNPs and bi-allelic indels, we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants.