Published in

Nature Research, npj Genomic Medicine, 1(3), 2018

DOI: 10.1038/s41525-018-0047-6

Improving imputation in disease-relevant regions: lessons from cystic fibrosis

This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving forbidden
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

Does genotype imputation with public reference panels identify variants contributing to disease? Genotype imputation using the 1000 Genomes Project (1KG; 2504 individuals) displayed poor coverage at the causal cystic fibrosis (CF) transmembrane conductance regulator (CFTR) locus for the International CF Gene Modifier Consortium. Imputation with the larger Haplotype Reference Consortium (HRC; 32,470 individuals) displayed improved coverage but low sensitivity for variants clinically relevant to CF. A hybrid reference that combined whole genome sequencing (WGS) from 101 CF individuals with the 1KG imputed a greater number of single-nucleotide variants (SNVs) that would be analyzed in a genetic association study (r² ≥ 0.3 and MAF ≥ 0.5%) than imputation with the HRC, while the HRC excelled in the lower frequency spectrum. Imputation with the 1KG or the HRC as the reference panel either missed the most common CF-causing variants or imputed them with low accuracy. Designs that incorporate population-specific WGS can improve imputation accuracy at disease-specific loci, while imputation using public data sets can omit disease-relevant genotypes.
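
The inclusion criterion quoted in the abstract (imputation quality r² ≥ 0.3 and MAF ≥ 0.5%) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `ImputedVariant` record, the function name, and the four example variants are hypothetical and serve only to show how the two thresholds combine.

```python
from dataclasses import dataclass


@dataclass
class ImputedVariant:
    """Hypothetical record for one imputed variant (illustration only)."""
    variant_id: str
    r2: float   # estimated imputation quality (r squared)
    maf: float  # minor allele frequency as a fraction, so 0.5% is 0.005


# Thresholds taken from the abstract: r2 >= 0.3 and MAF >= 0.5%.
R2_MIN = 0.3
MAF_MIN = 0.005


def passes_association_filter(variant: ImputedVariant) -> bool:
    """Return True if the variant meets both thresholds."""
    return variant.r2 >= R2_MIN and variant.maf >= MAF_MIN


# Made-up example values, not data from the paper.
variants = [
    ImputedVariant("var_a", r2=0.92, maf=0.021),  # passes both thresholds
    ImputedVariant("var_b", r2=0.25, maf=0.150),  # fails on imputation quality
    ImputedVariant("var_c", r2=0.80, maf=0.001),  # fails on allele frequency
    ImputedVariant("var_d", r2=0.30, maf=0.005),  # exactly at both thresholds
]

kept = [v.variant_id for v in variants if passes_association_filter(v)]
print(kept)
```

Both comparisons are inclusive, so a variant sitting exactly on a threshold is retained. A variant that is well imputed but rare (`var_c`) is still excluded, which is why a panel that performs best in the lower frequency spectrum can contribute fewer variants under this criterion.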