Evaluation of single-nucleotide polymorphism imputation using random forests

Schwarz, Daniel F.; Szymczak, Silke; Ziegler, Andreas; König, Inke R.

Published in

BioMed Central, BMC Proceedings, S7(3), 2009

DOI: 10.1186/1753-6561-3-s7-s65

Journal article published in 2009 by Daniel F. Schwarz, Silke Szymczak, Andreas Ziegler, Inke R. König

This paper is made freely available by the publisher.

Abstract

Genome-wide association studies (GWAS) have helped to reveal genetic mechanisms of complex diseases. Although commonly used genotyping technology enables us to determine up to a million single-nucleotide polymorphisms (SNPs), causative variants are typically not genotyped directly. A favored approach to increase the power of genome-wide association studies is to impute the untyped SNPs using more complete genotype data from a reference population. Random forests (RF) provide an internal method for replacing missing genotypes. A forest of classification trees is used to determine similarities between probands with regard to their genotypes. These proximities are then used to impute genotypes of untyped SNPs. We evaluated this approach using genotype data of the Framingham Heart Study provided as Problem 2 for Genetic Analysis Workshop 16 and the Caucasian HapMap samples as reference population. Our results indicate that RFs are faster but less accurate than alternative approaches for imputing untyped SNPs.
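The proximity-based imputation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the proximity matrix has already been derived from a forest (for example, as the fraction of trees in which two individuals fall into the same terminal node), and it fills each missing genotype with the proximity-weighted most frequent genotype among individuals for whom that SNP is known. The function name `impute_by_proximity`, the 0/1/2 genotype coding, and the toy values are illustrative assumptions.

```python
from collections import defaultdict


def impute_by_proximity(genotypes, proximity):
    """Fill missing genotypes (None) at one SNP by proximity-weighted vote.

    genotypes: list with one entry per individual, coded 0/1/2 or None.
    proximity: square matrix; proximity[i][j] is the similarity of
               individuals i and j (assumed precomputed from a forest).
    """
    imputed = list(genotypes)
    for i, g in enumerate(genotypes):
        if g is not None:
            continue  # genotype observed, nothing to impute
        votes = defaultdict(float)
        for j, gj in enumerate(genotypes):
            if j != i and gj is not None:
                votes[gj] += proximity[i][j]
        if votes:
            # Ties are broken in favour of the smallest genotype code.
            imputed[i] = max(sorted(votes), key=votes.get)
    return imputed


# Toy example: three reference individuals with known genotypes and one
# study individual (index 3) whose genotype at this SNP is untyped.
genotypes = [0, 0, 2, None]
proximity = [
    [1.0, 0.8, 0.1, 0.7],
    [0.8, 1.0, 0.2, 0.6],
    [0.1, 0.2, 1.0, 0.3],
    [0.7, 0.6, 0.3, 1.0],
]
result = impute_by_proximity(genotypes, proximity)
```

In this toy setting the untyped individual is most similar to the two reference individuals carrying genotype 0, so the weighted vote assigns genotype 0. In a reference-panel design the SNP would be missing for every study individual and observed only in the reference samples, which this sketch handles without changes.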
