Published in

Karger Publishers, Human Heredity, 1(58), p. 30-39, 2004

DOI: 10.1159/000081454

Karger Publishers, Human Heredity, 1(58), p. 40-48

DOI: 10.1159/000081455

Effect of Population Stratification on Case-Control Association Studies

This paper is made freely available by the publisher.

Abstract

<i>Objectives:</i> This is the first of two articles discussing the effect of population stratification (PS) on the type I error rate (i.e., the false positive rate). This paper focuses on the confounding risk ratio (CRR). It is accepted that PS can produce false positive results in case-control genetic association studies. However, it is unknown which values of the population parameters lead to an increase in the type I error rate. Some believe PS does not represent a serious concern [1, 2], whereas others believe that PS may contribute to contradictory findings in genetic association studies [3]. We used computer simulations to estimate the effect of PS on the type I error rate over a wide range of disease frequencies and marker allele frequencies, and we compared the observed type I error rate to the magnitude of the CRR.
<i>Methods:</i> We simulated two populations and mixed them to produce a combined population, specifying 160 different combinations of input parameters (disease prevalences and marker allele frequencies in the two populations). From the combined populations, we selected 5000 case-control datasets, each with either 50, 100, or 300 cases and controls, and determined the type I error rate. In all simulations, the marker allele and disease were independent (i.e., no association).
<i>Results:</i> The type I error rate is not substantially affected by changes in the disease prevalence per se. We found that the CRR is a relatively poor indicator of the magnitude of the increase in type I error rate. We also derived a simple mathematical quantity, Δ, that is highly correlated with the type I error rate. In the companion article (part II, in this issue) [4], we extend this work to multiple subpopulations and unequal sampling proportions.
<i>Conclusion:</i> Based on these results, realistic combinations of disease prevalences and marker allele frequencies can substantially increase the probability of finding false evidence of marker-disease associations. Furthermore, the CRR does not indicate when this will occur.
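
The simulation design summarized under Methods can be illustrated with a minimal Python sketch. It is not the authors' code, and several details are assumptions that the abstract does not specify: equal mixing of the two populations, Hardy-Weinberg proportions within each population, an allele-based Pearson chi-square test (1 df, no continuity correction) at a nominal level of 0.05, and direct sampling of cases and controls from the mixture rather than from an explicitly generated population. All function and parameter names (`type1_error`, `source_weights`, `prev`, `freq`, `mix`) are hypothetical.

```python
import random

# 95th percentile of the chi-square distribution with 1 df (nominal alpha = 0.05).
CHI2_CRIT_1DF_05 = 3.841458820694124


def source_weights(mix, prev, cases):
    """Probability that a sampled case (or control) comes from each subpopulation.

    mix[k]  : proportion of subpopulation k in the combined population (assumed)
    prev[k] : disease prevalence in subpopulation k
    """
    raw = [m * (p if cases else 1.0 - p) for m, p in zip(mix, prev)]
    total = sum(raw)
    return [r / total for r in raw]


def sample_allele_count(n, weights, freq, rng):
    """Count marker alleles among n individuals, two alleles each.

    Alleles are drawn independently given the subpopulation (Hardy-Weinberg
    assumed), and independently of disease status (no true association).
    """
    count = 0
    for k in rng.choices(range(len(weights)), weights=weights, k=n):
        count += (rng.random() < freq[k]) + (rng.random() < freq[k])
    return count


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def type1_error(prev, freq, mix=(0.5, 0.5), n=100, n_datasets=5000, seed=1):
    """Fraction of null datasets in which the allele test rejects at 0.05."""
    rng = random.Random(seed)
    w_case = source_weights(mix, prev, cases=True)
    w_ctrl = source_weights(mix, prev, cases=False)
    rejections = 0
    for _ in range(n_datasets):
        a = sample_allele_count(n, w_case, freq, rng)  # marker alleles, cases
        c = sample_allele_count(n, w_ctrl, freq, rng)  # marker alleles, controls
        if chi2_2x2(a, 2 * n - a, c, 2 * n - c) > CHI2_CRIT_1DF_05:
            rejections += 1
    return rejections / n_datasets


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    print(type1_error(prev=(0.01, 0.10), freq=(0.3, 0.3)))  # equal allele frequencies
    print(type1_error(prev=(0.01, 0.10), freq=(0.1, 0.5)))  # stratified marker
```

When the two populations share the same marker allele frequency, the estimated rate should stay near the nominal level regardless of prevalence; when both prevalence and allele frequency differ, cases and controls are drawn from the two populations in different proportions and the rejection rate rises even though marker and disease are independent within each population.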