Oxford University Press, Biostatistics, 12(3), p. 445-461, 2011
DOI: 10.1093/biostatistics/kxq072
In the analysis of genome-wide association (GWA) data, the aim is to detect statistical associations between single nucleotide polymorphisms (SNPs) and the disease or trait of interest. These SNPs, or the particular regions of the genome they implicate, are then considered for further study. We demonstrate through a comprehensive simulation study that including additional, biologically relevant information through a 2-level empirical Bayes hierarchical model framework offers a more robust method of detecting associated SNPs. The empirical Bayes approach is an objective means of analyzing the data that does not require subjective parameter values to be set. This framework gives more stable effect estimates by reducing the variability of the usual effect estimates. We also demonstrate the consequences of including additional information that is not informative and examine power and false-positive rates. We apply the methodology to a number of GWA data sets with the inclusion of additional biological information. Our results agree with previous findings and, in the case of one data set (Crohn's disease), suggest an additional region of interest.

Status: Published
ISSN: 1465-4644

This study makes use of data generated by the Wellcome Trust Case-Control Consortium. A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under award 076113. We thank Derek Morris for contributions regarding the Wellcome Trust Case-Control data and for additional advice.
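To make concrete how a 2-level empirical Bayes model can stabilise SNP effect estimates, the following is a minimal sketch, not the authors' implementation. It assumes a first level in which each usual estimate is normal around the true effect with known standard error, and a second level in which the true effects are normal around a linear function of SNP-level covariates. The function name `eb_shrink`, the covariate matrix `Z`, the coefficients `pi`, the variance `tau2`, and the simple iterative moment estimator are all illustrative assumptions.

```python
import numpy as np

def eb_shrink(beta_hat, se, Z, n_iter=50):
    """Shrink per-SNP effect estimates toward a covariate-based prior mean.

    Assumed model (illustrative only):
      level 1: beta_hat_j ~ N(beta_j, se_j^2)
      level 2: beta_j     ~ N(Z_j @ pi, tau2)
    pi and tau2 are estimated from the data (empirical Bayes).
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    var = np.asarray(se, dtype=float) ** 2
    Z = np.asarray(Z, dtype=float)

    # Crude starting value for the second-level variance.
    tau2 = max(np.var(beta_hat) - var.mean(), 0.0)
    for _ in range(n_iter):
        w = 1.0 / (var + tau2)
        # Weighted least squares for the second-level coefficients.
        ZtW = Z.T * w
        pi = np.linalg.solve(ZtW @ Z, ZtW @ beta_hat)
        resid = beta_hat - Z @ pi
        # Moment update: E[resid^2] is approximately se^2 + tau2.
        tau2 = max(np.sum(w * (resid ** 2 - var)) / np.sum(w), 0.0)

    shrink = var / (var + tau2)          # weight placed on the prior mean
    post_mean = shrink * (Z @ pi) + (1.0 - shrink) * beta_hat
    post_var = (1.0 - shrink) * var      # never larger than se^2
    return post_mean, post_var, pi, tau2
```

In this sketch the posterior variance is at most the squared standard error of the usual estimate, which is the sense in which variability is reduced. If the covariates in `Z` are uninformative, the estimated coefficients for them should be near zero and the estimates are simply shrunk toward a common mean.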