Published in

Wiley, Annals of Human Genetics, 2(80), p. 136-143, 2016

DOI: 10.1111/ahg.12147

On Sample Size and Power Calculation for Variant Set-Based Association Tests

Journal article published in 2016 by Baolin Wu, James S. Pankow
This paper was not found in any repository, but could be made available legally by the author.

Full text: Unavailable

Preprint: archiving allowed
Postprint: archiving restricted
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Sample size and power calculations are an important part of designing new sequence-based association studies. In this paper, we explore an efficient and accurate approach to computing sample size and power (particularly at small significance levels, e.g., 10⁻⁶) for the sequence kernel association test (SKAT), which is a powerful and widely used approach for testing variant set association. The recently developed SEQPower (Wang et al., 2014) and SPS (Li et al., 2015) programs adopted random Monte Carlo simulations to empirically estimate power for a series of variant set association test methods including SKAT, which can be very computationally intensive and time consuming. It is desirable to develop methods that can quickly and accurately compute power without intensive Monte Carlo simulations. To our knowledge, the only analytical approach to computing power for SKAT was proposed by Lee et al. (2012), who used an approximate non-central χ² distribution to efficiently compute sample size and power for SKAT and related methods. However, we will show that the power computed with the analytical approach of Lee et al. (2012) can be inflated, especially at a small significance level, which is often of primary interest for large-scale whole genome and exome sequencing projects. We propose a new non-central χ² approximation-based approach to accurately and efficiently compute sample size and power. In addition, we study and implement a more accurate “exact” method to compute power, which is more efficient than the Monte Carlo approach, though it generally involves more computation than the χ² approximation method. The exact approach can produce very accurate results and can be used to verify alternative approximation approaches. We implement the proposed methods in publicly available R programs that can be readily adapted when planning sequencing projects.
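
To illustrate the general idea of reading power off a non-central χ² distribution at a small significance level, here is a minimal Python sketch for the simplest case only: a 1-degree-of-freedom test. It is not the method of this paper or of Lee et al. (2012), and it is not SKAT, whose statistic is a mixture of χ² variables. The function names `power_1df` and `sample_size_1df`, the parameter `ncp_per_sample`, and the assumption that the non-centrality parameter grows linearly with sample size are illustrative choices, not taken from the article.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power_1df(ncp: float, alpha: float = 1e-6) -> float:
    """Power of a 1-df chi-square test at level alpha.

    Assumes (for illustration) that the test statistic follows a
    non-central chi-square distribution with 1 df and non-centrality
    parameter ncp, i.e. it is distributed as (Z + sqrt(ncp))**2.
    """
    if ncp < 0 or not 0 < alpha < 1:
        raise ValueError("need ncp >= 0 and 0 < alpha < 1")
    # Square root of the chi-square(1) critical value at level alpha.
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2)
    shift = ncp ** 0.5
    # (Z + shift)^2 > z_crit^2  <=>  Z > z_crit - shift  or  Z < -z_crit - shift
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def sample_size_1df(ncp_per_sample: float,
                    target_power: float = 0.8,
                    alpha: float = 1e-6) -> int:
    """Smallest n whose power reaches target_power.

    Assumes (hypothetically) ncp = n * ncp_per_sample, so power is
    monotone in n and a doubling-then-bisection search applies.
    """
    if ncp_per_sample <= 0 or not 0 < target_power < 1:
        raise ValueError("need ncp_per_sample > 0 and 0 < target_power < 1")
    lo, hi = 0, 1
    # Double hi until the target power is reached.
    while power_1df(hi * ncp_per_sample, alpha) < target_power:
        lo, hi = hi, hi * 2
    # Bisect between lo (power too low) and hi (power sufficient).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if power_1df(mid * ncp_per_sample, alpha) >= target_power:
            hi = mid
        else:
            lo = mid
    return hi
```

Because the calculation is closed-form, it costs the same at a significance level of 10⁻⁶ as at 0.05, whereas a Monte Carlo estimate needs many millions of replicates to resolve such a small tail probability. For example, `sample_size_1df(0.01)` returns the smallest sample size reaching 80% power under the linear-growth assumption above.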