Published in

Wiley, Biometrics, 78(2), p. 499-511, 2021

DOI: 10.1111/biom.13466

On polygenic risk scores for complex traits prediction

Journal article published in 2021 by Bingxin Zhao and Fei Zou
Distributing this paper is prohibited by the publisher

Full text: Unavailable

Preprint: archiving forbidden
Postprint: archiving forbidden
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Polygenic risk scores (PRS) have gained substantial attention for complex traits prediction in genome-wide association studies (GWAS). Motivated by the polygenic model of complex traits, we study the statistical properties of PRS under the high-dimensional but sparsity-free setting where the triplet (n, p, m) → (∞, ∞, ∞), with n, p, and m being the sample size, the number of assayed single-nucleotide polymorphisms (SNPs), and the number of assayed causal SNPs, respectively. First, we derive asymptotic results on the out-of-sample (prediction) R-squared for PRS. These results help explain the widely observed gap between the in-sample heritability (or partial R-squared due to the genetic features) estimate and the out-of-sample R-squared for most complex traits. Next, we investigate how features should be selected (e.g., by a p-value threshold) for constructing optimal PRS. We reveal that the optimal threshold depends largely on the genetic architecture underlying the complex trait and the sample size of the training GWAS, or the m/n ratio. For highly polygenic traits with a large m/n ratio, it is difficult to separate causal and null SNPs, and stringent feature selection often leads to poor PRS prediction. We numerically illustrate the theoretical results with intensive simulation studies and real data analysis on 33 complex traits with a wide range of genetic architectures in the UK Biobank database.