Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries

An, Ulzee; Pazokitoroudi, Ali; Alvarez, Marcus; Huang, Lianyun; Bacanu, Silviu; Schork, Andrew J.; Kendler, Kenneth; Pajukanta, Päivi; Flint, Jonathan; Zaitlen, Noah; Cai, Na; Dahl, Andy; Sankararaman, Sriram

Published in

Nature Research, Nature Genetics, 2023

DOI: 10.1038/s41588-023-01558-w

Journal article published in 2023 by Ulzee An, Ali Pazokitoroudi, Marcus Alvarez, Lianyun Huang, Silviu Bacanu, Andrew J. Schork, Kenneth Kendler, Päivi Pajukanta, Jonathan Flint, Noah Zaitlen, Na Cai, Andy Dahl, Sriram Sankararaman

Abstract

Biobanks that collect deep phenotypic and genomic data across many individuals have emerged as a key resource in human genetics. However, phenotypes in biobanks are often missing across many individuals, limiting their utility. We propose AutoComplete, a deep learning-based imputation method to impute or ‘fill-in’ missing phenotypes in population-scale biobank datasets. When applied to collections of phenotypes measured across ~300,000 individuals from the UK Biobank, AutoComplete substantially improved imputation accuracy over existing methods. On three traits with notable amounts of missingness, we show that AutoComplete yields imputed phenotypes that are genetically similar to the originally observed phenotypes while increasing the effective sample size by about twofold on average. Further, genome-wide association analyses on the resulting imputed phenotypes led to a substantial increase in the number of associated loci. Our results demonstrate the utility of deep learning-based phenotype imputation to increase power for genetic discoveries in existing biobank datasets.
