Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes

Ebler, Jana; Ebert, Peter; Clarke, Wayne E.; Rausch, Tobias; Audano, Peter A.; Houwaart, Torsten; Mao, Yafei; Korbel, Jan O.; Eichler, Evan E.; Zody, Michael C.; Dilthey, Alexander T.; Marschall, Tobias

Published in

Nature Research, Nature Genetics, 54(4), p. 518-525, 2022

DOI: 10.1038/s41588-022-01043-w

Abstract

Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and carries a substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing number of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while running faster than alignment-based workflows.
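
The general idea of genotyping from k-mer counts without read alignment can be illustrated with a toy sketch. This is not PanGenie's algorithm, which the abstract does not specify in detail; it is a simplified illustration for a single biallelic variant. All names here (`toy_genotype`, `min_fraction`, and the helper functions) are hypothetical, and the thresholding rule is an assumption chosen for brevity. The sketch also ignores reverse complements, sequencing errors, and repeated k-mers.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_read_kmers(reads, k):
    """Count k-mers across all reads (no alignment involved)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def allele_specific_kmers(ref_allele, alt_allele, k):
    """Return the k-mers found only in the ref allele and only in the alt allele."""
    ref_set = set(kmers(ref_allele, k))
    alt_set = set(kmers(alt_allele, k))
    return ref_set - alt_set, alt_set - ref_set


def toy_genotype(reads, ref_allele, alt_allele, k=5, min_fraction=0.2):
    """Call a genotype from the read support of allele-specific k-mers.

    Hypothetical rule: compare the mean read count of k-mers unique to each
    allele and threshold the alt fraction. Real tools use a statistical model.
    """
    counts = count_read_kmers(reads, k)
    ref_only, alt_only = allele_specific_kmers(ref_allele, alt_allele, k)
    # Mean count per allele-specific k-mer, so alleles with more unique
    # k-mers (e.g. insertions) are not favoured automatically.
    ref_support = sum(counts[km] for km in ref_only) / max(len(ref_only), 1)
    alt_support = sum(counts[km] for km in alt_only) / max(len(alt_only), 1)
    total = ref_support + alt_support
    if total == 0:
        return "./."  # no informative k-mers observed
    alt_fraction = alt_support / total
    if alt_fraction < min_fraction:
        return "0/0"
    if alt_fraction > 1 - min_fraction:
        return "1/1"
    return "0/1"
```

For example, with a reference allele `AAACCCGGGTTT` and an alternative allele `AAACCCATGGGTTT` (a 2 bp insertion), reads drawn from both sequences support k-mers unique to each allele, so the sketch reports a heterozygous call.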
