Parsnp 2.0: scalable core-genome alignment for massive microbial datasets

Kille, Bryce; Nute, Michael G.; Huang, Victor; Kim, Eddie; Phillippy, Adam M.; Treangen, Todd J.

Published in

Oxford University Press, Bioinformatics, 40(5), 2024

DOI: 10.1093/bioinformatics/btae311

Abstract

Motivation: Since 2016, the number of microbial species with available reference genomes in NCBI has more than tripled. Multiple genome alignment, the process of identifying nucleotides across multiple genomes which share a common ancestor, is used as the input to numerous downstream comparative analysis methods. Parsnp is one of the few multiple genome alignment methods able to scale to the current era of genomic data; however, there has been no major release since its initial release in 2014.

Results: To address this gap, we developed Parsnp v2, which significantly improves on its original release. Parsnp v2 provides users with more control over executions of the program, allowing Parsnp to be better tailored for different use-cases. We introduce a partitioning option to Parsnp, which allows the input to be broken up into multiple parallel alignment processes which are then combined into a final alignment. The partitioning option can reduce memory usage by over 4× and reduce runtime by over 2×, all while maintaining a precise core-genome alignment. The partitioning workflow is also less susceptible to complications caused by assembly artifacts and minor variation, as alignment anchors only need to be conserved within their partition and not across the entire input set. We highlight the performance on datasets involving thousands of bacterial and viral genomes.

Availability and implementation: Parsnp v2 is available at https://github.com/marbl/parsnp.
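The partitioning idea described in the Results can be illustrated with a small toy sketch. It is not Parsnp's implementation: each genome is modeled as a set of reference positions it covers, "alignment" is stood in for by set intersection, and the merge step (intersecting per-partition results on the shared reference) is an assumption, since the abstract does not specify how partitions are combined. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_partitions(genomes, size):
    """Split the query genomes into consecutive chunks of at most `size`."""
    return [genomes[i:i + size] for i in range(0, len(genomes), size)]


def align_partition(reference, partition):
    """Toy stand-in for one alignment run (NOT Parsnp's algorithm):
    keep the reference positions covered by every genome in the partition."""
    core = set(reference)
    for covered in partition:
        core &= covered
    return core


def merge_partitions(reference, partition_cores):
    """Assumed merge step: keep positions retained by every partition."""
    merged = set(reference)
    for core in partition_cores:
        merged &= core
    return merged


def partitioned_core(reference, genomes, size, workers=4):
    """Process each partition independently against the same reference,
    in parallel, then combine the per-partition results."""
    parts = make_partitions(genomes, size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        cores = list(pool.map(lambda p: align_partition(reference, p), parts))
    return merge_partitions(reference, cores)


if __name__ == "__main__":
    reference = set(range(10))
    genomes = [set(range(0, 8)), set(range(2, 10)),
               set(range(1, 9)), set(range(2, 7))]
    print(sorted(partitioned_core(reference, genomes, size=2)))
```

In this toy model each worker only holds one partition at a time, which is the intuition behind the reduced memory footprint; the actual savings reported in the abstract come from Parsnp's own data structures and are not reproduced here.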
