Published in

PeerJ, PeerJ, (3), p. e761, 2015

DOI: 10.7717/peerj.761

Links

Tools

Export citation

Search in Google Scholar

Concordance and discordance of sequence survey methods for molecular epidemiology

This paper is made freely available by the publisher.
This paper is made freely available by the publisher.

Full text: Download

Red circle
Preprint: archiving forbidden
Red circle
Postprint: archiving forbidden
Green circle
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

The post-genomic era is characterized by the direct acquisition and analysis of ge-nomic data with many applications, including the enhancement of the understanding of microbial epidemiology and pathology. However, there are a number of molecular approaches to survey pathogen diversity, and the impact of these different approaches on parameter estimation and inference are not entirely clear. We sequenced whole genomes of bacterial pathogens, Burkholderia pseudomallei, Yersinia pestis, and Brucella spp. (60 new genomes), and combined them with 55 genomes from Gen-Bank to address how different molecular survey approaches (whole genomes, SNPs, and MLST) impact downstream inferences on molecular evolutionary parameters, evolutionary relationships, and trait character associations. We selected isolates for sequencing to represent temporal, geographic origin, and host range variability. We found that substitution rate estimates vary widely among approaches, and that SNP and genomic datasets yielded different but strongly supported phylogenies. MLST yielded poorly supported phylogenies, especially in our low diversity dataset, i.e., Y. pestis. Trait associations showed that B. pseudomallei and Y. pestis phylogenies are significantly associated with geography, irrespective of the molecular survey approach used, while Brucella spp. phylogeny appears to be strongly associated with geography and host origin. We contrast inferences made among monomorphic (clonal) and non-monomorphic bacteria, and between intra-and inter-specific datasets. We also discuss our results in light of underlying assumptions of different approaches.