Published in

Springer Nature [academic journals on nature.com], The ISME Journal: Multidisciplinary Journal of Microbial Ecology, 1(10), p. 269-272, 2015

DOI: 10.1038/ismej.2015.100

ProDeGe: a computational protocol for fully automated decontamination of genomes

This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving restricted
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both types of genome are plagued by contamination. Because these genomes are now generated in a high-throughput manner and their sequences are propagating into public databases that drive new scientific discoveries, rigorous quality control and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes, clean and contaminant, using a combination of homology-based and feature-based methods. On average, 84% of the sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). The procedure runs at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence.

The ISME Journal advance online publication, 9 June 2015; doi:10.1038/ismej.2015.100.
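The specificity and sensitivity figures above are fractions of sequence, so they can be read as length-weighted proportions. The following is a minimal Python sketch of how such metrics, and a runtime estimate from the quoted ~0.30 CPU core hours per megabase, could be computed. It assumes the percentages are measured in bases and that every contig has a known ground-truth origin (as in a benchmark on artificially mixed genomes) plus a keep/remove decision. The `Contig` type and both function names are hypothetical illustrations, not part of ProDeGe.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical record for one contig in a benchmark data set."""
    length: int      # contig length in bases
    is_target: bool  # ground truth: True if the contig comes from the target organism
    kept: bool       # decision: True if the contig was classified as "clean"


def sensitivity_specificity(contigs):
    """Return (sensitivity, specificity) as length-weighted fractions.

    sensitivity = target bases retained / all target bases
    specificity = non-target bases removed / all non-target bases
    """
    target_total = sum(c.length for c in contigs if c.is_target)
    target_kept = sum(c.length for c in contigs if c.is_target and c.kept)
    contam_total = sum(c.length for c in contigs if not c.is_target)
    contam_removed = sum(c.length for c in contigs if not c.is_target and not c.kept)
    # Guard against empty classes instead of dividing by zero.
    sensitivity = target_kept / target_total if target_total else float("nan")
    specificity = contam_removed / contam_total if contam_total else float("nan")
    return sensitivity, specificity


def estimated_cpu_core_hours(total_bases, core_hours_per_mb=0.30):
    """Scale the quoted per-megabase rate linearly to a given input size (an assumption)."""
    return total_bases / 1_000_000 * core_hours_per_mb


if __name__ == "__main__":
    contigs = [
        Contig(800_000, is_target=True, kept=True),
        Contig(200_000, is_target=True, kept=False),
        Contig(150_000, is_target=False, kept=False),
        Contig(50_000, is_target=False, kept=True),
    ]
    sens, spec = sensitivity_specificity(contigs)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
    print(f"~{estimated_cpu_core_hours(3_000_000):.2f} CPU core hours for 3 Mb")
```

With these made-up contigs, 800 kb of the 1 Mb of target sequence is kept (sensitivity 0.80) and 150 kb of the 200 kb of contaminant is removed (specificity 0.75). Linear scaling of the runtime rate is itself an assumption; actual cost will depend on the input and the homology search involved.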