Published in

BioMed Central, Genome Biology, 8(7), p. R143

DOI: 10.1186/gb-2007-8-7-r143

Handbook of Molecular Microbial Ecology I, p. 149-155

DOI: 10.1002/9781118010518.ch19

Accuracy and quality of massively parallel DNA pyrosequencing

This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

© 2007 Huse et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The definitive version was published in Genome Biology 8 (2007): R143, doi:10.1186/gb-2007-8-7-r143.

Additional data file 1 is a fasta file of the 43 known sequences used. Additional data file 2 is a gzip-compressed fasta file of the sequences output by the GS20. These sequences correspond to those included in Additional data files 3, 4, 5 but include only the final sequence information. Additional data files 3, 4, 5 are three compressed text files representing the text translations of the original GS20 binary output (sff) files for all of the sequencing used in the analysis, including sequence, flowgram and other run information. GS20 data are reported by region of the PicoTiterPlate™; we sequenced three plate regions.

Massively parallel pyrosequencing systems have increased the efficiency of DNA sequencing, although the published per-base accuracy of a Roche GS20 is only 96%. In genome projects, highly redundant consensus assemblies can compensate for sequencing errors. In contrast, studies of microbial diversity that catalogue differences between PCR amplicons of ribosomal RNA genes (rDNA) or other conserved gene families cannot take advantage of consensus assemblies to detect and minimize incorrect base calls. We performed an empirical study of the per-base error rate for the Roche GS20 system using sequences of the V6 hypervariable region from cloned microbial ribosomal DNA (tag sequencing). We calculated a 99.5% accuracy rate in unassembled sequences, and identified several factors that can be used to remove a small percentage of low-quality reads, improving the accuracy to 99.75% or better. By using objective criteria to eliminate low quality data, the quality of individual GS20 sequence reads in molecular ecological applications can surpass the accuracy of traditional capillary methods.

This work was supported by National Aeronautics and Space Administration Astrobiology Institute Cooperative Agreement NNA04CC04A (to MLS), subcontracts from the Woods Hole Center for Oceans and Human Health from the National Institutes of Health and National Science Foundation (NIH/NIEHS 1 P50 ES012742-01 and NSF/OCE 0430724-J Stegeman PI to HGM and MLS), grants from the WM Keck Foundation and the G Unger Vetlesen Foundation (to MLS), and a National Research Council Research Associateship Award (to JAH).
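
The per-base accuracy figures quoted in the abstract can be illustrated with a minimal sketch of how such a rate might be estimated when each read can be compared with a known reference sequence. This is not the authors' pipeline. The function names, the use of Levenshtein distance as the error count, the choice of reference length as the denominator, and the filter criteria (no ambiguous `N` calls, read length close to the reference) are all assumptions made for illustration only.

```python
from typing import Iterable, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: each substitution, insertion or deletion costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete a base from a
                curr[j - 1] + 1,           # insert a base into a
                prev[j - 1] + (ca != cb),  # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


def per_base_error_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total errors divided by total reference bases over (read, reference) pairs.

    Dividing by reference length is an assumed convention, not one
    taken from the paper.
    """
    errors = 0
    bases = 0
    for read, ref in pairs:
        errors += edit_distance(read, ref)
        bases += len(ref)
    return errors / bases if bases else 0.0


def passes_filter(read: str, ref: str, max_len_diff: int = 2) -> bool:
    """Hypothetical quality filter: reject reads with ambiguous bases (N)
    or with a length far from the expected reference length."""
    return "N" not in read and abs(len(read) - len(ref)) <= max_len_diff


# Toy data (invented): one clean read, one with an N, one of atypical length.
toy_pairs = [("ACGT", "ACGT"), ("ANGT", "ACGT"), ("ACGTACGT", "ACGT")]
kept_pairs = [p for p in toy_pairs if passes_filter(*p)]

print("accuracy before filtering:", 1 - per_base_error_rate(toy_pairs))
print("accuracy after filtering: ", 1 - per_base_error_rate(kept_pairs))
```

On real data the same comparison would show how much accuracy is gained for the fraction of reads discarded. Moving from 99.5% to 99.75% accuracy, as reported in the abstract, corresponds to halving the error rate from 5 to 2.5 errors per 1,000 bases.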