Long-read, whole-genome shotgun sequence data for five model organisms

Kim, Kristi E.; Peluso, Paul; Babayan, Primo; Baybayan, P.; Yeadon, P. Jane; Yu, Charles; Fisher, William W.; Chin, Chen-Shan; Rapicavoli, Nicole A.; Rank, David R.; Li, Joachim; Catcheside, David E. A.; Celniker, Susan E.; Phillippy, Adam M.; Bergman, Casey M.; Landolin, Jane M.

Published in

Nature Research, Scientific Data, 1(1), 2014

DOI: 10.1038/sdata.2014.45

This paper is made freely available by the publisher.

Abstract

Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research, including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality public datasets of SMRT sequences can spur the development of analytic tools that accommodate the unique characteristics of SMRT data: long read lengths, lack of GC or amplification bias, and a random error profile that leads to high consensus accuracy. In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4C2 and P5C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely studied model systems in biological research.
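Why a random error profile yields high consensus accuracy can be illustrated with a deliberately simplified model. The sketch below is an assumption for illustration, not the authors' method: each read covering a site is wrong independently with the same probability, all wrong reads are treated as if they agreed with one another (the worst case), and the consensus is a plain majority vote with ties counted as failures. Real SMRT errors are mostly insertions and deletions, and production consensus callers use richer models. The function name `majority_consensus_error` and the 15% per-read error rate in the demo are hypothetical.

```python
from math import comb


def majority_consensus_error(per_read_error: float, coverage: int) -> float:
    """Return the probability that a majority-vote consensus is wrong at one site.

    Simplified, hypothetical model: each of the `coverage` reads covering the
    site is wrong independently with probability `per_read_error`, and all
    wrong reads are assumed to agree with each other (worst case).
    Ties count as consensus errors.
    """
    if coverage < 1:
        raise ValueError("coverage must be at least 1")
    if not 0.0 <= per_read_error <= 1.0:
        raise ValueError("per_read_error must be between 0 and 1")
    # Consensus fails when at least half of the reads (rounded up) are wrong.
    threshold = (coverage + 1) // 2
    # Binomial upper tail: P(number of wrong reads >= threshold).
    return sum(
        comb(coverage, k)
        * per_read_error ** k
        * (1.0 - per_read_error) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-read error rate of 15%, chosen only for illustration.
    for depth in (1, 5, 11, 21, 31):
        err = majority_consensus_error(0.15, depth)
        print(f"coverage {depth:>2}: consensus error ~ {err:.2e}")
```

Under these assumptions the consensus error falls steeply as coverage grows. By contrast, an error shared systematically by most reads would survive any amount of voting, which is why a random error profile matters for consensus accuracy.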