Published in

Public Library of Science, PLoS Biology, 12(6), p. e1001889, 2014

DOI: 10.1371/journal.pbio.1001889

The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing.

Journal article published in 2014 by Patrick J. Keeling, Fabien Burki, Heather M. Wilcox, Bassem Allam, Eric E. Allen, Linda A. Amaral-Zettler, E. Virginia Armbrust, John M. Archibald, Arvind K. Bharti and other authors.
This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

© The Author(s), 2014. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in PLoS Biology 12 (2014): e1001889, doi:10.1371/journal.pbio.1001889.

Microbial ecology is plagued by problems of an abstract nature. Cell sizes are so small and population sizes so large that both are virtually incomprehensible. Niches are so far from our everyday experience as to make their very definition elusive. Organisms that may be abundant and critical to our survival are little understood, seldom described and/or cultured, and sometimes yet to be even seen. One way to confront these problems is to use data of an even more abstract nature: molecular sequence data. Massive environmental nucleic acid sequencing, such as metagenomics or metatranscriptomics, promises functional analysis of microbial communities as a whole, without prior knowledge of which organisms are in the environment or exactly how they are interacting. But sequence-based ecological studies nearly always use a comparative approach, and that requires relevant reference sequences, which are an extremely limited resource when it comes to microbial eukaryotes. In practice, this means sequence databases need to be populated with enormous quantities of data for which we have some certainties about the source. Most important is the taxonomic identity of the organism from which a sequence is derived and as much functional identification of the encoded proteins as possible. In an ideal world, such information would be available as a large set of complete, well-curated, and annotated genomes for all the major organisms from the environment in question. Reality substantially diverges from this ideal, but at least for bacterial molecular ecology, there is a database consisting of thousands of complete genomes from a wide range of taxa, supplemented by a phylogeny-driven approach to diversifying genomics. For eukaryotes, the number of available genomes is far smaller, and we have relied much more heavily on random growth of sequence databases, raising the question of whether this is fit for purpose.

This project was funded by the Gordon and Betty Moore Foundation (GBMF; Grants GBMF2637 and GBMF3111) to the National Center for Genome Resources (NCGR) and the National Center for Marine Algae and Microbiota (NCMA).