Published in

Public Library of Science, PLoS ONE, 7(10), p. e0129384, 2015

DOI: 10.1371/journal.pone.0129384

MLgsc: A Maximum-Likelihood General Sequence Classifier

Journal article published in 2015 by Thomas Junier, Vincent Hervé, Tina Wunderlin, Pilar Junier
This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

We present a software package for classifying protein or nucleotide sequences against user-specified sets of reference sequences. The software trains a model using a multiple sequence alignment and a phylogenetic tree, both supplied by the user. The tree is used to guide model construction and also serves as a decision tree to speed up the classification process. The software was evaluated on all the 16S rRNA gene sequences of the reference dataset found in the GreenGenes database. On this dataset, the software was shown to achieve an error rate of around 1% at genus level. Examples of applications based on the nitrogenase sub-unit NifH gene and a protein-coding gene found in endospore-forming Firmicutes are also presented. The programs in the package have a simple, straightforward command-line interface for the Unix shell, and are free and open-source. The package has minimal dependencies and thus can be easily integrated into command-line-based classification pipelines.
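
The idea of using a phylogenetic tree as a decision tree can be illustrated with a toy sketch. This is not MLgsc's code or interface. It assumes ungapped, pre-aligned nucleotide sequences, a simple per-column frequency model with an arbitrary pseudocount, and hypothetical names (`Node`, `train_profile`, `score`, `classify`, and the clade and genus labels) chosen only for illustration. Each node of the tree gets a model built from the reference sequences below it, and a query is classified by walking down from the root, following the child whose model gives the highest log-likelihood.

```python
import math
from collections import Counter

ALPHABET = "ACGT"
PSEUDOCOUNT = 1.0  # assumed smoothing value, not taken from the paper


def train_profile(aligned_seqs):
    """Build per-column log-probabilities from equal-length aligned sequences."""
    profile = []
    for col in range(len(aligned_seqs[0])):
        counts = Counter(seq[col] for seq in aligned_seqs)
        total = sum(counts[b] for b in ALPHABET) + PSEUDOCOUNT * len(ALPHABET)
        profile.append({
            b: math.log((counts[b] + PSEUDOCOUNT) / total) for b in ALPHABET
        })
    return profile


def score(profile, query):
    """Log-likelihood of an aligned query; symbols outside ALPHABET are skipped."""
    return sum(col[base] for col, base in zip(profile, query) if base in col)


class Node:
    """Hypothetical tree node: leaves hold references, inner nodes pool them."""

    def __init__(self, name, children=None, seqs=None):
        self.name = name
        self.children = children or []
        if seqs is None:
            seqs = [s for child in self.children for s in child.seqs]
        self.seqs = seqs
        self.profile = train_profile(self.seqs)


def classify(root, query):
    """Descend from the root, following the best-scoring child at each split."""
    node, path = root, [root.name]
    while node.children:
        node = max(node.children, key=lambda child: score(child.profile, query))
        path.append(node.name)
    return path


# Made-up reference data, for illustration only.
tree = Node("root", children=[
    Node("cladeA", children=[
        Node("genus1", seqs=["ACGTAC", "ACGTAA"]),
        Node("genus2", seqs=["ACCTAC", "ACCTTC"]),
    ]),
    Node("cladeB", children=[
        Node("genus3", seqs=["TGGAAC", "TGGACC"]),
    ]),
])

print(" > ".join(classify(tree, "ACCTAC")))
```

Because only the children of the current node are scored at each step, the query is compared with a handful of models per level rather than with every reference taxon, which is the kind of speed-up the abstract attributes to using the tree as a decision tree.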