Elsevier, Zoologischer Anzeiger, (256), p. 54-60, 2015
DOI: 10.1016/j.jcz.2015.03.004
Although most phylogenetic investigations are motivated by questions about the evolution of morphological attributes, morphological data are increasingly rare as a source of characters for reconstructing phylogeny, in part because these attributes are time-consuming to collect. Here we describe methods to mine the information contained in classifications as a source of phylogenetic characters, using the classification of actiniarian sea anemones (Cnidaria: Anthozoa) as our exemplar system. Our Natural Language Processing pipeline recovers more than 400 characters from the most widely used classification of sea anemones. However, the majority of these are problematic: they reflect semantic or logical inconsistencies, or they are scored for only a single taxon and are thus inappropriate for phylogenetic reconstruction. Although the classification cannot be directly translated into a phylogenetic matrix, exposing the characters that underlie a classification provides important perspective on the basis and limits of a classification system and offers a valuable starting point for the creation of a phylogenetic matrix.
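The following minimal Python sketch makes the filtering criterion concrete. It assumes that each taxon's diagnosis has already been parsed into (character, state) pairs. The taxon names, scores, and the functions `build_matrix`, `single_taxon_characters`, `conflicting_scores`, and `candidate_characters` are hypothetical illustrations, not the authors' pipeline, and the free-text NLP step is not reproduced. The sketch assembles a character-by-taxon matrix, flags characters scored for only a single taxon, and flags taxa given contradictory states for the same character as one simple kind of logical inconsistency.

```python
from collections import defaultdict

# Hypothetical, pre-parsed input: each taxon's diagnosis reduced to
# (character, state) pairs. Taxon names and scores are illustrative only;
# the free-text NLP parsing step of a real pipeline is not shown.
DIAGNOSES = {
    "TaxonA": [("acontia", "present"), ("basilar muscles", "present")],
    "TaxonB": [("acontia", "absent"), ("basilar muscles", "present")],
    "TaxonC": [
        ("acontia", "present"),
        ("marginal sphincter", "mesogloeal"),
        ("marginal sphincter", "endodermal"),
    ],
}


def build_matrix(diagnoses):
    """Map each character to {taxon: set of states mentioned for that taxon}."""
    matrix = defaultdict(lambda: defaultdict(set))
    for taxon, pairs in diagnoses.items():
        for character, state in pairs:
            matrix[character][taxon].add(state)
    # Convert nested defaultdicts to plain dicts for predictable output.
    return {c: dict(scores) for c, scores in matrix.items()}


def single_taxon_characters(matrix):
    """Characters scored for only one taxon, so they cannot group taxa."""
    return sorted(c for c, scores in matrix.items() if len(scores) == 1)


def conflicting_scores(matrix):
    """(taxon, character) pairs given more than one state: a crude,
    hypothetical proxy for a logical inconsistency in the classification."""
    return sorted(
        (taxon, c)
        for c, scores in matrix.items()
        for taxon, states in scores.items()
        if len(states) > 1
    )


def candidate_characters(matrix):
    """Characters scored for two or more taxa with no conflicting scores.
    A minimal screen only: it does not check that states actually vary."""
    conflicted = {c for _, c in conflicting_scores(matrix)}
    return sorted(
        c for c, scores in matrix.items()
        if len(scores) > 1 and c not in conflicted
    )


if __name__ == "__main__":
    m = build_matrix(DIAGNOSES)
    print(single_taxon_characters(m))  # ['marginal sphincter']
    print(conflicting_scores(m))       # [('TaxonC', 'marginal sphincter')]
    print(candidate_characters(m))     # ['acontia', 'basilar muscles']
```

Characters that pass this screen are only candidates. Turning them into a usable phylogenetic matrix would still require checking that their states vary among taxa and that the character definitions are semantically consistent across diagnoses.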