Oxford University Press, Systematic Biology, 4(45), p. 516-523, 1996
DOI: 10.2307/2413528
A Monte Carlo approach was used to estimate the accuracy of a given tree reconstruction method for any number of taxa. In this procedure, we sampled randomly over all possible bifurcating trees, assigning substitution rates (branch lengths) to each edge from an exponential distribution to obtain a biologically sensible maximal observed distance. Three different sets of trees were studied: the unrestricted tree space, the biologically meaningful tree space as introduced by Nei et al. (1995, Science 267:253–254), and the population data tree space. We used this technique to elucidate the performance of neighbor joining as a function of the number of taxa, assuming that distances are uncorrected and sequences evolve according to the Jukes–Cantor model. The accuracy of neighbor joining decreases almost exponentially with the number of taxa; however, the rate of decrease depends on the tree space studied. Although the accuracy decreases toward zero, the similarity, i.e., the number of partitions that are identical between the model tree and the reconstructed tree, is in all cases studied much higher than the value expected for two randomly chosen trees. Although the probability of recovering the true tree is dramatically influenced by sequence length, the average similarity does not decrease substantially if branch lengths are not too short.
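The sampling and scoring steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the mean branch length of 0.1, and the tree representation are assumptions made for illustration. Topologies are drawn by attaching each taxon to a uniformly chosen existing edge, a standard way to sample labeled unrooted bifurcating trees, and branch lengths come from an exponential distribution. The sketch then counts shared nontrivial partitions between two trees and estimates the baseline similarity of two randomly chosen trees. Sequence simulation and neighbor joining itself are omitted; `jc_uncorrected` only gives the expected uncorrected distance under the Jukes–Cantor model.

```python
import math
import random


def random_tree(n_taxa, mean_branch=0.1, rng=random):
    """Return {(u, v): length} for a random unrooted bifurcating tree.

    Taxa are 0..n_taxa-1; internal nodes are numbered from n_taxa upward.
    Each new taxon is attached to a uniformly chosen existing edge.
    """
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    edges = [(0, n_taxa), (1, n_taxa), (2, n_taxa)]  # three-taxon star
    for taxon in range(3, n_taxa):
        new_node = n_taxa + taxon - 2
        u, v = edges.pop(rng.randrange(len(edges)))
        edges += [(u, new_node), (new_node, v), (taxon, new_node)]
    # Exponential branch lengths; the mean is an assumed illustration value.
    return {edge: rng.expovariate(1.0 / mean_branch) for edge in edges}


def splits(tree, n_taxa):
    """Return the nontrivial partitions, each stored as the side without taxon 0."""
    adj = {}
    for u, v in tree:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    found = set()

    def leaves_below(node, parent):
        if node < n_taxa:  # leaf
            return frozenset([node])
        below = frozenset().union(
            *(leaves_below(child, node) for child in adj[node] if child != parent)
        )
        if 1 < len(below) < n_taxa - 1:
            found.add(below if 0 not in below else frozenset(range(n_taxa)) - below)
        return below

    leaves_below(n_taxa, None)  # node n_taxa is always internal in random_tree
    return found


def shared_partitions(tree_a, tree_b, n_taxa):
    """Similarity: number of nontrivial partitions present in both trees."""
    return len(splits(tree_a, n_taxa) & splits(tree_b, n_taxa))


def mean_random_similarity(n_taxa, replicates=1000, rng=random):
    """Monte Carlo estimate of the similarity of two randomly chosen trees."""
    total = sum(
        shared_partitions(
            random_tree(n_taxa, rng=rng), random_tree(n_taxa, rng=rng), n_taxa
        )
        for _ in range(replicates)
    )
    return total / replicates


def jc_uncorrected(d):
    """Expected uncorrected (p) distance for d substitutions per site under Jukes-Cantor."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
```

A bifurcating tree on n taxa has n - 3 nontrivial partitions, so `shared_partitions` ranges from 0 to n - 3. Calling `mean_random_similarity(n)` for increasing n gives the random-tree baseline against which the similarity of a reconstructed tree could be compared.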