Published in

SAGE Publications, Holocene, 1(25), p. 130-136, 2014

DOI: 10.1177/0959683614556388

Taxon selection using statistical learning techniques to improve transfer function prediction

Journal article published in 2014 by Steve Juggins, Gavin L. Simpson, Richard J. Telford
This paper was not found in any repository, but could be made available legally by the author.

Full text: Unavailable

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Transfer functions are widely used in palaeoecology to provide quantitative environmental reconstructions using biological proxies. Most models use all but the rarest taxa present in the training set, even though many may be unrelated to the environmental variable of interest. We hypothesise that retaining such non-informative taxa will reduce model robustness and present a method for variable selection motivated by the statistical learning algorithm in random forests. We apply our species-pruning algorithm to weighted averaging (WA) and maximum likelihood calibration of response curves (MLRC), and compare the results with those of boosted regression trees (BRTs) using artificial and real datasets. Results from the artificial data show that WA is particularly sensitive to the influence of both non-informative taxa and secondary environmental variables in the training set or fossil assemblage, and that BRTs are relatively immune to these effects. Furthermore, species-pruned WA and MLRC offer substantial improvements over all-species models when the training set includes non-informative taxa, but they do not guard against confounding effects when species have bi- or multivariate responses to the primary and one or more secondary variables. Tests with a limited set of examples of real data indicate that BRTs, MLRC or species-pruned models have no apparent advantage over WA. We discuss possible reasons for this contradiction and suggest that more tests are needed to properly evaluate BRTs and species-pruned models.
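
The general idea of pruning taxa by their contribution to prediction can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not specify in detail. It assumes plain weighted averaging without deshrinking, a single train/test split, and a permutation-style importance score of the kind used with random forests. The function names (wa_fit, wa_predict, taxon_importance, prune_taxa) and the zero threshold are hypothetical choices made for this illustration.

```python
import numpy as np


def wa_fit(Y, x):
    """Taxon optima: abundance-weighted mean of x over training samples.

    Y is a samples-by-taxa abundance matrix, x the environmental variable.
    Taxa absent from every sample receive a NaN optimum.
    """
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        return (Y * x[:, None]).sum(axis=0) / Y.sum(axis=0)


def wa_predict(Y, optima):
    """Reconstruction: abundance-weighted mean of taxon optima per sample.

    Deshrinking is omitted here (an assumption made to keep the sketch short).
    """
    Y = np.asarray(Y, dtype=float)
    optima = np.asarray(optima, dtype=float)
    valid = np.isfinite(optima)          # ignore taxa with no optimum
    w = Y[:, valid]
    return (w * optima[valid]).sum(axis=1) / w.sum(axis=1)


def rmse(pred, obs):
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def taxon_importance(Y_train, x_train, Y_test, x_test, n_repeats=20, seed=0):
    """Mean increase in test RMSE when one taxon's abundances are shuffled.

    A taxon whose shuffling does not degrade prediction carries little
    information about x under this (assumed) criterion.
    """
    rng = np.random.default_rng(seed)
    Y_test = np.asarray(Y_test, dtype=float)
    optima = wa_fit(Y_train, x_train)
    baseline = rmse(wa_predict(Y_test, optima), x_test)
    scores = np.zeros(Y_test.shape[1])
    for j in range(Y_test.shape[1]):
        increases = []
        for _ in range(n_repeats):
            Y_perm = Y_test.copy()
            Y_perm[:, j] = rng.permutation(Y_perm[:, j])
            increases.append(rmse(wa_predict(Y_perm, optima), x_test) - baseline)
        scores[j] = np.mean(increases)
    return scores


def prune_taxa(scores, threshold=0.0):
    """Indices of taxa to keep: importance strictly above the threshold."""
    return np.flatnonzero(np.asarray(scores) > threshold)
```

In use, the columns returned by prune_taxa would be selected from both the training set and the fossil assemblage before refitting the model. A cross-validated or bootstrapped importance estimate would be more robust than the single split assumed here.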