Elsevier, Ecological Modelling, 1(182), p. 75-90
DOI: 10.1016/j.ecolmodel.2004.07.012
In the coastal forests of central California, a newly discovered virulent pathogen (Phytophthora ramorum) has killed hundreds of thousands of native oak trees. Predicting the potential distribution of the disease in California is an urgent need for regulators and scientists. Most methods used to map the potential ranges of species (e.g. multivariate or logistic regression) require both presence and absence data. Absence data cannot always be feasibly collected, so these methods often require the generation of ‘pseudo’ absence data. Other methods (e.g. BIOCLIM and DOMAIN) model the presence-only data directly. In this study, we present an alternative to conventional modeling approaches: support vector machines (SVMs), a newer generation of machine learning algorithms that find the optimal separation between classes within a dataset, which we use to predict the potential distribution of Sudden Oak Death in California. We compared the performance of two types of SVM models: two-class SVMs with ‘pseudo’ absence data and one-class SVMs. Both models performed well. The one-class SVMs achieved a slightly better true-positive rate (0.9272 ± 0.0460 S.D.) than the two-class SVMs (0.9105 ± 0.0712 S.D.). However, the area predicted to be at risk for the disease by the one-class SVMs (18,441 km²) is much larger than that predicted by the two-class SVMs (13,828 km²). Both models show that the majority of disease risk will occur in coastal areas. Compared with the two-class SVMs, the one-class SVMs predict a potential risk in the foothills of the Sierra Nevada mountain range and much greater risks in Los Angeles and Humboldt Counties. We believe that support vector machines, when coupled with a geographic information system (GIS), will be a useful method for dealing with presence-only data in ecological analysis over a range of scales.
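The quantities reported in the abstract (pseudo-absence sampling, true-positive rate with its standard deviation, and area at risk) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the grid-cell representation, the cell size, and the use of the sample standard deviation are all assumptions made for illustration, and the SVM classifiers themselves are left out.

```python
import random
import statistics


def sample_pseudo_absences(background_cells, presence_cells, n, seed=0):
    """Draw n background cells that are not known presence cells.

    Assumption: cells are hashable identifiers such as (row, col) tuples.
    """
    presence_set = set(presence_cells)
    candidates = [c for c in background_cells if c not in presence_set]
    rng = random.Random(seed)
    return rng.sample(candidates, n)


def true_positive_rate(predicted_at_risk):
    """Fraction of held-out presence points that the model flags as at risk."""
    return sum(predicted_at_risk) / len(predicted_at_risk)


def summarize_rates(rates):
    """Mean and sample standard deviation of per-run true-positive rates.

    Assumption: the abstract does not say whether its S.D. is the sample
    or the population standard deviation; the sample form is used here.
    """
    return statistics.mean(rates), statistics.stdev(rates)


def area_at_risk_km2(n_cells_at_risk, cell_size_km):
    """Area covered by at-risk cells on a square grid (hypothetical cell size)."""
    return n_cells_at_risk * cell_size_km ** 2


# Hypothetical toy data: a 4 x 4 grid with three known presence cells.
grid = [(r, c) for r in range(4) for c in range(4)]
presences = [(0, 0), (1, 1), (2, 2)]
pseudo_absences = sample_pseudo_absences(grid, presences, n=5)

# Hypothetical per-run predictions for held-out presence points.
run_rates = [
    true_positive_rate([True, True, True, False]),
    true_positive_rate([True, True, True, True]),
]
mean_rate, sd_rate = summarize_rates(run_rates)
```

A two-class SVM would be trained on the presence cells together with the sampled pseudo-absences, whereas a one-class SVM would use the presence cells alone. In both cases the true-positive rate is computed only on held-out presence points, which is why it can be compared across the two models.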