Elsevier, Computational Statistics & Data Analysis, 31(4), pp. 377-396
DOI: 10.1016/s0167-9473(99)00038-9
Authors: Markus Hegland, Ian McIntosh and Berwin A. Turlach
An implementation of the backfitting algorithm for generalised additive models which is suitable for parallel computing is described. This implementation is designed to handle large data sets such as those occurring in data mining with several millions of observations on several hundreds of variables. For such large data sets it is crucial to have a fast, parallel implementation for fitting generalised additive models to allow an exploratory analysis of the data within a reasonable time. The approach used divides the data into several blocks (groups) and fits a (generalised) additive model to each block. These models are then merged to a single, final model. It is shown that this approach is very efficient as it allows the algorithm to adapt to the structure of the parallel computer (number of processors and amount of internal memory).
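The block-and-merge idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which smoother is used or how the block models are merged, so the code below assumes a piecewise-constant (bin) smoother on a shared grid, a Gaussian response with identity link (an additive rather than generalised additive model), and a merge step that averages the block models weighted by block size. All function names (`bin_index`, `backfit_block`, `merge_models`, `predict`) and the synthetic data are hypothetical.

```python
import numpy as np

def bin_index(x, edges):
    # Map each value to a bin number in 0 .. len(edges) - 2.
    return np.clip(np.digitize(x, edges) - 1, 0, len(edges) - 2)

def backfit_block(X, y, edges, n_iter=20):
    """Fit y ~ alpha + sum_j f_j(x_j) on one block by backfitting.

    Assumption: each f_j is piecewise constant on the shared grid `edges`.
    """
    n, p = X.shape
    n_bins = len(edges) - 1
    idx = np.stack([bin_index(X[:, j], edges) for j in range(p)])  # shape (p, n)
    alpha = y.mean()
    f = np.zeros((p, n_bins))
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove the intercept and every component except f_j.
            others = sum(f[k, idx[k]] for k in range(p) if k != j)
            r = y - alpha - others
            # Smooth the partial residual by averaging it within each bin.
            sums = np.bincount(idx[j], weights=r, minlength=n_bins)
            counts = np.bincount(idx[j], minlength=n_bins)
            means = np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)
            # Centre f_j so its fitted values average to zero on this block.
            f[j] = means - means[idx[j]].mean()
    return alpha, f, n

def merge_models(models):
    # Assumed merge rule: average block models, weighted by block size.
    weights = np.array([n for _, _, n in models], dtype=float)
    weights /= weights.sum()
    alpha = sum(w * a for w, (a, _, _) in zip(weights, models))
    f = sum(w * fj for w, (_, fj, _) in zip(weights, models))
    return alpha, f

def predict(alpha, f, X, edges):
    return alpha + sum(f[j, bin_index(X[:, j], edges)] for j in range(X.shape[1]))

# Synthetic example: two additive effects plus noise, split into four blocks.
rng = np.random.default_rng(0)
X = rng.uniform(size=(4000, 2))
y = (np.sin(2 * np.pi * X[:, 0]) + 4 * (X[:, 1] - 0.5) ** 2
     + rng.normal(scale=0.1, size=4000))
edges = np.linspace(0.0, 1.0, 21)
blocks = np.array_split(np.arange(4000), 4)
models = [backfit_block(X[b], y[b], edges) for b in blocks]  # independent fits
alpha, f = merge_models(models)
rmse = np.sqrt(np.mean((y - predict(alpha, f, X, edges)) ** 2))
```

Because each call to `backfit_block` touches only its own block, the list comprehension could be handed to separate processors, with the number and size of blocks chosen to match the available processors and memory. That is the property the abstract highlights.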