Few methods for classifying physical activity from accelerometer data have been tested using an independent dataset for cross-validation, and even fewer using multiple independent datasets. This study evaluated whether unsupervised machine learning is a viable approach for developing a reusable clustering model that generalises to independent datasets. We used two labelled adult laboratory datasets to generate a k-means clustering model with ten clusters. To assess its generalisability, we applied the stored clustering model to three independent labelled datasets: two laboratory and one free-living. Based on the labelled development data, the ten clusters were collapsed into four activity categories: sedentary, standing/mixed/slow ambulatory, brisk ambulatory, and running. The percentages of the corresponding activity type contained in these categories were 89%, 83%, 78%, and 96%, respectively. In the independent laboratory datasets, the consistency of activity types within the clusters dropped, but remained above 70% for the sedentary clusters, and 85% for the running and ambulatory clusters. Acceleration features were similar within each cluster across samples. The resulting clusters reflected activity types known to be associated with health and were reasonably robust when applied to diverse independent datasets. This suggests that an unsupervised approach is potentially useful for analysing free-living accelerometer data.
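The workflow described above (fit k-means on development data, store the model, apply it unchanged to independent data, collapse clusters into labelled categories, and report within-category agreement) can be illustrated with a minimal sketch. This is not the study's implementation. The two features, the synthetic values, the cluster count of six, the majority-vote rule for collapsing clusters, and all function names are assumptions chosen only to keep the example small and self-contained.

```python
import math
import random
from collections import Counter, defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def fit_kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm. The returned centroids act as the stored model."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its members; keep it if empty.
        new = [
            tuple(sum(dim) / len(groups[i]) for dim in zip(*groups[i]))
            if groups[i] else centroids[i]
            for i in range(k)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def cluster_to_category(points, labels, centroids):
    """Collapse clusters: map each one to its majority label (assumed rule)."""
    votes = defaultdict(Counter)
    for p, lab in zip(points, labels):
        votes[nearest(p, centroids)][lab] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


def agreement(points, labels, centroids, mapping):
    """Per category: share of assigned points whose true label matches it."""
    hits, totals = Counter(), Counter()
    for p, lab in zip(points, labels):
        cat = mapping.get(nearest(p, centroids))
        totals[cat] += 1
        hits[cat] += (cat == lab)
    return {cat: hits[cat] / totals[cat] for cat in totals}


def make_data(seed, shift=0.0):
    """Synthetic stand-in for labelled accelerometer features (not real data)."""
    rng = random.Random(seed)
    centres = {
        "sedentary": (0.02, 0.01),
        "standing/slow": (0.15, 0.08),
        "brisk": (0.45, 0.20),
        "running": (1.20, 0.50),
    }
    points, labels = [], []
    for lab, (mean_acc, variability) in centres.items():
        for _ in range(50):
            points.append((rng.gauss(mean_acc + shift, 0.03),
                           rng.gauss(variability, 0.02)))
            labels.append(lab)
    return points, labels


# Development data: fit the model and derive the cluster-to-category mapping.
dev_x, dev_y = make_data(seed=1)
centroids = fit_kmeans(dev_x, k=6)
mapping = cluster_to_category(dev_x, dev_y, centroids)

# Independent data: reuse the stored centroids and mapping without refitting.
ind_x, ind_y = make_data(seed=2, shift=0.02)
scores = agreement(ind_x, ind_y, centroids, mapping)
for cat, share in scores.items():
    print(f"{cat}: {share:.0%}")
```

The point of the sketch is that the independent data never influences the centroids or the mapping, so the agreement it reports reflects how well the stored model transfers rather than how well it fits.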