Wiley, Proceedings of the American Society for Information Science and Technology, 1(49), p. 1-4, 2012
Social-ecological research is characteristic of long-tail science: many region-specific studies of social and ecological phenomena collectively yield a large volume of small, highly heterogeneous data sets. This variability makes it difficult to determine whether a particular data set is applicable to a new research question, which hinders the reuse of data that have often been collected through extensive effort. In this paper, we present results on the automatic classification of social-ecological data into categories defined by a domain model called the SES Framework. We applied our methods to the classification of a relational database containing over 18 years of research on forest systems. Our preliminary results suggest that decision tree-based classifiers combined with textual features perform well at this task. Furthermore, social-ecological data sets appear to exhibit distinctive classification features: the results are promising even for classes that make up a relatively small portion of the database.
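To make the idea of a decision tree over textual features concrete, the following is a minimal sketch and not the authors' method, feature set, or data. It assumes each database field comes with a short text description, uses word presence as the only feature, and grows a small tree by Gini impurity. The field descriptions and their labels are invented for illustration, and the three labels are loosely modeled on first-tier SES Framework categories; all function names are hypothetical.

```python
from collections import Counter

def tokenize(text):
    """Lowercase a description and return its set of word tokens."""
    return set(text.lower().split())

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

def best_split(rows):
    """Return the word whose presence/absence most reduces impurity, or None."""
    vocabulary = sorted(set().union(*(tokens for tokens, _ in rows)))
    best_word = None
    best_score = gini([label for _, label in rows])
    for word in vocabulary:
        present = [label for tokens, label in rows if word in tokens]
        absent = [label for tokens, label in rows if word not in tokens]
        if not present or not absent:
            continue  # the word does not separate anything
        score = (len(present) * gini(present)
                 + len(absent) * gini(absent)) / len(rows)
        if score < best_score:  # require a strict improvement
            best_word, best_score = word, score
    return best_word

def build_tree(rows):
    """Recursively build a tree; leaves are labels, nodes are (word, yes, no)."""
    word = best_split(rows)
    if word is None:
        # No useful split remains: predict the majority label.
        return Counter(label for _, label in rows).most_common(1)[0][0]
    present = [row for row in rows if word in row[0]]
    absent = [row for row in rows if word not in row[0]]
    return (word, build_tree(present), build_tree(absent))

def classify(tree, text):
    """Walk the tree using the words found in a new description."""
    tokens = tokenize(text)
    while isinstance(tree, tuple):
        word, if_present, if_absent = tree
        tree = if_present if word in tokens else if_absent
    return tree

# Hypothetical training data: field descriptions with invented labels.
TRAINING = [
    ("forest area in hectares", "Resource System"),
    ("forest vegetation density", "Resource System"),
    ("rules for harvesting forest products", "Governance System"),
    ("rules enforced by user group", "Governance System"),
    ("number of households in user group", "Users"),
    ("households income sources", "Users"),
]

rows = [(tokenize(text), label) for text, label in TRAINING]
tree = build_tree(rows)
print(classify(tree, "penalty rules for violations"))
```

A real pipeline would more likely rely on an established library, weighted term features, and held-out evaluation; the sketch only shows how words in a field description can drive the branching decisions of a tree.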