Taylor and Francis Group, International Journal of Human-Computer Interaction, 23(3), pp. 287-314
DOI: 10.1080/10447310701702519
ABSTRACT Multidimensional datasets often include categorical information. When most dimensions have categorical information, clustering the dataset as a whole can reveal interesting patterns in the dataset. However, the categorical information is often more useful as a way to partition the dataset, for example gene expression data for healthy vs. diseased samples, or stock performance for common, preferred, or convertible shares. We present novel ways to utilize categorical information in exploratory data analysis by enhancing the rank-by-feature framework. First, we present ranking criteria for categorical variables and ways to improve the score overview. Second, we present a novel way to utilize the categorical information together with clustering algorithms. Users can partition the dataset according to categorical information vertically or horizontally, and the clustering result for each partition can serve as new categorical information. We report the results of a longitudinal case study with a biomedical research team, including insights gained and potential future work.
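The idea of clustering each partition and reusing the result as a new categorical variable can be illustrated with a small sketch. This is not the authors' implementation. It assumes that rows (items) are partitioned by the values of one categorical column, and it uses plain Lloyd's k-means with a deterministic initialization as a stand-in for "clustering algorithms". The names `partition_and_cluster`, `kmeans`, and the `"value/cN"` label format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means (a stand-in clustering method).

    Centroids are initialized from the first k points so results are
    deterministic. Returns one cluster index per point.
    """
    k = min(k, len(points))
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def partition_and_cluster(rows: List[dict], category: str,
                          features: List[str], k: int) -> List[str]:
    """Hypothetical helper: partition rows by a categorical column,
    cluster each partition separately on the numeric features, and
    return a new categorical label per row such as 'healthy/c1'."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[category]].append(i)

    new_labels = [""] * len(rows)
    for value, idxs in groups.items():
        pts = [tuple(float(rows[i][f]) for f in features) for i in idxs]
        for i, lab in zip(idxs, kmeans(pts, k)):
            new_labels[i] = f"{value}/c{lab}"
    return new_labels


if __name__ == "__main__":
    rows = [
        {"status": "healthy", "x": 0.0, "y": 0.0},
        {"status": "healthy", "x": 0.1, "y": 0.0},
        {"status": "healthy", "x": 5.0, "y": 5.0},
        {"status": "healthy", "x": 5.1, "y": 5.0},
        {"status": "diseased", "x": 10.0, "y": 10.0},
        {"status": "diseased", "x": 10.2, "y": 10.0},
    ]
    print(partition_and_cluster(rows, "status", ["x", "y"], k=2))
```

Because clustering runs within each partition, cluster labels stay nested under the original category (for example, `healthy/c0` vs. `healthy/c1`). The resulting column could then be treated like any other categorical variable when ranking or partitioning further.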