Published in

Taylor and Francis Group, International Journal of Human-Computer Interaction, 3(23), p. 287-314

DOI: 10.1080/10447310701702519

Exploratory Data Analysis With Categorical Variables: An Improved Rank-by-Feature Framework and a Case Study

Journal article published in 2007 by Jinwook Seo, Heather Gordish-Dressman
This paper is made freely available by the publisher.

Preprint: archiving forbidden
Postprint: archiving restricted
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Multidimensional datasets often include categorical information. When most dimensions have categorical information, clustering the dataset as a whole can reveal interesting patterns in the dataset. However, the categorical information is often more useful as a way to partition the dataset: gene expression data for healthy vs. diseased samples or stock performance for common, preferred, or convertible shares. We present novel ways to utilize categorical information in exploratory data analysis by enhancing the rank-by-feature framework. First, we present ranking criteria for categorical variables and ways to improve the score overview. Second, we present a novel way to utilize the categorical information together with clustering algorithms. Users can partition the dataset according to categorical information vertically or horizontally, and the clustering result for each partition can serve as new categorical information. We report the results of a longitudinal case study with a biomedical research team, including insights gained and potential future work.