Exploratory Data Analysis With Categorical Variables: An Improved Rank-by-Feature Framework and a Case Study

Seo, Jinwook; Gordish-Dressman, Heather

Published in International Journal of Human-Computer Interaction (Taylor and Francis Group), 23(3), pp. 287-314

DOI: 10.1080/10447310701702519


Journal article published in 2007 by Jinwook Seo and Heather Gordish-Dressman.

This paper is made freely available by the publisher.



Abstract

Multidimensional datasets often include categorical information. When most dimensions have categorical information, clustering the dataset as a whole can reveal interesting patterns in the dataset. However, the categorical information is often more useful as a way to partition the dataset: gene expression data for healthy vs. diseased samples, or stock performance for common, preferred, or convertible shares. We present novel ways to utilize categorical information in exploratory data analysis by enhancing the rank-by-feature framework. First, we present ranking criteria for categorical variables and ways to improve the score overview. Second, we present a novel way to utilize the categorical information together with clustering algorithms. Users can partition the dataset according to categorical information vertically or horizontally, and the clustering result for each partition can serve as new categorical information. We report the results of a longitudinal case study with a biomedical research team, including insights gained and potential future work.
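The abstract names two techniques: scoring dimensions against a categorical variable, and clustering each category-defined partition so that cluster membership becomes a new categorical variable. The following is a minimal Python sketch of both ideas under stated assumptions, not the authors' implementation. It assumes a one-way ANOVA F statistic as a stand-in ranking criterion (the paper's own criteria may differ) and a plain k-means with a naive deterministic start as the clustering algorithm. It only partitions rows; partitioning along the other axis is not covered. The function names (`anova_f`, `rank_by_category`, `kmeans`, `partition_and_cluster`) and the toy `rows` data are hypothetical.

```python
from statistics import mean


def anova_f(values, groups):
    """One-way ANOVA F statistic: how strongly a numeric column differs across categories.

    Assumption: F is one plausible ranking criterion; the paper's criteria may differ.
    """
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    k, n = len(by_group), len(values)
    if k < 2 or n <= k:
        return 0.0  # too few groups or observations for a meaningful score
    grand = mean(values)
    ss_between = sum(len(vs) * (mean(vs) - grand) ** 2 for vs in by_group.values())
    ss_within = sum((v - mean(vs)) ** 2 for vs in by_group.values() for v in vs)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_by_category(rows, cat_col, num_cols):
    """Rank numeric columns by their F score against one categorical column, highest first."""
    groups = [r[cat_col] for r in rows]
    scores = [(c, anova_f([r[c] for r in rows], groups)) for c in num_cols]
    return sorted(scores, key=lambda s: s[1], reverse=True)


def _dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means; hypothetical naive start uses the first k points as centers."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(len(centers)), key=lambda c: _dist2(p, centers[c])) for p in points]
        # Move each non-empty center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [mean(dim) for dim in zip(*members)]
    return labels


def partition_and_cluster(rows, cat_col, num_cols, k):
    """Cluster each category's rows separately; return a new categorical label per row."""
    parts = {}
    for i, r in enumerate(rows):
        parts.setdefault(r[cat_col], []).append(i)
    new_labels = [None] * len(rows)
    for category, idx in parts.items():
        points = [[rows[i][c] for c in num_cols] for i in idx]
        for i, lab in zip(idx, kmeans(points, k)):
            new_labels[i] = f"{category}-{lab}"
    return new_labels


# Hypothetical toy data: two numeric "genes" and a healthy/diseased category.
rows = [
    {"status": "healthy", "g1": 1.0, "g2": 5.0},
    {"status": "healthy", "g1": 1.2, "g2": 5.1},
    {"status": "healthy", "g1": 8.0, "g2": 5.0},
    {"status": "diseased", "g1": 9.0, "g2": 1.0},
    {"status": "diseased", "g1": 9.1, "g2": 9.0},
    {"status": "diseased", "g1": 9.2, "g2": 9.1},
]

print(rank_by_category(rows, "status", ["g1", "g2"]))  # g1 ranks above g2
print(partition_and_cluster(rows, "status", ["g1", "g2"], k=2))
# ['healthy-0', 'healthy-0', 'healthy-1', 'diseased-0', 'diseased-1', 'diseased-1']
```

The labels returned by `partition_and_cluster` form a new categorical column that could itself be fed back into `rank_by_category`, mirroring the abstract's point that per-partition clustering results can serve as new categorical information. A production version would use a robust k-means initialization and whichever ranking criteria the framework actually defines.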
