Published in

Proceeding of the 18th ACM conference on Information and knowledge management - CIKM '09

DOI: 10.1145/1645953.1646258

What makes categories difficult to classify?

Proceedings article published in 2009 by Aixin Sun, Ee-Peng Lim, Ying Liu
This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

In this paper, we try to predict which category will be less accurately classified compared with other categories in a classification task that involves multiple categories. The categories with poor predicted performance will be identified before any classifiers are trained, and additional steps can be taken to address the predicted poor accuracies of these categories. Inspired by the work on query performance prediction in ad-hoc retrieval, we propose to predict classification performance using two measures, namely, category size and category coherence. Our experiments on 20-Newsgroup and Reuters-21578 datasets show that the Spearman rank correlation coefficient between the predicted rank of classification performance and the expected classification accuracy is as high as 0.9.
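
The evaluation described in the abstract compares a predicted ordering of categories with their observed accuracy using the Spearman rank correlation coefficient. The following is a minimal sketch of that comparison only, not the authors' implementation. The function names `average_ranks` and `spearman` are hypothetical, and the per-category scores and accuracies are invented for illustration; they are not results from the paper, and how the predicted scores would be derived from category size and coherence is not shown here.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks (smallest value gets rank 1); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next value equals the current one (a tie).
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Invented example: one predicted score and one observed accuracy per category.
predicted_score = [0.9, 0.4, 0.7, 0.2, 0.6]
observed_accuracy = [0.95, 0.70, 0.80, 0.55, 0.85]
rho = spearman(predicted_score, observed_accuracy)
print(f"Spearman rank correlation: {rho:.2f}")
```

A coefficient near 1 means the categories predicted to be hard are, for the most part, the ones that end up with the lowest accuracy; a value near 0 means the predicted ordering carries little information about the observed one.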