Proceedings of the 18th ACM conference on Information and knowledge management - CIKM '09
In this paper, we aim to predict which categories will be classified less accurately than others in a classification task that involves multiple categories. Categories with poor predicted performance are identified before any classifiers are trained, so that additional steps can be taken to address their predicted low accuracy. Inspired by work on query performance prediction in ad-hoc retrieval, we propose to predict classification performance using two measures: category size and category coherence. Our experiments on the 20-Newsgroup and Reuters-21578 datasets show that the Spearman rank correlation coefficient between the predicted rank of classification performance and the expected classification accuracy is as high as 0.9.
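The evaluation metric named above, Spearman's rank correlation, can be illustrated with a minimal Python sketch. This is not the authors' code. The category sizes and accuracies below are made-up illustrative numbers, not results from the paper. Only category size is used as the predictor (assuming larger categories are predicted to be classified more accurately), because the abstract does not define how category coherence is computed. Spearman's rho is calculated as the Pearson correlation of the two rank vectors, with tied values given their average rank.

```python
from typing import Sequence


def rank(values: Sequence[float]) -> list[float]:
    """Return 1-based ascending ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank vectors of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when all values are tied")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical per-category data (illustrative only, not from the paper).
    category_sizes = [120, 340, 80, 500, 210]       # predictor: bigger => better
    accuracies = [0.74, 0.81, 0.69, 0.90, 0.62]     # observed per-category accuracy
    rho = spearman(category_sizes, accuracies)
    print(f"{rho:.2f}")  # prints 0.70
```

Because rho depends only on ranks, any predictor that orders the categories in the same way (raw size, log size, or a coherence score) gives the same correlation. This suits a setting where the goal is to single out the worst-performing categories rather than to estimate their exact accuracy.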