A simple probability based term weighting scheme for automated text classification

Liu, Ying; Loh, Han Tong

Published in

New Trends in Applied Artificial Intelligence, p. 33-43

DOI: 10.1007/978-3-540-73325-6_4

Proceedings article published in 2007 by Ying Liu, Han Tong Loh

This paper is available in a repository.

Abstract

In automated text classification, tfidf is often considered the default term weighting scheme and has been widely reported in the literature. However, tfidf does not directly reflect terms’ category membership. Inspired by the analysis of various feature selection methods, we propose a simple probability based term weighting scheme that directly utilizes two critical information ratios, i.e. relevance indicators. These relevance indicators are supported by probability estimates that capture category membership. Our experimental study on two data sets, including Reuters-21578, demonstrates that the proposed probability based term weighting scheme significantly outperforms tfidf when used with a Bayesian classifier and Support Vector Machines (SVM).
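The contrast drawn in the abstract can be illustrated with a small sketch. The abstract does not state the formula, so the category-aware weight below is an assumption: it uses one plausible pair of ratios, A/B and A/C, where A counts in-category documents containing the term, B counts out-of-category documents containing it, and C counts in-category documents lacking it. The function names, the `smooth` parameter, and the toy corpus are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Classic tf-idf, tf(t, d) * log(N / df(t)); category labels are ignored."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: tf * math.log(n_docs / df[t]) for t, tf in Counter(doc).items()}
        for doc in docs
    ]


def category_ratio_weights(docs, labels, category, smooth=1.0):
    """Assumed category-aware weight: tf * log(1 + (A/B) * (A/C)).

    A: documents in `category` that contain the term
    B: documents outside `category` that contain the term
    C: documents in `category` that do not contain the term
    `smooth` is added to B and C to avoid division by zero (an assumption,
    not something taken from the abstract).
    """
    in_cat = [set(d) for d, y in zip(docs, labels) if y == category]
    out_cat = [set(d) for d, y in zip(docs, labels) if y != category]
    vocab = set().union(*in_cat, *out_cat)
    factor = {}
    for t in vocab:
        a = sum(t in d for d in in_cat)
        b = sum(t in d for d in out_cat)
        c = len(in_cat) - a
        factor[t] = math.log(1 + (a / (b + smooth)) * (a / (c + smooth)))
    return [{t: tf * factor[t] for t, tf in Counter(doc).items()} for doc in docs]


# Toy corpus (made up for illustration): two "grain" and two "money" documents.
docs = [
    ["wheat", "price", "rise"],
    ["wheat", "export", "price"],
    ["bank", "rate", "price"],
    ["bank", "loan", "rate"],
]
labels = ["grain", "grain", "money", "money"]

tfidf = tfidf_weights(docs)
grain = category_ratio_weights(docs, labels, "grain")

print("tf-idf, doc 0:", tfidf[0])
print("category-aware (grain), doc 0:", grain[0])
```

On this toy corpus, tf-idf gives "wheat" and "bank" the same weight because both occur in two of the four documents, whereas the category-aware sketch ranks "wheat" above "price" for the "grain" category and assigns "bank" a weight of zero.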
