SF-HME system: A hierarchical mixtures-of-experts classification system for spam filtering

Belsis, Petros; Fragos, Kostas; Gritzalis, Stefanos; Skourlas, Christos

Published in

Proceedings of the 2006 ACM symposium on Applied computing - SAC '06

DOI: 10.1145/1141277.1141360


Proceedings article published in 2006 by Petros Belsis, Kostas Fragos, Stefanos Gritzalis, Christos Skourlas


Abstract

Many linear statistical models have recently been proposed in the text classification literature and evaluated on the problem of filtering Unsolicited Bulk Email. Despite their popularity, which stems from their simplicity and relative ease of interpretation, the linearity assumption they make about data samples is inappropriate in practice, because it cannot capture the apparent non-linear relationships that characterize these samples. In this paper, we propose SF-HME, a Hierarchical Mixture-of-Experts system that attempts to overcome limitations common to other machine-learning-based approaches when applied to spam mail classification. After reducing the dimensionality of the data with the Simba feature selection algorithm, we evaluated SF-HME on a publicly available corpus of emails in which legitimate and bulk email are very similar, and which therefore has low discriminative potential; on such data, traditional rule-based filtering approaches achieve considerably lower precision. Our results confirm that SF-HME outperforms the other machine learning approaches we compared, which exhibited lower recall.
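To make the Hierarchical Mixture-of-Experts idea concrete, the following is a minimal sketch of a generic two-level HME forward pass, not the authors' implementation. It assumes softmax gating networks, logistic-regression experts, and a feature vector whose first component is a constant bias term; the names `hme_predict`, `top_gate`, `sub_gates`, and `experts` are hypothetical, and the weights are illustrative rather than trained.

```python
import math


def dot(w, x):
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def softmax(scores):
    """Numerically stable softmax; returns weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hme_predict(x, top_gate, sub_gates, experts):
    """Estimate P(spam | x) with a two-level mixture of experts.

    top_gate:      one weight vector per branch (assumed layout)
    sub_gates[i]:  one weight vector per expert in branch i
    experts[i][j]: weights of logistic expert j in branch i
    """
    g_top = softmax([dot(v, x) for v in top_gate])
    p = 0.0
    for i, g_i in enumerate(g_top):
        g_sub = softmax([dot(v, x) for v in sub_gates[i]])
        for j, g_ij in enumerate(g_sub):
            # Each expert's output is weighted by the product of the
            # gating probabilities on the path from the root to it.
            p += g_i * g_ij * sigmoid(dot(experts[i][j], x))
    return p


# Illustrative (untrained) weights: 2 branches, 2 experts per branch.
x = [1.0, 0.5, -0.2]  # bias term followed by two selected features
top_gate = [[0.1, 1.0, 0.0], [-0.1, -1.0, 0.0]]
sub_gates = [
    [[0.0, 0.5, 0.5], [0.0, -0.5, -0.5]],
    [[0.2, 0.0, 1.0], [-0.2, 0.0, -1.0]],
]
experts = [
    [[0.3, 2.0, -1.0], [-0.3, 1.0, 1.0]],
    [[0.0, -2.0, 0.5], [0.5, 0.5, -0.5]],
]
p_spam = hme_predict(x, top_gate, sub_gates, experts)
```

Because the gates depend on the input, different experts dominate in different regions of the feature space, which is how a mixture of simple linear experts can represent a non-linear decision boundary. In a trained system the weights would be fitted to data, typically with an EM-style procedure, rather than set by hand.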
