Published in

ComSIS Consortium, Computer Science and Information Systems, 1(13), p. 1-21, 2016

DOI: 10.2298/csis140929039t

Correcting the hub occurrence prediction bias in many dimensions

Journal article published in 2015 by Nenad Tomašev, Krisztián Buza, and Dunja Mladenić.
This paper is made freely available by the publisher.


Abstract

Data reduction is a common pre-processing step for k-nearest neighbor classification (kNN). Existing prototype selection methods implement different criteria for selecting the points to be used in classification, which constitutes a selection bias. This study examines the nature of the instance selection bias in intrinsically high-dimensional data. In high-dimensional feature spaces, hubs are known to emerge as centers of influence in kNN classification. These points dominate most kNN sets and are often detrimental to classification performance. Our experiments reveal that different instance selection strategies bias the predicted behavior of hub points in high-dimensional data in different ways. We propose an intermediate unbiasing step when training the neighbor occurrence models, and we demonstrate promising improvements in various hubness-aware classification methods on a wide selection of high-dimensional synthetic and real-world datasets.
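The hubness phenomenon the abstract refers to can be made concrete by computing each point's k-occurrence count N_k: the number of other points whose k-nearest-neighbor sets contain it. The following is a minimal illustrative sketch (not the paper's method) using brute-force Euclidean distances in NumPy; the function name and parameters are hypothetical:

```python
import numpy as np

def k_occurrence_counts(X, k):
    """Count how often each point appears among the k nearest
    neighbors of the other points (its k-occurrence, N_k)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances (brute force, O(n^2 d)).
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    counts = np.zeros(n, dtype=int)
    for i in range(n):
        nbrs = np.argsort(d[i])[:k]  # indices of the k nearest neighbors of point i
        counts[nbrs] += 1
    return counts

# Synthetic high-dimensional data: in many dimensions the N_k
# distribution becomes skewed, with a few hub points receiving
# large counts and many anti-hubs receiving counts near zero.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 100))  # 200 points in 100 dimensions
counts = k_occurrence_counts(X, k=5)
print("max N_k:", counts.max(), "min N_k:", counts.min())
```

Instance selection changes which points can appear in kNN sets at classification time, and hence shifts these occurrence counts; the selection-bias issue studied in the paper concerns how that shift distorts neighbor occurrence models learned on the reduced data.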