Active Learning of Domain-Specific Distances for Link Discovery

Soru, Tommaso; Ngomo, Axel-Cyrille Ngonga

Published in

Springer Verlag, Lecture Notes in Computer Science, p. 97-112

DOI: 10.1007/978-3-642-37996-3_7

Proceedings article published in 2013 by Tommaso Soru, Axel-Cyrille Ngonga Ngomo

Abstract

Discovering cross-knowledge-base links is of central importance for manifold tasks across the Linked Data Web. So far, learning link specifications has been addressed by approaches that rely on standard similarity and distance measures such as the Levenshtein distance for strings and the Euclidean distance for numeric values. While these approaches have been shown to perform well, the use of standard similarity measures still hampers their accuracy, as several link discovery tasks can only be solved sub-optimally when relying on standard measures. In this paper, we address this drawback by presenting a novel approach to learning string similarity measures concurrently across multiple dimensions directly from labeled data. Our approach is based on learning linear classifiers which rely on learned edit distances within an active learning setting. By using this combination of paradigms, we can ensure that we reduce the labeling burden on the experts at hand while achieving superior results on datasets for which edit distances are useful. We evaluate our approach on three different real datasets and show that it can improve the accuracy of classifiers. We also discuss how our approach can be extended to other similarity and distance measures as well as different classifiers.
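To make the contrast between a standard and a domain-specific edit distance concrete, here is a minimal sketch of an edit distance whose substitution costs are pluggable. The abstract does not describe how the costs are learned, so the function names and the hand-set cost functions below (`weighted_edit_distance`, `unit_cost`, `case_insensitive_cost`) are hypothetical stand-ins for illustration, not the authors' method.

```python
from typing import Callable


def weighted_edit_distance(
    s: str,
    t: str,
    sub_cost: Callable[[str, str], float],
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Edit distance via dynamic programming with a pluggable substitution cost."""
    # prev[j] holds the cost of turning the empty prefix of s into t[:j].
    prev = [j * ins_cost for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        # Turning s[:i] into the empty string takes i deletions.
        curr = [i * del_cost]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + del_cost,                               # delete a
                curr[j - 1] + ins_cost,                           # insert b
                prev[j - 1] + (0.0 if a == b else sub_cost(a, b)),  # substitute
            ))
        prev = curr
    return prev[-1]


def unit_cost(a: str, b: str) -> float:
    """Standard Levenshtein: every substitution costs 1."""
    return 1.0


def case_insensitive_cost(a: str, b: str) -> float:
    """Assumed, hand-set cost: case changes are cheap (stand-in for a learned cost)."""
    return 0.1 if a.lower() == b.lower() else 1.0
```

With `unit_cost`, "Berlin" and "berlin" are one full edit apart; with the assumed case-aware cost, they are nearly identical. A learned cost function would play the role of `case_insensitive_cost` here, and the resulting per-property distances could then serve as features for a linear classifier.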
