Published in

Springer Verlag, Lecture Notes in Computer Science, p. 580-594

DOI: 10.1007/978-3-319-07881-6_39

Extracting Facets from Lost Fine-grained Categorizations in Dataspaces

Proceedings article published in 2014 by Riccardo Porrini, Matteo Palmonari, Carlo Batini
This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Categorization of instances in dataspaces is a difficult and time-consuming task, usually performed by domain experts. In this paper we propose a semi-automatic approach to the extraction of facets for the fine-grained categorization of instances in dataspaces. We focus on the case where instances are categorized under heterogeneous taxonomies in several sources. Our approach leverages Taxonomy Layer Distance, a new metric based on structural analysis of source taxonomies, to support the identification of meaningful candidate facets. Once validated and refined by domain experts, the extracted facets provide a fine-grained classification of dataspace instances. We implemented and evaluated our approach in a real-world dataspace in the eCommerce domain. Experimental results show that our approach is capable of extracting meaningful facets and that the new metric we propose for the structural analysis of source taxonomies outperforms other state-of-the-art metrics.