Towards an Efficient RDF Dataset Slicing

Marx, Edgard; Soru, Tommaso; Shekarpour, Saeedeh; Auer, Sören; Ngomo, Axel-Cyrille Ngonga; Breitman, Karin

Published in

World Scientific Publishing, International Journal of Semantic Computing, 04(07), p. 455-477

DOI: 10.1142/s1793351x13400151

Journal article published in 2013 by Edgard Marx, Tommaso Soru, Saeedeh Shekarpour, Sören Auer, Axel-Cyrille Ngonga Ngomo, Karin Breitman

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving forbidden

Abstract

Over the last years, a considerable amount of structured data has been published on the Web as Linked Open Data (LOD). Despite recent advances, consuming and using Linked Open Data within an organization is still a substantial challenge. Many LOD datasets are quite large and, despite progress in Resource Description Framework (RDF) data management, loading and querying them within a triple store is extremely time-consuming and resource-demanding. To overcome this consumption obstacle, we propose a process inspired by the classical Extract-Transform-Load (ETL) paradigm. In this article, we focus particularly on the selection and extraction steps of this process. We devise a fragment of the SPARQL Protocol and RDF Query Language (SPARQL), dubbed SliceSPARQL, which enables the selection of well-defined slices of datasets fulfilling typical information needs. SliceSPARQL supports graph patterns for which each connected subgraph pattern involves a maximum of one variable or Internationalized Resource Identifier (IRI) in its join conditions. This restriction guarantees the efficient processing of the query against a sequential dataset dump stream. Furthermore, we evaluate our slicing approach with three different optimization strategies. Results show that dataset slices can be generated an order of magnitude faster than by using the conventional approach of loading the whole dataset into a triple store.
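
To illustrate why a single join variable makes stream processing feasible, the following is a minimal sketch in Python, not the authors' implementation and not SliceSPARQL itself. It assumes an N-Triples dump read line by line and a pattern of the form "all triples about every ?s that has a given predicate and object". The function names (`parse_line`, `select_subjects`, `extract_slice`), the simplified line regex, and the two-pass strategy are assumptions made for illustration only.

```python
import re
from typing import Iterable, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Simplified N-Triples line: subject, predicate, object, then a closing " ."
# (assumption: one triple per line, terms contain no unescaped line breaks).
TRIPLE_RE = re.compile(r'^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$')


def parse_line(line: str) -> Optional[Triple]:
    """Parse one dump line; return None for blanks, comments, or bad lines."""
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    match = TRIPLE_RE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


def select_subjects(lines: Iterable[str], predicate: str, obj: str) -> Set[str]:
    """Pass 1: collect every subject s for which (s, predicate, obj) occurs."""
    subjects: Set[str] = set()
    for line in lines:
        triple = parse_line(line)
        if triple and triple[1] == predicate and triple[2] == obj:
            subjects.add(triple[0])
    return subjects


def extract_slice(lines: Iterable[str], subjects: Set[str]) -> Iterator[Triple]:
    """Pass 2: stream out all triples whose subject was selected in pass 1."""
    for line in lines:
        triple = parse_line(line)
        if triple and triple[0] in subjects:
            yield triple
```

Because the only join is on the subject variable, each pass reads the dump sequentially and keeps just the set of candidate subjects in memory, so no triple store has to be loaded. In practice the two passes would each reopen the dump file; any iterable of lines works for experimentation.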
