Published in

Anais do XXXV Simpósio Brasileiro de Banco de Dados (SBBD 2020), 2020

DOI: 10.5753/sbbd.2020.13652


Streaming state management methods for real-time data deduplication

This paper was not found in any repository; the policy of its publisher is unknown or unclear.

Full text: Unavailable

Preprint: policy unknown
Postprint: policy unknown
Published version: policy unknown

Abstract

Data duplication is a common problem in data stream processing applications. It arises from software errors or from the adoption of data-loss prevention measures, and it jeopardizes real-time data analyses. This paper explores stream-based deduplication methods, identifies the challenges they raise, and proposes a decision method for choosing the most appropriate strategy for a given domain. The work investigates both native solutions and auxiliary tools for providing data deduplication and fault tolerance. The experimental results show that it is necessary either to use fast auxiliary storage that persists the keys already read for as long as duplicates of them can still appear, or to use optimized storage that supports fast key lookup.
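The first strategy in the abstract (persisting read keys only for as long as duplicates can appear) can be illustrated with a minimal sketch. It assumes a single consumer, events given as `(timestamp, key)` pairs with non-decreasing timestamps, and a fixed duplicate window `ttl_seconds`. The names `TtlDeduplicator` and `deduplicate` are hypothetical, and the in-memory dict merely stands in for the fast auxiliary key store; this is not the paper's implementation.

```python
from collections import deque
from typing import Deque, Dict, Hashable, Iterable, Iterator, Tuple


class TtlDeduplicator:
    """Hypothetical in-memory stand-in for a fast key store with expiry.

    Assumes timestamps passed to is_duplicate() never decrease.
    """

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        # key -> timestamp at which the key was first seen
        self._seen: Dict[Hashable, float] = {}
        # (timestamp, key) in insertion order, used to expire old keys cheaply
        self._order: Deque[Tuple[float, Hashable]] = deque()

    def _expire(self, now: float) -> None:
        # Drop keys whose duplicate window has closed, oldest first.
        while self._order and now - self._order[0][0] >= self.ttl:
            ts, key = self._order.popleft()
            if self._seen.get(key) == ts:
                del self._seen[key]

    def is_duplicate(self, key: Hashable, now: float) -> bool:
        self._expire(now)
        if key in self._seen:
            return True
        # First sighting inside the window: remember the key.
        self._seen[key] = now
        self._order.append((now, key))
        return False


def deduplicate(
    events: Iterable[Tuple[float, Hashable]], ttl_seconds: float
) -> Iterator[Tuple[float, Hashable]]:
    """Yield only events whose key was not seen within the last ttl_seconds."""
    dedup = TtlDeduplicator(ttl_seconds)
    for ts, key in events:
        if not dedup.is_duplicate(key, ts):
            yield ts, key


if __name__ == "__main__":
    stream = [(0, "a"), (1, "b"), (2, "a"), (12, "a")]
    print(list(deduplicate(stream, ttl_seconds=10)))
```

With a 10-second window, the second `"a"` at t=2 is dropped, while the `"a"` at t=12 is emitted again because its key has already expired. The trade-off the abstract points to shows up here: a longer window catches more late duplicates but keeps more keys in storage, which is why fast storage with quick key lookup matters.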