Homomorphic Compression: Making Text Processing on Compression Unlimited

Guan, Jiawei; Zhang, Feng; Ma, Siqi; Chen, Kuangyu; Hu, Yihua; Chen, Yuxing; Pan, Anqun; Du, Xiaoyong

Published in

Proceedings of the ACM on Management of Data, 4(1), p. 1-28, 2023

DOI: 10.1145/3626765

Tools

Export citation

Search in Google Scholar

Homomorphic Compression: Making Text Processing on Compression Unlimited

Journal article published in 2023 by Jiawei Guan

, Feng Zhang

, Siqi Ma

, Kuangyu Chen

, Yihua Hu

, Yuxing Chen

, Anqun Pan

, Xiaoyong Du

This paper was not found in any repository, but could be made available legally by the author.

Full text: Unavailable

Preprint: archiving allowed

Upload

Postprint: archiving allowed

Upload

Published version: archiving forbidden

Policy details

Data provided by

Abstract

Lossless data compression is an effective way to handle the huge transmission and storage overhead of massive text data. Its utility is even more significant today when data volumes are skyrocketing. The concept of operating on compressed data infuses new blood into efficient text management by enabling mainly access-oriented text processing tasks to be done directly on compressed data without decompression. Facing limitations of the existing compressed text processing schemes such as limited types of operations supported, low efficiency, and high space occupation, we address these problems by proposing a homomorphic compression theory. It enables the generalization and characterization of algorithms with compression processing capabilities. On this basis, we develop HOCO, an efficient text data management engine that supports a variety of processing tasks on compressed text. We select three representative compression schemes and implement them combined with homomorphism in HOCO. HOCO supports the extension of homomorphic compression schemes through a modular and object-oriented design and has convenient interfaces for text processing tasks. We evaluate HOCO on six real-world datasets. The three schemes implemented in HOCO show trade-offs in terms of compression ratio, supported operation types, and efficiency. Experiments also show that HOCO can achieve higher throughput in random access and modification operations (averagely 9.18× than the state-of-the-art) and lower latency in text analytic tasks (averagely 7.16× than processing on uncompressed text) without compromising compression efficacy.

Published in

Links

Tools

Homomorphic Compression: Making Text Processing on Compression Unlimited

Abstract