LeCo: Lightweight Compression via Learning Serial Correlations

Liu, Yihao; Zeng, Xinyu; Zhang, Huanchen

Published in

Proceedings of the ACM on Management of Data, 1(2), p. 1-28, 2024

DOI: 10.1145/3639320

Journal article published in 2024 by Yihao Liu, Xinyu Zeng, Huanchen Zhang

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving forbidden

Abstract

Lightweight data compression is a key technique that allows column stores to exhibit superior performance for analytical queries. Although dictionary-based encodings that approach Shannon's entropy have been studied comprehensively, few prior works have systematically exploited the serial correlation in a column for compression. In this paper, we propose LeCo (i.e., Learned Compression), a framework that uses machine learning to remove the serial redundancy in a value sequence automatically to achieve an outstanding compression ratio and decompression performance. LeCo presents a general approach to this end, making existing algorithms such as Frame-of-Reference (FOR), Delta Encoding, and Run-Length Encoding (RLE) special cases under our framework. Our microbenchmark with three synthetic and eight real-world data sets shows that a prototype of LeCo achieves a Pareto improvement on both compression ratio and random access speed over the existing solutions. When integrating LeCo into widely used applications, we observe up to a 5.2× speedup in a data analytical query in the Arrow columnar execution engine, and a 16% increase in RocksDB's throughput.
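
To make the idea of "learning" serial correlation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single partition of integers, a least-squares linear model over positions, and residuals stored with a fixed bit width; all names (Partition, fit_linear, fit_constant, encode, access) are hypothetical. Under these assumptions, Frame-of-Reference falls out as the special case where the model is a constant, which illustrates the abstract's claim that FOR is a special case of the framework.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical model representation: (slope, intercept) of a line over positions.
Model = Tuple[float, float]


@dataclass
class Partition:
    slope: float
    intercept: float
    bias: int          # minimum residual, so that stored deltas are >= 0
    bit_width: int     # bits needed per stored delta
    deltas: List[int]  # kept as a plain list here; a real encoder would bit-pack


def predict(slope: float, intercept: float, i: int) -> int:
    # Same rounding on the encode and decode paths, so decoding is lossless.
    return math.floor(slope * i + intercept + 0.5)


def fit_linear(values: Sequence[int]) -> Model:
    # Ordinary least squares of value against position (assumes non-empty input).
    n = len(values)
    if n < 2:
        return 0.0, float(values[0])
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_constant(values: Sequence[int]) -> Model:
    # Constant model: the residuals are the values themselves and the bias
    # becomes the partition minimum, which is Frame-of-Reference.
    return 0.0, 0.0


def encode(values: Sequence[int], fit: Callable[[Sequence[int]], Model]) -> Partition:
    slope, intercept = fit(values)
    residuals = [v - predict(slope, intercept, i) for i, v in enumerate(values)]
    bias = min(residuals)
    deltas = [r - bias for r in residuals]
    return Partition(slope, intercept, bias, max(deltas).bit_length(), deltas)


def access(p: Partition, i: int) -> int:
    # Random access: one model evaluation plus one stored delta.
    return predict(p.slope, p.intercept, i) + p.bias + p.deltas[i]


# A roughly linear sequence: the linear model needs fewer bits per value than FOR.
values = [1000 + 7 * i + (i % 3) for i in range(64)]
lin = encode(values, fit_linear)
frame = encode(values, fit_constant)
print(lin.bit_width, frame.bit_width)
```

The sketch decodes a single position without touching its neighbors, which is the property behind the random-access comparison mentioned in the abstract. How partitions are chosen and which model families are used are left out here.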
