BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach

Zheng, Zhen; Pan, Zaifeng; Wang, Dalin; Zhu, Kai; Zhao, Wenyi; Guo, Tianyou; Qiu, Xiafei; Sun, Minmin; Bai, Junjie; Zhang, Feng; Du, Xiaoyong; Zhai, Jidong; Lin, Wei

Published in

Proceedings of the ACM on Management of Data, 3(1), p. 1-29, 2023

DOI: 10.1145/3617327


Journal article published in 2023 by Zhen Zheng, Zaifeng Pan, Dalin Wang, Kai Zhu, Wenyi Zhao, Tianyou Guo, Xiafei Qiu, Minmin Sun, Junjie Bai, Feng Zhang, Xiaoyong Du, Jidong Zhai, Wei Lin


Abstract

Compiler optimization plays an increasingly important role in boosting the performance of machine learning (ML) models for data processing and management. As data grows more complex, ML models increasingly exhibit dynamic tensor shapes. However, existing ML compilers either handle only static shape models or suffer from a series of performance problems in both operator fusion and code generation in dynamic shape scenarios. This paper tackles the main challenges of dynamic shape optimization: fusion optimization without shape values, and code generation that supports arbitrary shapes. To address the fundamental challenge of absent shape values, it systematically abstracts and extracts shape information and designs a cross-level symbolic shape representation. Based on the insight that fusion optimization relies on the tensor shape relationships between adjacent operators rather than on exact shape values, it proposes a dynamic shape fusion approach based on shape information propagation. To generate code that adapts efficiently to arbitrary shapes, it proposes a code generation approach that combines compile time and runtime. Finally, it presents a complete optimization pipeline for dynamic shape models and implements an industrial-grade ML compiler named BladeDISC. An extensive evaluation shows that BladeDISC outperforms PyTorch, TorchScript, TVM, ONNX Runtime, XLA, Torch Inductor (dynamic shape), and TensorRT by up to 6.95×, 6.25×, 4.08×, 2.04×, 2.06×, 7.92×, and 4.16×, respectively (3.54×, 3.12×, 1.95×, 1.47×, 1.24×, 2.93×, and 1.46× on average), in end-to-end inference speedup on A10 and T4 GPUs. BladeDISC's source code is publicly available at https://github.com/alibaba/BladeDISC.
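
The insight that fusion depends on shape relationships rather than shape values can be illustrated with a small sketch. The code below is not BladeDISC's implementation or API; `SymDim`, `ShapeEnv`, `elementwise`, and `can_fuse_elementwise` are hypothetical names invented for this illustration. It assumes a simple union-find over symbolic dimensions: each elementwise operator propagates the fact that its output dimensions equal its input dimensions, and two operators are treated as fusable when their output shapes are provably equal, even though no concrete size is known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymDim:
    """A dimension whose concrete value is unknown at compile time (hypothetical)."""
    name: str


class ShapeEnv:
    """Union-find recording which symbolic dimensions are known to be equal."""

    def __init__(self):
        self.parent = {}

    def find(self, d):
        # Register unseen dims as their own representative.
        self.parent.setdefault(d, d)
        while self.parent[d] != d:
            # Path halving keeps lookups cheap.
            self.parent[d] = self.parent[self.parent[d]]
            d = self.parent[d]
        return d

    def unify(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

    def same_shape(self, s1, s2):
        return len(s1) == len(s2) and all(
            self.find(a) == self.find(b) for a, b in zip(s1, s2)
        )


def elementwise(env, in_shape, out_names):
    """Model an elementwise op: fresh output dims, each equal to the input dim."""
    out = tuple(SymDim(n) for n in out_names)
    for a, b in zip(in_shape, out):
        env.unify(a, b)  # shape information propagation
    return out


def can_fuse_elementwise(env, producer_out, consumer_out):
    """Assumed fusion rule for this sketch: output shapes must be provably equal."""
    return env.same_shape(producer_out, consumer_out)


env = ShapeEnv()
x = (SymDim("batch"), SymDim("seq"))    # input with unknown sizes
y = elementwise(env, x, ("d0", "d1"))   # e.g. exp(x)
z = elementwise(env, y, ("d2", "d3"))   # e.g. y + 1
w = (SymDim("m"), SymDim("n"))          # unrelated tensor, no known relation

fusable = can_fuse_elementwise(env, y, z)
not_fusable = can_fuse_elementwise(env, y, w)
```

In this sketch `y` and `z` are judged fusable without knowing `batch` or `seq`, while `y` and `w` are not, because nothing relates their dimensions. A real compiler would track richer constraints (for example, products of dimensions across reshapes), which this example does not attempt.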
