Algorithm 1039: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM

Alaejos, Guillermo; Castelló, Adrián; Alonso-Jordá, Pedro; Igual, Francisco D.; Martínez, Héctor; Quintana-Ortí, Enrique S.

Published in

Association for Computing Machinery (ACM), ACM Transactions on Mathematical Software, 1(50), p. 1-34, 2024

DOI: 10.1145/3638532

Journal article published in 2024 by Guillermo Alaejos, Adrián Castelló, Pedro Alonso-Jordá, Francisco D. Igual, Héctor Martínez, Enrique S. Quintana-Ortí

This paper was not found in any repository, but could be made available legally by the author.

Full text: Unavailable

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving forbidden

Abstract

We explore the use of the open-source Apache TVM framework to automatically generate a family of algorithms that follow the approach taken by popular linear algebra libraries, such as GotoBLAS2, BLIS, and OpenBLAS, to obtain high-performance blocked formulations of the general matrix multiplication (gemm). In addition, we fully automate the generation process by also leveraging Apache TVM to derive a complete range of processor-specific micro-kernels for gemm. This contrasts with the convention in high-performance libraries, which hand-code a single micro-kernel per architecture in assembly. Overall, the combination of our TVM-generated blocked algorithms and micro-kernels for gemm (1) improves portability and maintainability and, more broadly, streamlines the software life cycle; (2) provides the flexibility to tailor and optimize the solution to different data types, processor architectures, and matrix operand shapes, yielding performance on a par with (or, for specific matrix shapes, even superior to) that of hand-tuned libraries; and (3) features a small memory footprint.
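To make the "blocked formulation plus micro-kernel" idea concrete, the following is a minimal plain-Python sketch of the loop structure commonly associated with GotoBLAS2/BLIS-style gemm: five nested loops that partition the operands into blocks, wrapped around a small micro-kernel. It is not the paper's TVM generator and does not use TVM. The function names (`blocked_gemm`, `micro_kernel`) and the block sizes (`mc`, `nc`, `kc`, `mr`, `nr`) are illustrative assumptions, and the packing of blocks into contiguous buffers that real libraries perform is deliberately omitted.

```python
def micro_kernel(A, B, C, i0, j0, p0, mr, nr, kc):
    """Update the mr x nr block of C at (i0, j0) as a sequence of kc rank-1 updates."""
    for p in range(p0, p0 + kc):
        row_b = B[p]
        for i in range(i0, i0 + mr):
            a_ip = A[i][p]
            row_c = C[i]
            for j in range(j0, j0 + nr):
                row_c[j] += a_ip * row_b[j]


def blocked_gemm(A, B, C, mc=4, nc=8, kc=4, mr=2, nr=4):
    """Compute C += A * B in place (lists of lists) with five loops around a micro-kernel.

    Block sizes are hypothetical defaults chosen for readability, not tuned values.
    """
    m, k, n = len(A), len(B), len(B[0])
    for jc in range(0, n, nc):                    # loop 1: column panels of B and C
        nc_eff = min(nc, n - jc)
        for pc in range(0, k, kc):                # loop 2: slices of the k dimension
            kc_eff = min(kc, k - pc)
            for ic in range(0, m, mc):            # loop 3: row blocks of A and C
                mc_eff = min(mc, m - ic)
                for jr in range(jc, jc + nc_eff, nr):        # loop 4: nr-wide micro-panels
                    nr_eff = min(nr, jc + nc_eff - jr)
                    for ir in range(ic, ic + mc_eff, mr):    # loop 5: mr-tall micro-panels
                        mr_eff = min(mr, ic + mc_eff - ir)
                        micro_kernel(A, B, C, ir, jr, pc, mr_eff, nr_eff, kc_eff)
    return C
```

In a tuned library the three outer block sizes would typically be chosen to fit the cache hierarchy and the micro-kernel dimensions to fit the register file; here they only illustrate how changing the blocking changes the loop nest without changing the result.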
