Published in

Association for Computing Machinery (ACM), ACM Transactions on Architecture and Code Optimization, 15(3), p. 1-27, 2018

DOI: 10.1145/3235029

High-Performance Generalized Tensor Operations

Journal article published in 2018 by Roman Gareev, Tobias Grosser, Michael Kruse
This paper is made freely available by the publisher.

Preprint: archiving forbidden
Postprint: archiving forbidden
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

The efficiency of tensor contraction is of great importance. Compilers cannot optimize it well enough to come close to the performance of expert-tuned implementations. All existing approaches that provide competitive performance require optimized external code. We introduce a compiler optimization that reaches the performance of optimized BLAS libraries without the need for an external implementation or automatic tuning. Our approach provides competitive performance across hardware architectures and can be generalized to deliver the same benefits for algebraic path problems. By making fast linear algebra kernels available to everyone, we expect productivity increases when optimized libraries are not available.
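
As a rough illustration of what "generalized" means in the abstract, the minimal Python sketch below shows that matrix multiplication (the simplest tensor contraction) and one step of an algebraic path problem share the same triple loop nest once the addition and multiplication are replaced by the operations of another semiring. This is not the compiler optimization described in the paper and makes no attempt at high performance; the name `generalized_matmul` and its parameters are hypothetical, chosen only for this example.

```python
import operator
from typing import Callable, List

Matrix = List[List[float]]


def generalized_matmul(
    a: Matrix,
    b: Matrix,
    add: Callable[[float, float], float],
    mul: Callable[[float, float], float],
    zero: float,
) -> Matrix:
    """Hypothetical helper: C[i][j] = add over p of mul(A[i][p], B[p][j]).

    `zero` must be the identity element of `add`.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[zero] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = zero
            for p in range(k):
                acc = add(acc, mul(a[i][p], b[p][j]))
            c[i][j] = acc
    return c


# Ordinary (+, *) semiring: plain matrix multiplication.
product = generalized_matmul(
    [[1, 2], [3, 4]], [[5, 6], [7, 8]], operator.add, operator.mul, 0
)

# (min, +) semiring: shortest paths that use at most two edges.
INF = float("inf")
dist = [
    [0, 3, INF],
    [INF, 0, 1],
    [2, INF, 0],
]
two_hop = generalized_matmul(dist, dist, min, operator.add, INF)

print(product)
print(two_hop)
```

Only the `add`, `mul`, and `zero` arguments differ between the two calls; the loop structure is identical, which is the property that lets an optimization aimed at one of these problems carry over to the other.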