Published in

Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming - ICFP 2015

DOI: 10.1145/2784731.2784754

ACM SIGPLAN Notices, 50(9), p. 205-217

DOI: 10.1145/2858949.2784754

Generating Performance Portable Code using Rewrite Rules: From High-Level Functional Expressions to High-Performance OpenCL Code

Journal article published in 2015 by Michel Steuwer, Christian Fensch, Sam Lindley, Christophe Dubach
This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Computers have become increasingly complex with the emergence of heterogeneous hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous computational power at the cost of increased programming effort, resulting in a tension between performance and code portability. Typically, code is either tuned in a low-level imperative language using hardware-specific optimizations to achieve maximum performance or is written in a high-level, possibly functional, language to achieve portability at the expense of performance. We propose a novel approach aiming to combine high-level programming, code portability, and high performance. Starting from a high-level functional expression, we apply a simple set of rewrite rules to transform it into a low-level functional representation, close to the OpenCL programming model, from which OpenCL code is generated. Our rewrite rules define a space of possible implementations, which we automatically explore to generate hardware-specific OpenCL implementations. We formalize our system with a core dependently-typed λ-calculus along with a denotational semantics, which we use to prove the correctness of the rewrite rules. We test our design in practice by implementing a compiler which generates high-performance imperative OpenCL code. Our experiments show that we can automatically derive hardware-specific implementations from simple functional high-level algorithmic expressions, offering performance on a par with highly tuned code for multicore CPUs and GPUs written by experts.
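
To make the idea of semantics-preserving rewrite rules more concrete, the following is a minimal Python sketch, not the paper's rule set or compiler. It illustrates two rules of the general kind the abstract describes: a map-fusion rule and a split-join rule that turns a flat map into a map over chunks. All function names here (split, join, map_, candidate_chunk_sizes, and so on) are hypothetical and chosen only for illustration. The use of lists, and of chunk sizes as the only tunable choice, is likewise an assumption.

```python
def split(n, xs):
    """Split xs into consecutive chunks of length n (n must divide len(xs))."""
    if n <= 0 or len(xs) % n != 0:
        raise ValueError("chunk size must be positive and divide the length")
    return [xs[i:i + n] for i in range(0, len(xs), n)]


def join(xss):
    """Flatten one level of nesting; the inverse of split."""
    return [x for xs in xss for x in xs]


def map_(f, xs):
    """Plain functional map over a list."""
    return [f(x) for x in xs]


def high_level(f, xs):
    """The high-level expression: map f xs."""
    return map_(f, xs)


def split_join_variant(f, n, xs):
    """Rewritten form: join (map (map f) (split n xs)).

    Assumption for illustration: the outer map could be assigned to
    groups of parallel workers and the inner map to workers in a group.
    """
    return join(map_(lambda chunk: map_(f, chunk), split(n, xs)))


def unfused(f, g, xs):
    """Two passes over the data: map f (map g xs)."""
    return map_(f, map_(g, xs))


def fused(f, g, xs):
    """One pass after fusion: map (f . g) xs."""
    return map_(lambda x: f(g(x)), xs)


def candidate_chunk_sizes(length):
    """Each valid chunk size yields one point in a small implementation space."""
    return [n for n in range(1, length + 1) if length % n == 0]


if __name__ == "__main__":
    data = list(range(8))
    double = lambda x: x * 2
    reference = high_level(double, data)
    # Every rewritten variant must compute the same result as the original.
    for n in candidate_chunk_sizes(len(data)):
        assert split_join_variant(double, n, data) == reference
        print("chunk size", n, "agrees with the high-level expression")
```

Each chunk size produces a structurally different but equivalent program. An automatic search, as the abstract describes, would choose among such variants based on how they perform on the target hardware. This sketch only checks equivalence on one input and does not measure performance or prove correctness.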