Published in

Association for Computing Machinery (ACM), ACM Transactions on Embedded Computing Systems, 1(22), p. 1-24, 2022

DOI: 10.1145/3550071

An Efficient and Flexible Stochastic CGRA Mapping Approach

Journal article published in 2022 by Satyajit Das, Kevin Martin, Thomas Peyret, Philippe Coussy
This paper was not found in any repository, but could be made available legally by the author.

Full text: Unavailable

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Coarse-Grained Reconfigurable Array (CGRA) architectures are promising high-performance and power-efficient platforms. However, mapping applications efficiently onto a CGRA is a challenging task: the problem is known to be NP-complete, so finding good mapping solutions for a given CGRA architecture within a reasonable time is difficult. Additionally, scaling compilation time and memory footprint to large heterogeneous CGRAs is a well-known problem. In this article, we present a stochastic mapping approach that efficiently explores the architecture space and finds the best solutions while keeping the memory footprint limited and steady. Experimental results show that, thanks to better exploration of the mapping solution space, our compilation flow achieves performance on low-complexity CGRA architectures that is as good as that obtained with more complex ones. The parameters considered in our experiments include the number of tiles, Register File (RF) size, number of load/store (LS) units, and network topology. Our results demonstrate that high-quality compilation for a wide range of applications is possible within reasonable run-times. Experiments with several DSP benchmarks show that the best CGRA configuration from the architectural exploration surpasses an ultra-low-power, DSP-optimized RISC-V CPU, achieving up to 15.28× performance gain (6× on average, 3.4× minimum) and up to 29.7× energy gain (13.5× on average, 6.3× minimum) with an area overhead of only 1.5×.