Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors

Rodríguez-Sánchez, Rafael; Castelló, Adrián; Catalán, Sandra; Igual, Francisco D.; Quintana-Ortí, Enrique S.

Published in

SAGE Publications, International Journal of High Performance Computing Applications, p. 109434202311576, 2023

DOI: 10.1177/10943420231157653

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving forbidden

Abstract

Malleability is defined as the ability to vary the degree of parallelism at runtime, and is regarded as a means to improve core occupation on state-of-the-art multicore processors that contain tens of computational cores per socket. This property is especially interesting for applications with irregular workloads and/or divergent execution paths. Current high-performance instances of the Basic Linear Algebra Subprograms (BLAS) do not integrate malleability, and, as a consequence, applications that rely on these computational kernels cannot benefit from this capability. In response to this scenario, in this paper we demonstrate that significant performance benefits can be obtained by exploiting malleability in a framework designed to implement portable, high-performance BLAS-like operations. For this purpose, we integrate malleability within the BLIS library and provide an experimental evaluation of the result on three different practical use cases.
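To make "varying the degree of parallelism at runtime" concrete, the following is a minimal sketch under stated assumptions. It assumes a matrix product C = A·B that is split into column panels, which worker threads claim one at a time from a shared counter. Because the work is not pre-assigned, threads can be added while the operation is already in progress. The names `MalleableGemm`, `add_workers` and `matmul_panel` are hypothetical. This is not the BLIS API or the authors' implementation. Because of CPython's global interpreter lock, the sketch illustrates the scheduling pattern rather than any speedup.

```python
import threading

def matmul_panel(A, B, C, j0, j1):
    """Compute columns j0..j1-1 of C = A @ B using plain Python lists."""
    k = len(B)
    for i, row in enumerate(A):
        for j in range(j0, j1):
            C[i][j] = sum(row[p] * B[p][j] for p in range(k))

class MalleableGemm:
    """Hypothetical malleable GEMM: workers pull column panels from a shared
    counter, so new threads can join an operation that is already running."""

    def __init__(self, A, B, nb):
        self.A, self.B, self.nb = A, B, nb
        self.m = len(B[0])                       # number of columns of C
        self.C = [[0.0] * self.m for _ in A]
        self.npanels = (self.m + nb - 1) // nb   # ceil(m / nb) column panels
        self.next_panel = 0
        self.lock = threading.Lock()
        self.threads = []
        self.panels_by_worker = {}               # worker id -> panels computed

    def _grab(self):
        # Atomically claim the next unprocessed panel index.
        with self.lock:
            p = self.next_panel
            self.next_panel += 1
        return p

    def _worker(self, wid):
        done = 0
        while True:
            p = self._grab()
            if p >= self.npanels:                # no work left: leave the pool
                break
            j0 = p * self.nb
            matmul_panel(self.A, self.B, self.C, j0, min(j0 + self.nb, self.m))
            done += 1
        with self.lock:
            self.panels_by_worker[wid] = done

    def add_workers(self, count):
        """Raise the degree of parallelism; new threads take remaining panels."""
        for _ in range(count):
            t = threading.Thread(target=self._worker, args=(len(self.threads),))
            self.threads.append(t)
            t.start()

    def wait(self):
        for t in list(self.threads):
            t.join()
        return self.C

# Usage: start with one thread (assuming the other cores are busy with a
# different task), then grow to four threads once cores become available.
A = [[float(i + j) for j in range(40)] for i in range(30)]
B = [[float(i - j) for j in range(50)] for i in range(40)]
gemm = MalleableGemm(A, B, nb=4)
gemm.add_workers(1)
gemm.add_workers(3)
C = gemm.wait()
```

Claiming panels dynamically, rather than splitting the columns evenly among a fixed number of threads up front, is what lets late-arriving threads share the remaining work. A thread that joins after all panels are claimed simply computes zero panels and exits.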