Published in

2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum

DOI: 10.1109/ipdps.2011.271

An Optimized Reduction Design to Minimize Atomic Operations in Shared Memory Multiprocessors

Proceedings article published in 2011 by Ettore Speziale, Andrea di Biagio, Giovanni Agosta
This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Reduction operations play a key role in modern massively data-parallel computation. However, current implementations in shared-memory programming APIs such as OpenMP are often computational bottlenecks because of the high number of atomic operations involved. We propose a reduction design that exploits the coupling with a barrier synchronization to optimize the execution of the reduction. Experimental results show that the number of atomic operations involved is dramatically reduced, which can significantly improve scaling on large numbers of processing elements. We report a speedup of 1.53x on the 312.swim_m SPEC OMP2001 benchmark and a speedup of 4.02x on the streamcluster benchmark from the PARSEC suite over the baseline.
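
The general idea of coupling a reduction with a barrier can be pictured with a small sketch. This is an illustration only, not the authors' implementation, and it rests on several assumptions: the function names `atomic_reduction` and `barrier_coupled_reduction` are hypothetical, a `threading.Lock` stands in for a hardware atomic update, and the baseline is assumed to perform one atomic update per thread. In the second variant each thread writes its partial result to a private slot, and the combining step runs as part of the barrier release, so the reduction itself issues no explicit atomic updates (the barrier still synchronizes internally).

```python
import threading


def atomic_reduction(data, num_threads):
    """Assumed baseline: each thread adds its partial sum to a shared total
    under a lock, which stands in for one atomic update per thread."""
    total = 0
    atomic_ops = 0
    lock = threading.Lock()

    def worker(tid):
        nonlocal total, atomic_ops
        local = sum(data[tid::num_threads])  # private partial sum
        with lock:  # models the atomic update
            total += local
            atomic_ops += 1

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total, atomic_ops


def barrier_coupled_reduction(data, num_threads):
    """Sketch: partial sums go to per-thread slots and are combined by the
    barrier's release action, so no explicit atomic update is issued."""
    partials = [0] * num_threads   # one slot per thread, no sharing
    result = [0]
    seen = [None] * num_threads    # value each thread observes afterwards

    def combine():
        # Runs once, in a single thread, when all threads have arrived.
        result[0] = sum(partials)

    barrier = threading.Barrier(num_threads, action=combine)

    def worker(tid):
        partials[tid] = sum(data[tid::num_threads])
        barrier.wait()             # the reduction completes with the barrier
        seen[tid] = result[0]      # every thread sees the reduced value

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    explicit_atomic_ops = 0        # none issued by the reduction itself
    return result[0], explicit_atomic_ops, seen
```

Calling both functions on the same input with the same thread count returns the same total. The difference is that the first variant performs one locked update per thread, while the second performs none outside the barrier, which every thread had to reach anyway.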