Multicore-optimized wavefront diamond blocking for optimizing stencil updates

Malas, Tareq; Hager, Georg; Ltaief, Hatem; Stengel, Holger; Wellein, Gerhard; Keyes, David

Published in

Society for Industrial and Applied Mathematics, SIAM Journal on Scientific Computing, 37(4), pp. C439-C464

DOI: 10.1137/140991133

Journal article published in 2014 by Tareq Malas, Georg Hager, Hatem Ltaief, Holger Stengel, Gerhard Wellein, David Keyes

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving allowed

Abstract

The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of caches to accelerate stencil updates and approach theoretical peak performance. A key ingredient is the reduction of data traffic across slow data paths, especially the main memory interface. In this work we combine the ideas of multicore wavefront temporal blocking and diamond tiling to arrive at stencil update schemes that show large reductions in memory pressure compared to existing approaches. The resulting schemes show performance advantages in bandwidth-starved situations, which are exacerbated in the variable-coefficient case by its high number of bytes per lattice update. Our thread groups concept provides a controllable trade-off between concurrency and memory usage, shifting the pressure between the memory interface and the CPU. We present performance results on a contemporary Intel processor.
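
To make the idea of diamond tiling concrete, the following is a minimal sketch, not the authors' implementation. It assumes a 1D, 3-point, constant-coefficient Jacobi stencil with fixed boundary values, which is far simpler than the schemes the abstract describes. The function names `naive_sweeps` and `diamond_tiled` and the parameter `width` are hypothetical and chosen only for illustration. The sketch reorders the same updates into diamond-shaped space-time tiles and checks that the result matches plain sweep-by-sweep execution. Tiles on the same wavefront do not depend on each other, which is the property a multithreaded scheme could exploit.

```python
from collections import defaultdict


def naive_sweeps(u0, steps):
    """Reference: one full sweep over the grid per time step."""
    cur, nxt = list(u0), list(u0)
    for _ in range(steps):
        for x in range(1, len(cur) - 1):
            nxt[x] = (cur[x - 1] + cur[x] + cur[x + 1]) / 3.0
        cur, nxt = nxt, cur
    return cur


def diamond_tiled(u0, steps, width):
    """Same updates, reordered into diamond-shaped space-time tiles.

    Assumption for this sketch: tiles are cut along the diagonals
    a = t + x and b = t - x, each with edge length `width`.
    """
    n = len(u0)
    buf = [list(u0), list(u0)]  # buf[t % 2] holds time level t

    # Assign every update (t, x) to its diamond tile (A, B).
    # Points are appended in increasing t, so each tile's list is
    # already in a valid execution order.
    tiles = defaultdict(list)
    for t in range(1, steps + 1):
        for x in range(1, n - 1):
            tiles[((t + x) // width, (t - x) // width)].append((t, x))

    # A point (t, x) reads (t-1, x-1), (t-1, x), (t-1, x+1), whose
    # diagonal coordinates are never larger than its own. Hence a tile
    # only depends on tiles with smaller A + B, and tiles that share
    # A + B (one wavefront) are mutually independent.
    fronts = defaultdict(list)
    for tile_a, tile_b in tiles:
        fronts[tile_a + tile_b].append((tile_a, tile_b))

    for front in sorted(fronts):
        for key in fronts[front]:  # could run concurrently
            for t, x in tiles[key]:
                src, dst = buf[(t - 1) % 2], buf[t % 2]
                dst[x] = (src[x - 1] + src[x] + src[x + 1]) / 3.0

    return buf[steps % 2]


if __name__ == "__main__":
    grid = [float((i * 7) % 5) for i in range(32)]
    assert diamond_tiled(grid, 10, 4) == naive_sweeps(grid, 10)
    print("diamond-tiled result matches naive sweeps")
```

In pure Python the reordering brings no speedup. The point is only that each tile advances several time steps over a small region of the grid, so in a compiled implementation the tile's working set could stay in cache rather than being streamed from main memory once per time step.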
