Elsevier, Parallel Computing: Systems & Applications, 14(27), p. 1847-1878
DOI: 10.1016/s0167-8191(01)00118-1
Full text: Download
In this paper we present a new parallel radix FFT algorithm based on the BSP model Our parallel algorithm uses the groupcyclic distribution family which makes it simple to understand and easy to implement We show how to reduce the com munication cost of the algorithm by a factor of three in the case that the inputoutput vector is in the cyclic distribution We also show how to reduce computation time on computers with a cachebased architecture We present performance results on a Cray TE with up to processors obtaining reasonable eciency levels for local problem sizes as small as and very good eciency levels for sizes larger than