Proceedings of the 7th workshop on languages, compilers, and run-time support for scalable systems - LCR '04
This paper presents runtime mechanisms that enable flexible use of speculative precomputation in conjunction with thread-level parallelism (TLP) on SMT processors. The mechanisms were implemented and evaluated on a real multi-SMT system. So far, speculative precomputation and thread-level parallelism have been used disjunctively on SMT processors, and no attempts have been made to compare and possibly combine these techniques for further optimization. We present runtime support mechanisms for coordinating precomputation with its sibling computation, so that precomputation is regulated to avoid cache pollution and sufficient runahead distance is allowed from the targeted computation. We also present a task queue mechanism to orchestrate precomputation and thread-level parallelism, so that they can be used conjunctively in the same program. The mechanisms are motivated by the observation that different parts of a program may benefit from different modes of multithreaded execution. Furthermore, idle periods during TLP execution or sequential sections can be used for precomputation and vice versa. We apply the mechanisms in loop-structured scientific codes. We present experimental results that verify that no single technique (precomputation or TLP) in isolation achieves the best performance in all cases. Efficient combination of precomputation and TLP is most often the best solution.
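
The idea of regulating precomputation so that it stays ahead of its sibling computation without running too far ahead can be illustrated with a small deterministic simulation. This is only a sketch under stated assumptions, not the mechanism implemented in the paper: the names `RunaheadWindow`, `max_ahead`, and `simulate` are hypothetical, and the bound on runahead distance stands in for whatever limit keeps prefetched data from polluting the cache.

```python
from dataclasses import dataclass


@dataclass
class RunaheadWindow:
    """Hypothetical bound on how far a precomputation thread may run ahead."""
    max_ahead: int        # assumed limit, e.g. sized so prefetched data fits in cache
    main_iter: int = 0    # next loop iteration the computation thread will execute
    helper_iter: int = 0  # next loop iteration the precomputation thread will prefetch

    def helper_step(self) -> bool:
        """Advance the helper by one iteration unless it is already too far ahead."""
        if self.helper_iter - self.main_iter >= self.max_ahead:
            return False  # throttled: further prefetching would risk cache pollution
        self.helper_iter += 1
        return True

    def main_step(self) -> None:
        """Advance the computation; a helper that fell behind skips forward."""
        self.main_iter += 1
        if self.helper_iter < self.main_iter:
            self.helper_iter = self.main_iter


def simulate(n_iters: int, max_ahead: int, helper_steps_per_tick: int) -> list:
    """Return the runahead distance observed before each computation step."""
    window = RunaheadWindow(max_ahead)
    distances = []
    while window.main_iter < n_iters:
        for _ in range(helper_steps_per_tick):
            # Stop at the end of the loop or when the window is full.
            if window.helper_iter >= n_iters or not window.helper_step():
                break
        distances.append(window.helper_iter - window.main_iter)
        window.main_step()
    return distances


if __name__ == "__main__":
    print(simulate(n_iters=10, max_ahead=4, helper_steps_per_tick=3))
```

In this toy model the helper is faster than the computation, so the distance climbs to `max_ahead` and stays there until the loop runs out of iterations. A real runtime would synchronize two hardware threads rather than interleave steps in one loop.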