2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Parallel debugging faces challenges in both scalability and efficiency. A number of advanced methods have been developed to improve the efficiency of parallel debugging. As system scale increases, these methods rely heavily on a scalable communication protocol to remain usable in large-scale distributed environments. This paper describes a debugging middleware that provides fundamental debugging functions and supports multiple communication protocols. Its pluggable architecture allows users to select an appropriate communication protocol as a plug-in for debugging on each platform. The middleware is intended to be used by various advanced debugging technologies across different computing platforms. Its performance is evaluated on a Cray XE supercomputer with 21,760 CPU cores.