Parallel Computation and Implementation of Matrix Multiplication on Supercomputers

IMPLEMENTATION OF MATRIX MULTIPLICATION ON SUPERCOMPUTERS

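Abstract: Matrix multiplication is frequently required in numerical prediction systems. On distributed supercomputers (such as the IBM-SP), parallel matrix multiplication involves a large amount of data movement, so efficient data transfer is critical to its implementation. This paper discusses two parallel algorithms for matrix multiplication: one based on a column-row partitioning of the matrices and one based on a mesh (grid) partitioning. Experimental results on the IBM-SP show that the mesh-partitioned algorithm has lower communication overhead and higher parallel efficiency, and that its parallel speedup is about 10% better than that of the column-row algorithm.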

Abstract: Matrix multiplication is frequently used in numerical weather prediction (NWP). On distributed systems such as the IBM-SP, multiplying two matrices requires data transposition, so efficient data communication is crucial to performance. Two parallel algorithms are presented, one based on column-row decomposition and the other on mesh partitioning, and the implementation and communication time of the two methods are discussed. Results on the IBM-SP show that the mesh algorithm requires less communication and improves speedup by up to 10%.
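The abstract names the two decompositions without detailing them. The following is a minimal serial sketch, under assumptions not taken from the paper, of why they differ in data movement. In the column-row variant, hypothetical processor k holds column block k of A and row block k of B, forms a full n×n partial product, and the partials are summed at processor 0. The mesh variant is a SUMMA-style block broadcast on a q×q processor grid, swapped in as a familiar example of mesh partitioning; the paper's own scheme may differ. The function names `matmul`, `col_row` and `mesh` are hypothetical, matrices are assumed square with n divisible by p and q, and "moved" only counts matrix elements received (no latency or bandwidth model), so the numbers illustrate a trend and do not reproduce the paper's timings.

```python
# Hypothetical serial simulation of two ways to distribute C = A @ B.
# Assumptions (not from the paper): square n x n integer matrices, n divisible
# by p (column-row) and by q (mesh, p = q * q); "moved" counts elements received.

def matmul(A, B):
    """Plain triple-loop product of an r x k matrix and a k x c matrix."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[t] * B[t][j] for t in range(inner)) for j in range(cols)]
            for row in A]


def col_row(A, B, p):
    """Processor k owns column block k of A and row block k of B, computes a
    full n x n partial product, then sends it to processor 0 to be summed."""
    n = len(A)
    b = n // p
    C = [[0] * n for _ in range(n)]
    moved = 0
    for k in range(p):
        idx = range(k * b, (k + 1) * b)
        Ak = [[A[i][t] for t in idx] for i in range(n)]  # n x b column block
        Bk = [B[t] for t in idx]                         # b x n row block
        Pk = matmul(Ak, Bk)
        for i in range(n):
            for j in range(n):
                C[i][j] += Pk[i][j]
        if k != 0:
            moved += n * n  # whole partial product travels to the root
    return C, moved


def mesh(A, B, q):
    """SUMMA-style sketch on a q x q grid: processor (i, j) owns block C(i, j);
    at step s, block A(i, s) is broadcast along grid row i and block B(s, j)
    along grid column j."""
    n = len(A)
    b = n // q

    def blk(M, r, c):
        return [row[c * b:(c + 1) * b] for row in M[r * b:(r + 1) * b]]

    C = [[0] * n for _ in range(n)]
    moved = 0
    for i in range(q):
        for j in range(q):
            Cij = [[0] * b for _ in range(b)]
            for s in range(q):
                if j != s:
                    moved += b * b  # (i, j) receives A(i, s) from (i, s)
                if i != s:
                    moved += b * b  # (i, j) receives B(s, j) from (s, j)
                P = matmul(blk(A, i, s), blk(B, s, j))
                for r in range(b):
                    for c in range(b):
                        Cij[r][c] += P[r][c]
            for r in range(b):
                C[i * b + r][j * b:(j + 1) * b] = Cij[r]
    return C, moved


if __name__ == "__main__":
    n = 8
    A = [[i + j for j in range(n)] for i in range(n)]
    B = [[(i * j) % 5 for j in range(n)] for i in range(n)]
    C1, m1 = col_row(A, B, p=4)
    C2, m2 = mesh(A, B, q=2)
    assert C1 == C2 == matmul(A, B)
    print(m1, m2)  # 192 128
```

In this model the column-row reduction moves (p − 1)n² elements in total, while the mesh broadcasts move 2(q − 1)n² with q = √p. For n = 16 and p = 16 (q = 4) that is 3840 versus 1536 elements, so the gap widens as the processor count grows. A tree-based reduction would change latency but not this total element count.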