Guo Miao, Jin Zhiyan, Zhou Bin. GPGPU accelerated massive parallel design of long wave radiation process in GRAPES-global model. J Appl Meteor Sci, 2012, 23(3): 348-354.

GPGPU Accelerated Massive Parallel Design of Long Wave Radiation Process in GRAPES-Global Model

  • Received Date: 2011-07-05
  • Revised Manuscript Received / Accepted: 2012-03-22
  • Publication Date: 2012-06-30
  • In recent years, with the rapid advance of GPGPU (General-Purpose Graphics Processing Unit) technology, leveraging the massive parallel processing power of GPGPUs to provide supercomputing capacity has become a new trend, and GPGPUs have already been applied to scientific computation in many fields. GRAPES (Global/Regional Assimilation and PrEdiction System) is the new-generation multi-scale numerical model developed by the Chinese Academy of Meteorological Sciences, and it plays an important role in weather forecasting and research. The long-wave radiation process is one of the most important physical processes in the GRAPES_Global model; it consumes a large share of the processing time and therefore affects the computing efficiency of the whole model. Because this process can be partitioned into tiles in the horizontal plane, it lends itself naturally to a parallel scheme. A GPU has hundreds of stream processors on one chip, which enables it to handle thousands of hardware threads simultaneously and gives a much higher theoretical throughput: over 1 TFLOPS per chip. GPUs are also backed by a complete tool chain, from compilers to libraries, which facilitates development. Taking into account the characteristics of the long-wave radiation computation, the high-level MPI communication is kept unchanged, and a low-level, fine-grained parallel architecture is designed to harness the computing power of the new hardware. This massively parallel implementation is based on NVIDIA GPGPU and CUDA technology. Instead of looping through a large block of atmospheric columns, as conventional CPU-based systems do, the new GPU-based implementation uses each small core to process a single column. This scheme has three major advantages: much higher thread concurrency, use of the higher GPU memory bandwidth, and denser computing intensity, which together yield better efficiency.
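The difference between the coarse-grained CPU scheme and the fine-grained GPU scheme (Fig. 2) can be illustrated with a short Python sketch. It is a CPU-side emulation rather than CUDA code: the two nested loops play the roles of `blockIdx.x` and `threadIdx.x`, and each emulated thread handles exactly one column. The per-column routine `column_physics`, the block size `block_dim`, and the toy tile are hypothetical placeholders, not the GRAPES long-wave radiation scheme.

```python
"""Toy emulation of coarse- vs fine-grained column decompositions.

Assumption: `column_physics` is a hypothetical stand-in for the per-column
long-wave computation; it is NOT the GRAPES/RRTM algorithm.
"""
from typing import List


def column_physics(column: List[float]) -> float:
    # Hypothetical per-column work: anything that depends only on the
    # vertical levels of a single column (here, a simple vertical sum).
    return sum(column)


def cpu_coarse_grained(tile: List[List[float]]) -> List[float]:
    # One CPU process (e.g. one MPI rank) loops over every column of its tile.
    return [column_physics(col) for col in tile]


def grid_size(ncol: int, block_dim: int) -> int:
    # Number of thread blocks so that every column gets a thread
    # (ceiling division, the usual way to size a CUDA launch).
    return (ncol + block_dim - 1) // block_dim


def gpu_fine_grained(tile: List[List[float]], block_dim: int = 64) -> List[float]:
    # Emulates a kernel launch in which each thread handles exactly one column.
    ncol = len(tile)
    out = [0.0] * ncol
    for block_idx in range(grid_size(ncol, block_dim)):  # plays blockIdx.x
        for thread_idx in range(block_dim):                # plays threadIdx.x
            idx = block_idx * block_dim + thread_idx       # global thread id
            if idx < ncol:                                 # idle surplus threads
                out[idx] = column_physics(tile[idx])
    return out


if __name__ == "__main__":
    # A tiny hypothetical tile: 5 columns x 3 vertical levels.
    tile = [[float(c + k) for k in range(3)] for c in range(5)]
    print(grid_size(len(tile), 2))  # 3 blocks of 2 threads cover 5 columns
    assert gpu_fine_grained(tile, block_dim=2) == cpu_coarse_grained(tile)
```

Because the columns are independent, both decompositions return the same results; the fine-grained version only changes which worker processes each column. The ceiling division in `grid_size` launches enough threads when the number of columns is not a multiple of the block size, which is why the `idx < ncol` guard is needed.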
Experiments with a real dataset validate the correctness of the new design and show that a Tesla C1060 achieves an 11x speedup over a high-end x86 CPU, greatly improving execution speed and forecast efficiency. The execution times of individual subroutines and of data transfers are also recorded and compared. Different partition configurations are tested to find the best combination, and execution is overlapped with data transfer to hide latency. The experiments show that GPGPUs have good potential to improve numerical weather forecasting models. As more routines are ported to GPU systems, a much better speedup could be achieved for the whole model.
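The benefit of overlapping execution with data transfer can be estimated with a simple pipeline model. The Python sketch below is an illustration under stated assumptions, not the authors' measurement: the columns are split into hypothetical chunks, and each chunk's upload, kernel, and download are treated as stages of an idealized pipeline that may run concurrently for different chunks (in CUDA this is typically arranged with streams and asynchronous copies, and how much actually overlaps depends on the GPU's copy engines). The stage times used below are made-up numbers.

```python
"""Idealized timing model for overlapping data transfer with kernel execution.

Assumption: the upload, kernel, and download of *different* chunks may run
at the same time. Real overlap depends on the hardware's copy engines.
"""
from typing import Sequence


def serial_time(n_chunks: int, stages: Sequence[float]) -> float:
    # No overlap: every chunk is uploaded, computed, and downloaded in turn.
    return n_chunks * sum(stages)


def pipelined_time(n_chunks: int, stages: Sequence[float]) -> float:
    # Each stage is an independent resource that handles one chunk at a time,
    # in chunk order; a chunk enters a stage only after it has left the
    # previous stage and that stage is idle.
    stage_free = [0.0] * len(stages)  # time at which each stage becomes idle
    finish = 0.0
    for _ in range(n_chunks):
        t = 0.0  # time at which this chunk is ready for its next stage
        for s, duration in enumerate(stages):
            t = max(t, stage_free[s]) + duration
            stage_free[s] = t
        finish = t
    return finish


if __name__ == "__main__":
    # Hypothetical per-chunk times in ms: upload, kernel, download.
    stages = (2.0, 5.0, 1.0)
    print(serial_time(4, stages), pipelined_time(4, stages))  # 32.0 23.0
```

With these made-up numbers the overlapped schedule finishes in 23 ms instead of 32 ms: after the first chunk fills the pipeline (8 ms), each further chunk costs only the slowest stage (here the 5 ms kernel), so overlap hides transfer latency best when kernel time dominates.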
  • Fig. 1  The architecture of the GPGPU parallel system

    Fig. 2  CPU coarse-grained parallelism (a) and GPU fine-grained parallelism (b)

    (t0, …, tm denote thread indices)

    Fig. 3  The execution model of the long-wave radiation scheme

    Fig. 4  The flow chart of the long-wave radiation scheme

    Fig. 5  Comparison of computational speed between CPU and GPGPU as the number of columns increases

    Fig. 6  Comparison of the long-wave radiation flux results between CPU (a) and GPGPU (b)

    Table  1  Comparison of parallelization results

    Kernel         Column-parallel time/ms    Spatial-point-parallel time/ms
    inatm_d()      17.41                      2.16
    cldprmc_d()    16.21                      6.63
    setcoef_d()     2.65                      0.85
    taumol_d()     23.06                      3.68
    rtrnmc_d()     37.14                     37.14
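The two timing columns of Table 1 can be turned into per-kernel speedups with a few lines of arithmetic. The Python sketch below only re-derives ratios from the published numbers; the kernel names and timings come from the table, while the function names are illustrative.

```python
"""Derived ratios from the timings in Table 1 (no new measurements)."""
from typing import Dict, Tuple

# Kernel -> (time parallel over columns, time parallel over spatial points),
# in ms, copied from Table 1.
TABLE_1: Dict[str, Tuple[float, float]] = {
    "inatm_d": (17.41, 2.16),
    "cldprmc_d": (16.21, 6.63),
    "setcoef_d": (2.65, 0.85),
    "taumol_d": (23.06, 3.68),
    "rtrnmc_d": (37.14, 37.14),
}


def speedups(table: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    # Per-kernel ratio: column-parallel time / spatial-point-parallel time.
    return {name: col / pt for name, (col, pt) in table.items()}


def totals(table: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    # Summed time of the five kernels under each decomposition.
    return (sum(col for col, _ in table.values()),
            sum(pt for _, pt in table.values()))


if __name__ == "__main__":
    for name, ratio in speedups(TABLE_1).items():
        print(f"{name:10s} {ratio:5.2f}x")
    col_total, pt_total = totals(TABLE_1)
    print(f"total      {col_total:.2f} ms -> {pt_total:.2f} ms "
          f"({col_total / pt_total:.2f}x)")
    # Share of the spatial-point-parallel total spent in rtrnmc_d.
    print(f"rtrnmc_d share: {TABLE_1['rtrnmc_d'][1] / pt_total:.0%}")
```

Run as a script, it reports roughly 8.06x for inatm_d(), 2.44x for cldprmc_d(), 3.12x for setcoef_d(), 6.27x for taumol_d(), and 1.00x for rtrnmc_d(), with the summed time falling from 96.47 ms to 50.46 ms (about 1.91x). Because rtrnmc_d() takes the same 37.14 ms under both decompositions, it accounts for about 74% of the spatial-point-parallel total, so further gains on this path would largely have to come from that kernel.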