Yang Runzhi, Shen Wenhai, Xiao Weiqing, et al. A set of MapReduce tuning experiments based on meteorological operations. J Appl Meteor Sci, 2014, 25(5): 618-628.

A Set of MapReduce Tuning Experiments Based on Meteorological Operations

  • Received Date: 2013-10-08
  • Rev Recd Date: 2014-10-08
  • Publish Date: 2014-09-30
  • Cloud computing overcomes the limited computing power of a standalone server by using distributed computing to deliver parallel computing capacity and higher computational efficiency. As an application model for distributed computing, it can provide reliable, customized services to the largest number of users with the fewest resources, and it is an important way to combine cloud computing theory and practice with other theories and techniques. Cloud computing is now widely applied across many industries and fields, and its flexibility, ease of use, and stability are increasingly recognized. In the meteorological department, cloud-based platforms for scientific computing are still very limited, although some attempts have been made as cloud computing has matured. In meteorological operations, large-scale scientific computing and other general-purpose computing tasks run on high-performance server clusters. Because resources and the number of HPC nodes are limited, scientific computing still relies on the traditional standalone or clustered mode. An in-house exploration of running conventional general-purpose computing on a cloud computing platform is therefore very meaningful for the meteorological department. The National Meteorological Information Center stores a valuable 60-year sequence of historical data for real-time and near-real-time operations and for research. Processing these historical data is time-consuming, so new methods are tried. A cluster is built on the Hadoop cloud computing platform, and a variety of statistical methods are implemented with the MapReduce computation model. The storage format of the source data is changed to SequenceFile, which consists of serialized <Key, Value> pairs; in this way, multiple Format-A files are merged into one large SequenceFile to test how computational efficiency changes. Many small files are also merged into larger files. The configuration of the Hadoop cluster environment is modified experimentally, and different numbers of task nodes are used to record the resulting computational efficiency.
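    The approach described in the abstract (merge many small files into <Key, Value> records, then run a statistical MapReduce job over them) can be illustrated with a minimal pure-Python sketch. It does not use Hadoop or the real SequenceFile format. The function names, the `station_id,temperature` line layout, and the sample values are all hypothetical and chosen only for illustration; the mean is just one example of a statistic.

```python
from collections import defaultdict


def pack_small_files(files):
    """Merge {filename: text} into one list of (key, value) records.

    This only loosely mimics how a SequenceFile stores <Key, Value> pairs;
    it is not the Hadoop on-disk format.
    """
    return sorted(files.items())


def map_record(filename, text):
    """Map step: emit (station_id, temperature) for each input line."""
    for line in text.splitlines():
        station_id, temperature = line.split(",")
        yield station_id, float(temperature)


def reduce_mean(values):
    """Reduce step: collapse all values of one key into their mean."""
    return sum(values) / len(values)


def run_job(records):
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups = defaultdict(list)
    for filename, text in records:
        for station_id, temperature in map_record(filename, text):
            groups[station_id].append(temperature)
    return {key: reduce_mean(values) for key, values in groups.items()}


# Hypothetical sample input: two small daily files with two stations each.
small_files = {
    "day1.txt": "S001,-4.0\nS002,5.0",
    "day2.txt": "S001,-2.0\nS002,7.0",
}
station_means = run_job(pack_small_files(small_files))
print(station_means)
```

    On a real cluster the map and reduce steps would run as distributed tasks. The sketch only shows the data flow, and why one large record container gives a job a single input to scan instead of many small ones.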
  • Fig. 1  Task scheduling model on cloud computing platform

    Fig. 2  MapReduce computation processes

    Fig. 3  Calculation flow chart of traditional software

    Fig. 4  Data flow diagram of MapReduce computation model

    Fig. 5  Reorganization flowchart on cloud computing platforms

    Fig. 6  Experiment on time change with node numbers

    Fig. 7  Experiment on time change with system parameters of cloud computing platform

    Fig. 8  Experiment on time change with max parallel tasks of cloud computing platform

    (a) linear ordinate, (b) logarithmic ordinate

    Fig. 9  Experiment on time change with upload files to HDFS

    Table  1  Configuration of host machine on physical cloud platform

    No.  Operating system   CPU cores      Memory  Network
    1    SUSE 10 (x86_64)   16 (2.27 GHz)  16 GB   Gigabit
    2    SUSE 10 (x86_64)   16 (2.27 GHz)  16 GB   Gigabit
    3    SUSE 11 (x86_64)   8 (2.0 GHz)    16 GB   Gigabit
    4    Redhat 6.3 Beta    8 (2.0 GHz)    16 GB   Gigabit
    5    Redhat 6.3 Beta    8 (2.0 GHz)    16 GB   Gigabit

    Table 2  Experiment results of different storage structures and data file sizes (unit: s)

    Data storage structure  5 nodes  6 nodes  7 nodes  8 nodes  9 nodes  10 nodes
    Original files          36720    31310    26279    22285    19494    17817
    10 files merged         3945     3170     2663     2278     2006     1805
    100 files merged        535      442      389      342      316      312
    SequenceFile            166      158      123      110      107      94
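    The run times in Table 2 can be turned into speedup factors with a short sketch. The numbers are copied from the table; the variable and function names are illustrative assumptions, and the two ratios computed here are only one possible way to summarize the results.

```python
# Run times in seconds from Table 2, for 5 to 10 task nodes.
NODES = [5, 6, 7, 8, 9, 10]
RUNTIMES = {
    "original files": [36720, 31310, 26279, 22285, 19494, 17817],
    "10 files merged": [3945, 3170, 2663, 2278, 2006, 1805],
    "100 files merged": [535, 442, 389, 342, 316, 312],
    "SequenceFile": [166, 158, 123, 110, 107, 94],
}


def speedup_vs_baseline(runtimes, baseline="original files"):
    """Speedup of each storage structure over the baseline, per node count."""
    base = runtimes[baseline]
    return {
        name: [b / t for b, t in zip(base, times)]
        for name, times in runtimes.items()
    }


def node_scaling(times):
    """Ratio of the run time at the fewest nodes to that at the most nodes."""
    return times[0] / times[-1]


speedups = speedup_vs_baseline(RUNTIMES)
for name, factors in speedups.items():
    row = ", ".join(f"{n} nodes: {f:.1f}x" for n, f in zip(NODES, factors))
    print(f"{name}: {row}")

scaling = {name: node_scaling(times) for name, times in RUNTIMES.items()}
```

    With these values, the SequenceFile layout is about 221 times faster than the original files at 5 nodes and about 190 times faster at 10 nodes. Doubling the node count from 5 to 10 cuts the run time by a factor of roughly 2.1 for the original files and roughly 1.8 for SequenceFile.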