Li Yongsheng, Zeng Qin, Xu Meihong, et al. Design and implementation of NWP data service platform based on Hadoop framework. J Appl Meteor Sci, 2015, 26(1): 122-128. DOI: 10.11898/1001-7313.20150113.

Design and Implementation of NWP Data Service Platform Based on Hadoop Framework

DOI: 10.11898/1001-7313.20150113
  • Received Date: 2014-05-19
  • Rev Recd Date: 2016-09-28
  • Publish Date: 2015-01-31
  • As numerical weather prediction (NWP) products grow by huge amounts every day, traditional relational databases archive and manage them inefficiently, while file-based storage performs poorly for long time-series access and large-scale computation on spatio-temporal data. A three-tier software framework is therefore designed on top of Hadoop, implementing a distributed data storage model, a parallel data access service and distributed computation for frequently used statistical algorithms. Meteorological big data such as NWP products, radar 3D mosaics and satellite remote sensing data are modeled as metadata plus data entities, both of which are stored in HBase tables and managed on the HDFS file system. Metadata are defined by variable name, dimension, latitude, longitude, altitude, lead time, etc.; a data entity consists of a row key, a time stamp and a column family that stores the value at each grid point. A REST (representational state transfer) web service is set up for direct NWP data acquisition, field data clipping and location-based time-series access. File download services in "MICAPS", "surfer" and "json" formats are also provided for third-party meteorological software. System tests of data access for the CHAF model show that writing 1000 NWP data fields, each with 82503 grid points, takes only 12 seconds, and reading the same amount of data back from the distributed database takes less than 4 seconds. A MapReduce scheme is implemented for meteorological algorithms such as the Kalman filter and successive regression. Most meteorological statistical algorithms are time independent, so a task can be divided into small sub-tasks by slicing the data along the time series, and the sub-tasks are assigned to different computational nodes in the map programs. The reduce programs then gather and summarize the results of the sub-task computations.
As the data volume and the number of users increase, the Hadoop framework deployed on several x86 PC servers demonstrates a performance advantage over a single IBM Power system. Scaling the hardware from 3 to 9 computational nodes yields steadily better data access efficiency with a good speed-up ratio, which gives more confidence for practical use in weather forecasting. An operational trial in a multi-user environment further shows the advantages of this cloud-like computing service over the traditional client-server model in meteorological data mining, such as NWP interpretation and model evaluation.
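The map/reduce division described in the abstract can be illustrated with a small sketch. This is plain Python standing in for Hadoop, not the platform's code: the function names are hypothetical, and a mean/variance computation is used only because it is easy to verify, whereas the paper names the Kalman filter and successive regression. It assumes, as the abstract states, that the statistic can be computed independently on each time slice.

```python
from functools import reduce
from typing import List, Sequence, Tuple

# Partial result of one sub-task: (count, sum, sum of squares).
Partial = Tuple[int, float, float]


def slice_by_time(series: Sequence[float], n_slices: int) -> List[Sequence[float]]:
    """Split a time series into at most n_slices contiguous chunks."""
    if n_slices < 1:
        raise ValueError("n_slices must be at least 1")
    size = max(1, -(-len(series) // n_slices))  # ceiling division
    return [series[i:i + size] for i in range(0, len(series), size)]


def map_slice(chunk: Sequence[float]) -> Partial:
    """Map step: each sub-task summarizes its own time slice."""
    return (len(chunk), sum(chunk), sum(v * v for v in chunk))


def reduce_partials(a: Partial, b: Partial) -> Partial:
    """Reduce step: merge two partial summaries."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_variance(series: Sequence[float], n_slices: int = 3) -> Tuple[float, float]:
    """Combine per-slice summaries into the mean and population variance."""
    partials = map(map_slice, slice_by_time(series, n_slices))
    n, total, total_sq = reduce(reduce_partials, partials, (0, 0.0, 0.0))
    if n == 0:
        raise ValueError("empty series")
    mean = total / n
    return mean, total_sq / n - mean * mean
```

On a real cluster each call to `map_slice` would run on a different node; here the built-in `map` runs them sequentially, which keeps the sketch self-contained while preserving the split, map and reduce structure.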
  • Fig. 1  Overall system framework

    Fig. 2  System function structure

    Fig. 3  Data access performance results

    Fig. 4  Read performance of the data interface

    Fig. 5  Performance of platform extension

    Table  1  Description of the metadata table contents

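    | Stored column name | Description |
    |---|---|
    | meta:variables | Metadata variables of the current product; describes which variables the data in the current metadata table contain, e.g. longitude, latitude, height, date, status, element, initial time |
    | meta:dimensions | Dimension information of the variables, e.g. 56 longitudinal dimension values and 68 latitudinal dimension values |
    | meta:lat | Latitudinal dimension data of the current product; describes all latitude values of the product |
    | meta:level | Basic information on the forecast levels of the current product, e.g. 10 forecast levels |
    | meta:lon | Basic information on the longitudinal dimension of the current product |
    | meta:time | Basic information on the time dimension of the current product, e.g. 62 time dimension values |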
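The latitude and longitude metadata listed above are what a field-clipping request would need in order to select a sub-region. The following is a minimal sketch under stated assumptions: the field is a plain nested list indexed as `field[lat_index][lon_index]`, and `clip_field` and its parameters are hypothetical names, not part of the platform's interface.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def clip_field(
    field: Sequence[Sequence[float]],
    lats: Sequence[float],
    lons: Sequence[float],
    lat_range: Tuple[float, float],
    lon_range: Tuple[float, float],
) -> Tuple[Grid, List[float], List[float]]:
    """Return the sub-field inside an inclusive lat/lon box, with its coordinates."""
    # Indices of the grid rows and columns that fall inside the requested box.
    rows = [i for i, lat in enumerate(lats) if lat_range[0] <= lat <= lat_range[1]]
    cols = [j for j, lon in enumerate(lons) if lon_range[0] <= lon <= lon_range[1]]
    clipped = [[field[i][j] for j in cols] for i in rows]
    return clipped, [lats[i] for i in rows], [lons[j] for j in cols]
```

The coordinate lists play the role of `meta:lat` and `meta:lon`; returning the clipped coordinates along with the values keeps the result self-describing.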

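    Table  2  Schematic of the entity data model

    | Row key | Time stamp | Column family: column 1 (e.g. 500 hPa) | Column family: column 2 (e.g. 700 hPa) | Column family: column N (e.g. 850 hPa) |
    |---|---|---|---|---|
    | AAAATTT{variable}:{initial time} | t1 | data 1 | data 4 | data 7 |
    | AAAATTT{variable}:{initial time} | t2 | data 2 | data 5 | data 8 |
    | AAAATTT{variable}:{initial time} | tN | data 3 | data 6 | data 9 |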
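The layout in Table 2 can be mimicked with an in-memory structure to show how a row key, a column and a time stamp together address one stored value. This is a sketch only, not HBase client code: `EntityTable`, `make_row_key` and the sample values are hypothetical, and the `AAAATTT` prefix is treated as an opaque string because its meaning is not given here.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def make_row_key(prefix: str, variable: str, init_time: str) -> str:
    """Build a key shaped like Table 2: prefix + variable + ':' + initial time."""
    return f"{prefix}{variable}:{init_time}"


class EntityTable:
    """Toy stand-in for one column family: row key -> column -> {time stamp: value}."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Dict[int, Any]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row_key: str, column: str, timestamp: int, value: Any) -> None:
        """Store one versioned cell."""
        self._rows[row_key][column][timestamp] = value

    def get_series(self, row_key: str, column: str) -> List[Tuple[int, Any]]:
        """Return all versions of one cell, ordered by time stamp."""
        cells = self._rows.get(row_key, {}).get(column, {})
        return sorted(cells.items())
```

A usage example with made-up values:

```python
table = EntityTable()
key = make_row_key("AAAATTT", "hgt", "2014051900")
table.put(key, "500hPa", 2, 5830.0)
table.put(key, "500hPa", 1, 5820.0)
series = table.get_series(key, "500hPa")  # versions ordered by time stamp
```

Keeping the time stamp as the innermost key means a location-based time series for one variable and level is a single row lookup followed by an ordered scan of versions.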