Design and Implementation of Surface Meteorological Data Statistical Processing System
Summary: To support real-time statistical processing of data from more than 60,000 surface meteorological stations, a statistical processing system for surface meteorological data was designed and implemented. The system adopts a flexible, easily extensible technical framework and applies big-data distributed stream processing to achieve efficient data processing, improving statistical timeliness by more than 10 times over the traditional statistical processing framework. Its main functions include timed statistics driven by task scheduling, rolling updates of statistical results in response to late-arriving data and data corrections, user-defined statistics, and general-purpose statistical algorithm services. The system is deployed once at the national level and applied synchronously in all provinces, a layout that ensures data consistency. Since entering operation in January 2017, it has computed more than 800 statistical items at multiple scales in real time and delivered statistical products through the unified data service interface of the China Integrated Meteorological Information Sharing System (CIMISS). Analysis shows that in 2017 surface meteorological statistical products were downloaded 19.514 million times per month on average, ranking third among the more than 400 kinds of observation data and products in CIMISS, and providing important basic data support for weather monitoring, forecasting and warning, meteorological decision-making services, climate monitoring operations, and public meteorological services.

Abstract: Statistical products of surface meteorological data (SMD) are among the most frequently used data in meteorological research and operations. With the improvement of the surface meteorological observation system over China, SMD statistics face challenges such as a large number of sites, a wide variety of elements, and complex statistical strategies. SMD now has the typical features of big data and can support more precise and efficient operations, but this is beyond the capability of the traditional serial processing framework.

Aiming at precise and efficient statistical processing of data from more than 60,000 surface weather stations, a statistical processing system for SMD is built on big data technology. Compared with the traditional serial processing framework, the system is more than 10 times as efficient and provides more statistics and functions, such as fast calculation, rolling updates of statistical values in response to late-arriving data and corrected information, and statistics over arbitrary time scales. Storm distributed stream processing technology is applied to realize efficient statistical calculation. Big data message transmission and caching technologies are also applied to ensure the system's efficiency and stability. A modular design framework makes the system highly extensible, on the basis of which statistics, quality control and evaluation algorithms are extended to other kinds of data, e.g., upper-air, radiation, oceanic and aircraft measurements.

The system is deployed at the national meteorological department and its products are applied synchronously at the provincial level, a layout that ensures data consistency. The system is incorporated into the China Integrated Meteorological Information Sharing System (CIMISS) and has become its core data processing framework. Since January 2017, it has provided more than 800 real-time multi-scale SMD statistical values to meteorological users and the public through the CIMISS unified data service interface. According to data access logs, daily SMD statistics were accessed 19.51 million times per month on average in 2017, ranking 3rd among over 400 data types or products, and they play important roles in weather monitoring, forecasting and warning, meteorological decision-making, public services and climate research.

In the future, the technical framework and algorithm modules of the system will be integrated into the processing pipeline of the meteorological big data cloud platform, with further optimization of the computational topology to make full use of computing resources and shorten the convergence time of results from distributed processing nodes. To further improve the efficiency of statistical processing, the launch mechanism can be changed from periodic scheduling to automatic scheduling triggered by the completeness of observed data.
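The rolling update of statistical values for late-arriving and corrected data, mentioned in the abstract, can be illustrated with a minimal Python sketch. It is not the system's implementation: the class `DailyStatsStore`, its method names, the in-memory dictionaries, and the choice of count/mean/max/min as the daily statistics are all assumptions made for illustration, and the station identifier in the usage note is only a sample value.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical key: (station id, observation date as "YYYY-MM-DD").
Key = Tuple[str, str]


@dataclass
class DailyStatsStore:
    """Illustrative in-memory store of hourly values and derived daily statistics."""

    hourly: Dict[Key, Dict[int, float]] = field(default_factory=dict)
    daily: Dict[Key, Dict[str, float]] = field(default_factory=dict)

    def ingest(self, station: str, date: str, hour: int, value: float) -> Dict[str, float]:
        """Accept an on-time, late, or corrected hourly value and refresh the daily result."""
        key = (station, date)
        # A correction overwrites the earlier value for the same hour;
        # a late value fills an hour that was missing so far.
        self.hourly.setdefault(key, {})[hour] = value
        return self._recompute(key)

    def _recompute(self, key: Key) -> Dict[str, float]:
        """Recalculate the daily statistics from every hourly value received so far."""
        values = list(self.hourly[key].values())
        stats = {
            "count": len(values),
            "mean": sum(values) / len(values),
            "max": max(values),
            "min": min(values),
        }
        self.daily[key] = stats
        return stats
```

For example, after ingesting hourly values 1.0 and 3.0 for station "54511", a late value of 5.0 for a third hour updates the stored daily mean, and a later correction of the second hour to 6.0 updates it again without any separate rerun of the whole day.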
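Table 1 Data volume involved in statistical processing

| Statistical scale | Data source | Records (10^4) | Data volume (GB) |
|---|---|---|---|
| Day | Hourly observations | 144 | 12 |
| Pentad | Daily statistics, hourly observations | 750 | 64 |
| Dekad (ten-day) | Daily statistics, hourly observations | 1500 | 128 |
| Month | Daily statistics, hourly observations | 4500 | 300 |
| Season | Daily statistics, monthly statistics | 4800 | 500 |
| Year | Monthly statistics, minute observations | 130000 | 3600 |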
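Table 2 Timed launch times and product service times

| Statistical product (by scale) | Timed launch time | Product service time |
|---|---|---|
| Day | 21:00 every day; 08:40 on the following day (update of weather phenomenon statistics) | 21:15; 08:55 |
| Pentad | 09:00 on the 1st, 6th, 11th, 16th, 21st and 26th of each month | 09:10 |
| Dekad (ten-day) | 09:10 on the 1st, 11th and 21st of each month | 09:20 |
| Month | 09:20 on the 1st of each month | 10:00 |
| Season | 10:00 on the 1st of March, June, September and December each year | 11:00 |
| Year | 11:00 on 1 January each year | 13:00 |

Note: All times in the table are Beijing Time, here and below. Product service time is defined as the time at which the statistical product becomes available for user retrieval. The data sources for weather phenomenon statistics include the manually reviewed daily surface observation data files, so the weather phenomenon calculations within the daily statistics are launched on schedule after the daily data file is received on the following day.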
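Table 3 Downloads of surface meteorological statistical products by national operational systems (Top 10)

| Rank | Operational system | Annual download volume (GB) | Annual download count |
|---|---|---|---|
| 1 | China Weather website (中国天气网) | 1378.5 | 34412772 |
| 2 | Integrated Processing Platform of the CMA Public Meteorological Service Centre | 728.3 | 146352 |
| 3 | Agrometeorological Operational System (CAgMSS) | 288.6 | 373152 |
| 4 | Data Support System for Early Warning Information Release | 231.8 | 203982 |
| 5 | Beijing Air Quality Forecast and Early Warning Platform | 138.4 | 23326 |
| 6 | Meteorological Service Information System (MESIS) | 47.2 | 8760 |
| 7 | Climate Information Processing and Analysis System (CIPAS) | 25.3 | 723261 |
| 8 | China Meteorological Data website (中国气象数据网) | 24.8 | 164611 |
| 9 | Meteorological Disaster Information Management System | 20.3 | 99759 |
| 10 | China Xingnong Net (中国兴农网) | 16.2 | 1528 |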
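The timed launch schedule in Table 2 can be expressed as data and checked against a clock. The sketch below is only an illustration under stated assumptions: the names `SCHEDULE` and `scales_to_launch` are hypothetical, the rule layout is an arbitrary choice, and the source does not describe how the operational task scheduler is actually configured. Only the launch times themselves are transcribed from Table 2 (Beijing Time).

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Each rule: (scale, months, days, launch times as (hour, minute)).
# None means "every month" or "every day". Times are Beijing Time, from Table 2.
Rule = Tuple[
    str,
    Optional[Tuple[int, ...]],
    Optional[Tuple[int, ...]],
    Tuple[Tuple[int, int], ...],
]

SCHEDULE: List[Rule] = [
    # 08:40 is the next-day run that updates weather phenomenon statistics.
    ("day", None, None, ((21, 0), (8, 40))),
    ("pentad", None, (1, 6, 11, 16, 21, 26), ((9, 0),)),
    ("dekad", None, (1, 11, 21), ((9, 10),)),
    ("month", None, (1,), ((9, 20),)),
    ("season", (3, 6, 9, 12), (1,), ((10, 0),)),
    ("year", (1,), (1,), ((11, 0),)),
]


def scales_to_launch(now: datetime) -> List[str]:
    """Return the statistical scales whose timed launch falls exactly at `now`."""
    due = []
    for scale, months, days, times in SCHEDULE:
        if months is not None and now.month not in months:
            continue
        if days is not None and now.day not in days:
            continue
        if (now.hour, now.minute) in times:
            due.append(scale)
    return due
```

Calling `scales_to_launch(datetime(2017, 3, 1, 10, 0))` selects only the seasonal statistics, while 21:00 on any day selects only the daily statistics. Keeping the schedule as a table of rules rather than hard-coded branches makes it straightforward to add or shift a launch time.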