The Design and Implementation of Stream Processing for Data of Ground Automatic Weather Stations
-
Abstract: To handle the growing volume of data from ground automatic weather stations, whose observation density and frequency keep increasing, a real-time stream processing system based on Storm is designed and implemented in the Meteorological Big Data Cloud Platform (Tianqing). It uses large-scale parallel processing to shorten the processing time of these data. A Storm topology decodes the standard-format BUFR messages delivered by RabbitMQ directly, which removes intermediate steps between the transmission and the processing of observations. In the spout (the data ingestion component), RabbitMQ messages are acknowledged manually, and the decoding bolt is anchored to the spout through the message identifier (ID), so that every message is processed reliably. During decoding, byte-level format checks and time checks filter out abnormal data. Monitoring information is sent to the Meteorological Integrated Operation Real-time Monitoring System (Tianjing) in batches and on a timer, which avoids the loss of ingested data caused by port occupancy when large volumes of monitoring information are transmitted. The numbers of spouts and bolts are configurable at startup, so the topology can be tuned quickly to the available system resources. When the cluster is deployed, some resources are kept in reserve so that tasks migrate automatically and operations are not disrupted if a node in the cluster fails.
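The manual acknowledgement and anchoring described above can be illustrated with a minimal, framework-free Python sketch. It does not use the Storm or RabbitMQ APIs: `Broker`, `is_valid`, and `run_once` are hypothetical names, and the validity rule (a `station` key must be present and the `time` value must not lie in the future) only stands in for the format and time checks. The sketch shows one idea: a message is confirmed to the queue only after its downstream processing has finished, and it is redelivered otherwise.

```python
from collections import deque


class Broker:
    """Toy stand-in for a message queue with manual acknowledgement."""

    def __init__(self, messages):
        self.queue = deque(enumerate(messages))  # (message_id, body) pairs
        self.unacked = {}                        # delivered but not yet confirmed

    def get(self):
        if not self.queue:
            return None
        msg_id, body = self.queue.popleft()
        self.unacked[msg_id] = body  # held until ack or nack
        return msg_id, body

    def ack(self, msg_id):
        self.unacked.pop(msg_id)

    def nack(self, msg_id):
        # Put the message back so that it is delivered again.
        self.queue.append((msg_id, self.unacked.pop(msg_id)))


def is_valid(body, now):
    """Hypothetical check: a station is present and the time is not in the future."""
    return "station" in body and body.get("time", now + 1) <= now


def run_once(broker, store, now):
    """Take one message, decode and store it, then ack; nack if storing fails."""
    item = broker.get()
    if item is None:
        return False
    msg_id, body = item
    try:
        if is_valid(body, now):
            store(body)      # may raise, e.g. when the database is unavailable
        broker.ack(msg_id)   # stored or filtered out: either way the message is done
    except Exception:
        broker.nack(msg_id)  # processing failed: ask for redelivery
    return True
```

Calling `run_once` in a loop until it returns `False` drains the queue. Abnormal messages are acknowledged and dropped, while a message whose storage step raised an exception returns to the queue and is processed again later.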
The system also reconnects automatically to message queues and databases, which improves stability and gives it a self-healing capability. Application results show that the service latency of hourly data from 2442 national stations is reduced from 175 s with the China Integrated Meteorological Information Sharing System (CIMISS) to 78 s with Tianqing, and that of hourly data from more than 60000 regional stations is reduced from 5 min with CIMISS to 2 min with Tianqing. After the data source of the ART (analysis of real time) system was switched to Tianqing, the number of stations retrievable at the same retrieval time doubled compared with CIMISS, which can improve the quality of ART products with other conditions unchanged. Specialized stream processing also covers the business scenarios in which the data access process of a provincial Tianqing for ground automatic weather stations differs from that of other provinces, and it lets a provincial Tianqing process nationwide ground automatic weather station data quickly. Since December 2021, Storm-based stream processing has been in operation alongside Tianqing in the national and provincial meteorological information departments. It has run stably for over two years, delivering reliable ground automatic weather station data to users including MICAPS4, SWAN2.0 and the ART system.
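The batch-plus-timer strategy for monitoring information mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the system's actual code: the class name `BatchTimedSender`, the thresholds `max_batch` and `max_wait_s`, and the injected `send` callable are all hypothetical. Records are buffered and shipped either when the batch is full or when the oldest buffered record has waited long enough, so many records share one send operation.

```python
import time


class BatchTimedSender:
    """Buffer monitoring records; flush when the batch is full or too old."""

    def __init__(self, send, max_batch=100, max_wait_s=5.0, clock=time.monotonic):
        self.send = send              # callable that ships a list of records
        self.max_batch = max_batch    # hypothetical batch-size threshold
        self.max_wait_s = max_wait_s  # hypothetical time threshold in seconds
        self.clock = clock            # injectable so the timer can be tested
        self.buffer = []
        self.first_at = None          # arrival time of the oldest buffered record

    def add(self, record):
        if not self.buffer:
            self.first_at = self.clock()
        self.buffer.append(record)
        self.flush_if_due()

    def flush_if_due(self):
        """Call on every add and also periodically, e.g. from a timer tick."""
        if not self.buffer:
            return
        full = len(self.buffer) >= self.max_batch
        stale = self.clock() - self.first_at >= self.max_wait_s
        if full or stale:
            self.flush()

    def flush(self):
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.first_at = None
            self.send(batch)
```

The timer condition matters for low-traffic periods: without it, a partly filled batch would wait indefinitely for the size threshold to be reached.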
-
Table 1 Description of the Tianqing storage tables for ground automatic weather station data

| Data | Tianqing storage tables |
| --- | --- |
| Hourly data of national weather stations | China surface hourly raw report table; China surface hourly table; Global surface hourly table; China surface daily value table; China surface sunshine table; Significant weather table |
| Minute data of national weather stations | China surface minute raw report table; China surface minute precipitation table; Surface minute all-element table |
| Hourly data of regional weather stations | China surface hourly raw report table; China surface hourly table |
| Minute data of regional weather stations | China surface minute raw report table; China surface minute precipitation table; Surface minute all-element table |
Table 2 Description of the processing topologies for ground automatic weather station data in the national-level Tianqing

| Data type | Uploads per hour | Worker processes | Data ingestion components (spouts) | Decoding and storage components (bolts) |
| --- | --- | --- | --- | --- |
| Hourly data of national weather stations | 1 | 12 | 12 | 36 |
| Minute data of national weather stations | 60 | 6 | 6 | 36 |
| Hourly data of regional weather stations | 1 | 24 | 24 | 240 |
| Minute data of regional weather stations | 12 | 24 | 20 | 300 |
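The parallelism settings in Table 2 lend themselves to a small worked calculation. The Python sketch below copies the table values into a dictionary (the keys such as `national_hourly` are hypothetical labels) and derives the number of decoding components per ingestion component, which comes to 3, 6, 10 and 15 for the four data types, and the total of 66 worker processes. The function `can_absorb_node_loss` is an assumption added for illustration only: it models the reserved-resource idea with a uniform number of worker slots per node, which the source does not specify.

```python
# Parallelism settings copied from Table 2 (national-level Tianqing).
# The dictionary keys are hypothetical labels, not names used by the system.
TOPOLOGIES = {
    "national_hourly": {"workers": 12, "spouts": 12, "bolts": 36},
    "national_minute": {"workers": 6, "spouts": 6, "bolts": 36},
    "regional_hourly": {"workers": 24, "spouts": 24, "bolts": 240},
    "regional_minute": {"workers": 24, "spouts": 20, "bolts": 300},
}


def bolts_per_spout(cfg):
    """Decoding and storage components available per ingestion component."""
    return cfg["bolts"] / cfg["spouts"]


def components_per_worker(cfg):
    """Average number of component instances hosted by one worker process."""
    return (cfg["spouts"] + cfg["bolts"]) / cfg["workers"]


def total_workers(topologies):
    """Worker processes needed to run all topologies at once."""
    return sum(cfg["workers"] for cfg in topologies.values())


def can_absorb_node_loss(nodes, slots_per_node, workers_needed):
    """Assumed uniform-slot model: do the remaining nodes still fit every worker?"""
    return (nodes - 1) * slots_per_node >= workers_needed


for name, cfg in TOPOLOGIES.items():
    print(name, bolts_per_spout(cfg), round(components_per_worker(cfg), 1))
print("total workers:", total_workers(TOPOLOGIES))
```

Under this simplified model, a cluster sized with exactly as many slots as workers fails the check, which is the situation that reserving spare resources at deployment avoids.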
Table 3 Summary of hourly ground automatic weather station data retrieved through operational systems by institutions directly under the China Meteorological Administration in March 2023

| Institution | System | Number of accesses | Data volume (GB) |
| --- | --- | --- | --- |
| National Meteorological Information Center | Tianqing Shikuang (real-time analysis) | 17844182 | 22739.5 |
| Chinese Academy of Meteorological Sciences | East Asia Regional Reanalysis and Intelligent Forecast Competition System | 753707 | 9400.8 |
| National Meteorological Center | Intelligent Grid Forecast Processing System | 603944 | 3305.1 |
| Meteorological Observation Center | Integrated Meteorological Observation Data Quality Control System (Tianheng Tianyan) | 372840 | 350.6 |
| Meteorological Cadre Training Institute | Integrated Training System for Nowcasting and Warning Skills and Competence | 247207 | 1580.8 |
| Weather Modification Center | Weather Modification Effect Evaluation System | 113491 | 1695.5 |
| National Climate Center | Climate Change Impact Assessment and Service System | 42591 | 8.5 |
| National Satellite Meteorological Center | Satellite Weather Application Platform (SWAP) | 22425 | 54.4 |
| Public Meteorological Service Center | National Traffic Meteorological Service | 13647 | 229.5 |
| Earth System Numerical Prediction Center | GRAPES Numerical Weather Prediction Operational System | 5517 | 44.8 |