The Design and Implementation of Stream Processing for Data of Ground Automatic Weather Stations
-
Abstract
To process the high-density, high-frequency mass data generated by ground automatic weather stations, a real-time stream processing system based on Storm is designed and implemented in the Meteorological Big Data Cloud Platform (Tianqing). It exploits large-scale parallel computing to increase processing speed. A Storm topology is designed to process the standardized BUFR messages delivered by RabbitMQ directly within the service, reducing the intermediate steps between transmission and processing of observations. In the spout design, the manual acknowledgement mode of RabbitMQ is adopted so that each message is acknowledged only after it has been processed. In the decoding stage, each tuple emitted by a bolt is anchored to the originating spout tuple through its message identifier (ID), so that every message is processed reliably. Format and time checks are performed during decoding to filter out abnormal data. A timed, batched monitoring strategy is applied to prevent the data ingestion losses caused by port occupancy when large volumes of monitoring data are transmitted. A startup strategy with a configurable number of spouts and bolts allows quick tuning according to available system resources. During cluster deployment, some resources are held in reserve so that tasks can migrate automatically, without disrupting operations, if a node in the cluster fails. The system automatically reconnects to message queues and databases, which improves stability and provides self-healing capability. Application results show that the service latency for data from 2442 national stations has decreased from 175 s with CIMISS to 78 s with Tianqing, and that the service latency for hourly data from more than 60000 regional stations has decreased from 5 min with CIMISS to 2 min with Tianqing.
After the data source of the ART (analysis of real time) system was switched to Tianqing, the number of stations that can be retrieved at the same time doubled compared with CIMISS, which effectively improves the quality of ART live products with all other conditions unchanged. Customized stream processing also handles the operational scenarios in which the data access process for ground automatic weather stations in one provincial Tianqing differs from that of other provinces, and it enables the provincial Tianqing to quickly process nationwide data from ground automatic weather stations. In December 2021, Storm-based stream processing was put into operation together with Tianqing in the national and provincial meteorological information departments. It has been running smoothly for more than two years, delivering reliable ground automatic weather station data to users, including the MICAPS4, SWAN2.0 and ART systems.
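The combination of manual acknowledgement with format and time checks described in the abstract can be illustrated with a minimal Python sketch. It is not the system's implementation: the real service runs as a Storm topology and decodes BUFR, whereas here a plain dictionary stands in for a decoded message and an in-memory class stands in for a RabbitMQ channel. The names `InMemoryQueue`, `process`, `MAX_FUTURE` and `MAX_AGE`, and the window sizes, are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical tolerances: the abstract does not state the actual time window.
MAX_FUTURE = timedelta(minutes=10)
MAX_AGE = timedelta(days=7)


def check_observation_time(obs_time: datetime, now: datetime) -> bool:
    """Return True when obs_time lies inside the accepted window around now."""
    return now - MAX_AGE <= obs_time <= now + MAX_FUTURE


@dataclass
class InMemoryQueue:
    """Stand-in for a broker channel working in manual-acknowledgement mode."""
    pending: dict = field(default_factory=dict)  # delivery tag -> message
    acked: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def ack(self, tag):
        # Only now may the broker forget the message.
        self.pending.pop(tag)
        self.acked.append(tag)

    def reject(self, tag):
        # Abnormal data is discarded instead of being redelivered forever.
        self.pending.pop(tag)
        self.rejected.append(tag)


def process(queue: InMemoryQueue, tag, now: datetime):
    """Check one message, then acknowledge or reject it explicitly."""
    msg = queue.pending[tag]
    try:
        # Format check: required fields must exist and parse.
        station = msg["station"]
        obs_time = datetime.fromisoformat(msg["time"])
    except (KeyError, ValueError, TypeError):
        queue.reject(tag)
        return None
    # Time check: timestamps without a zone or outside the window are abnormal.
    if obs_time.tzinfo is None or not check_observation_time(obs_time, now):
        queue.reject(tag)
        return None
    record = {"station": station, "time": obs_time}
    queue.ack(tag)  # acknowledge only after processing has succeeded
    return record
```

In this sketch a message stays in `pending` until `ack` or `reject` is called, so a worker that crashes midway leaves the message available for redelivery. In a Storm topology the acknowledgement would normally be issued from the spout once the anchored tuple tree has completed, which the sketch collapses into a single function.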