Abstract:
To address the problem of ingesting massive multi-source heterogeneous data into the CMA Big Data and Cloud Platform (Tianqing), a Data Processing Center was designed and developed that combines front-end transmission with back-end storage. Eight scalable processing frameworks were developed, which seamlessly integrate decoding algorithms, data access, data insertion, and other business logic using technologies such as Storm, RabbitMQ, and shared file systems. These frameworks solve common technical problems such as data acquisition, task scheduling, load balancing, and resumption from breakpoints. Over 200 decoding algorithms were developed for domestic and international data, achieving real-time decoding and storage of data from domestic automatic weather stations, upper-air L-band sounding, wind profiler radars, raindrop spectrometers, Beidou sounding systems, and ground-based vertical observation systems, as well as global satellite Atmospheric Motion Vector data. By separating decoding algorithms from storage logic, the system achieves modular assembly of decoding algorithms and storage interfaces, as well as hot-swappable development for new data types. By applying Storm stream processing, the service efficiency for hourly data from over 70,000 regional stations has improved, with processing time reduced from 5 min under CIMISS to 2 min under Tianqing. The system employs a general, configuration-driven processing framework that enables rapid ingestion of unstructured data through flexible configuration. The speed of ingesting unstructured data has increased by more than 10 times, and over 900 types of unstructured data have been successfully configured and ingested. The system adopts multiplexing and other technologies to achieve multi-dimensional parallel processing of the same data, meeting users' heterogeneous access needs for the same data and reducing the data distribution load on the CMA domestic telecommunication system.
For the development of structured data with provincial characteristics, the data processing system provides corresponding software development toolkits, which enable provincial technicians to implement business logic quickly and improve development efficiency. The system entered operation at both national and provincial levels in December 2021 and has been running stably ever since. It addresses the challenge of rapidly processing over 1300 types of domestic and international data, including surface, upper-air, oceanic, radar, satellite, and numerical model data, thereby facilitating user access. Each day it processes over 7 million files, amounting to nearly 10 TB of data, and inserts more than 200 million records. It provides basic data support for meteorological forecasting and early warning, disaster prevention and reduction, and major events such as the Beijing 2022 Olympic Winter Games, the Chengdu Universiade in 2023, and the Hangzhou Asian Games in 2023.
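
The separation of decoding algorithms from storage logic, assembled by configuration, can be illustrated with a minimal sketch. This is not the Tianqing implementation: every name here (`DECODERS`, `SINKS`, `PIPELINES`, `process`, the `csv_station` decoder, and the toy record layout) is a hypothetical chosen for illustration, and an in-memory list stands in for the real storage back end.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Decoder = Callable[[str], List[Record]]
Sink = Callable[[List[Record]], int]

# Hypothetical registries: decoders and storage sinks are registered
# independently, so either side can be added without touching the other.
DECODERS: Dict[str, Decoder] = {}
SINKS: Dict[str, Sink] = {}


def register(registry: dict, name: str):
    """Decorator that stores a function in the given registry under `name`."""
    def wrap(fn):
        registry[name] = fn
        return fn
    return wrap


@register(DECODERS, "csv_station")
def decode_csv_station(text: str) -> List[Record]:
    # Toy format (an assumption): one "station_id, temperature" pair per line.
    rows = []
    for line in text.strip().splitlines():
        station, temp = line.split(",")
        rows.append({"station": station.strip(), "temp_c": temp.strip()})
    return rows


STORE: List[Record] = []  # stand-in for a database table


@register(SINKS, "memory")
def store_in_memory(records: List[Record]) -> int:
    STORE.extend(records)
    return len(records)


# Configuration decides which decoder feeds which sink for each data type.
PIPELINES = {"surface_hourly": {"decoder": "csv_station", "sink": "memory"}}


def process(data_type: str, payload: str) -> int:
    """Decode a payload and store it, returning the number of records stored."""
    cfg = PIPELINES[data_type]
    records = DECODERS[cfg["decoder"]](payload)
    return SINKS[cfg["sink"]](records)
```

In this sketch, supporting a new data type means registering one decoder and adding one entry to `PIPELINES`; the storage code is untouched, which is the property the modular, hot-swappable design aims for.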