Design and Implementation of Data Processing Center for CMA Big Data and Cloud Platform

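  • Summary: To address the problem of rapidly ingesting massive multi-source heterogeneous data into the CMA Big Data and Cloud Platform, a data processing system that seamlessly connects communication and storage was designed and developed. The system provides multiple extensible processing workflows that seamlessly integrate business logic such as decoding algorithms, data access, and database insertion. Decoding algorithms were developed for more than 200 types of domestic and international data, including automatic weather stations, upper-air L-band sounding, raindrop spectra, Beidou sounding, ground-based vertical observation products, and global satellite atmospheric motion vectors. Stream processing improved the service timeliness of hourly data from about 7×10⁴ regional weather stations from 5 min under the China Integrated Meteorological Information Sharing System (CIMISS) to 2 min. A configuration-driven general processing framework was developed, enabling configuration-based access for nearly a thousand types of unstructured data such as radar and satellite data. Multi-way parallel processing of the same data meets users' needs for heterogeneous access to data. The data processing system went into operation at the national and provincial levels in December 2021 together with the CMA Big Data and Cloud Platform. It processes and stores in real time more than 1300 types of domestic and international data, including surface, upper-air, ocean, radar, satellite, and numerical model data. Each day it processes more than 7 million files (nearly 10 TB of data), inserts more than 200 million records, and sends more than 100 million data messages, providing solid data support for operations such as weather forecasting and early warning and disaster prevention and mitigation, and for major events such as the 2022 Beijing Winter Olympics, the 2023 Chengdu Universiade, and the 2023 Hangzhou Asian Games.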

     

    Abstract: To address the problem of ingesting massive multi-source heterogeneous data into the CMA Big Data and Cloud Platform (Tianqing), the Data Processing Center was designed and developed to connect front-end transmission with back-end storage. Eight scalable processing frameworks were developed, which seamlessly integrate decoding algorithms, data access, database insertion, and other business logic using technologies such as Storm, RabbitMQ, and shared file systems. These frameworks solve common technical problems such as data acquisition, task scheduling, load balancing, and resumption from breakpoints. More than 200 decoding algorithms were developed for domestic and international data, achieving real-time decoding and storage of data from domestic automatic weather stations, upper-air L-band sounding, wind profiler radars, raindrop spectrometers, Beidou sounding systems, and ground-based vertical observation systems, as well as global satellite atmospheric motion vector data. By separating decoding algorithms from storage logic, the system achieves modular assembly of decoding algorithms and storage interfaces, as well as hot-swappable development for new data types. By applying Storm stream processing, the service timeliness of hourly data from more than 70000 regional stations has improved from 5 min under the China Integrated Meteorological Information Sharing System (CIMISS) to 2 min under Tianqing. The system employs a configuration-driven general processing framework, so that unstructured data can be brought online quickly through flexible configuration. The speed of bringing unstructured data online has increased by more than 10 times, and more than 900 types of unstructured data have been successfully configured and accessed. The system adopts multiplexing and other technologies to process the same data in parallel along multiple paths, meeting users' heterogeneous access needs for the same data and reducing the data distribution pressure on the CMA domestic telecommunication system.
For the development of structured data with provincial characteristics, the data processing system provides corresponding software development toolkits, which enable provincial technicians to implement business logic quickly and improve development efficiency. The system entered operation at both the national and provincial levels in December 2021 and has been running steadily ever since. It addresses the challenge of rapidly processing more than 1300 types of domestic and international data, including surface, upper-air, ocean, radar, satellite, and numerical model data, thereby facilitating user access. Each day it processes more than 7 million files (nearly 10 TB of data) and inserts more than 200 million records. It provides basic data support for meteorological forecasting and early warning, disaster prevention and mitigation, and major events such as the Beijing Olympic Winter Games in 2022, the Chengdu Universiade in 2023, and the Hangzhou Asian Games in 2023.
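
The separation of decoding algorithms from storage logic, together with the parallel delivery of the same data to several destinations, can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the registries `DECODERS` and `SINKS`, the data-type code `SURF_HOURLY`, the `Route` configuration, and the toy "station,temperature" message format are all hypothetical names and formats chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]
Decoder = Callable[[bytes], List[Record]]
Sink = Callable[[List[Record]], None]

# Hypothetical registries: decoders and storage interfaces are looked up by
# name, so either side can be added or replaced without touching the other.
DECODERS: Dict[str, Decoder] = {}
SINKS: Dict[str, Sink] = {}


def register_decoder(data_type: str) -> Callable[[Decoder], Decoder]:
    """Register a decoding function under a data-type code."""
    def wrap(fn: Decoder) -> Decoder:
        DECODERS[data_type] = fn
        return fn
    return wrap


@register_decoder("SURF_HOURLY")  # assumed data-type code
def decode_surface(payload: bytes) -> List[Record]:
    """Decode a toy message with one 'station_id,temperature' pair per line."""
    records: List[Record] = []
    for line in payload.decode("utf-8").splitlines():
        if not line.strip():
            continue
        station, temp = line.split(",")
        records.append({"station": station.strip(), "temp_c": float(temp)})
    return records


@dataclass
class Route:
    """Configuration entry: which decoder to run and which sinks receive the result."""
    data_type: str
    sinks: List[str]


def process(payload: bytes, route: Route) -> int:
    """Decode once, then hand the same records to every configured sink."""
    records = DECODERS[route.data_type](payload)
    for name in route.sinks:
        SINKS[name](records)
    return len(records)


# In-memory stand-ins for a database table and a notification queue.
database: List[Record] = []
messages: List[str] = []
SINKS["database"] = database.extend
SINKS["notify"] = lambda recs: messages.extend(str(r["station"]) for r in recs)

ROUTE = Route(data_type="SURF_HOURLY", sinks=["database", "notify"])
count = process(b"54511,21.5\n58367,24.0\n", ROUTE)
```

In this sketch, supporting a new data type means registering one more decoder and adding a `Route` entry, while the storage side stays unchanged. This mirrors, in miniature, the modular assembly of decoding algorithms and storage interfaces that the abstract describes.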

     
