Abstract:
Meteorological gridded data is typically stored as files in distributed file repositories such as network-attached storage (NAS). During operations, business systems often need to download files locally, parse them, and then perform analyses and calculations. This traditional approach makes data retrieval difficult, prolongs response times, and cannot meet the demands of real-time computation and interactive applications. To address these issues, the National Meteorological Information Center has developed PostGrid, an integrated database for meteorological gridded data and computing that is built on the Tianqing Spatial Analysis Library and designed specifically for distributed environments. The PostGrid database consists of two primary layers: a data layer and an operator layer. The data layer stores the various types of gridded meteorological data. When data is imported into the database, it is stored in a standardized, uniform manner. Each dataset comprises two components, a header file and entity data, both stored in binary format. The header file contains basic descriptive information about the meteorological gridded data, while the entity data stores the specific layers or fields obtained by partitioning the original gridded dataset. By organizing data along dimensions such as weather element, forecast start time, spatial layer, level, and sample, the data layer supports efficient retrieval and analysis of meteorological gridded data. This structured approach significantly improves the database’s capacity to read and process data, making it far more efficient than traditional file-based methods. The operator layer in PostGrid is implemented as SQL functions within the database. These operators support a range of operations on gridded data, including matrix calculations, spatial analysis, statistical aggregation, dimensionality reduction, and data filtering.
Furthermore, the operators are designed for distributed parallel computing, enabling faster and more efficient processing of large datasets. By leveraging parallel computing, PostGrid reduces complex calculations that would typically take minutes to just milliseconds, which significantly enhances the performance and flexibility of meteorological data services. Performance tests and real-world applications have demonstrated this gain for traditional aggregation calculations, whose processing time falls from minutes to milliseconds. By integrating data and computation within a unified platform, the database marks a significant advance in the management of large-scale meteorological data. It enables faster data retrieval, real-time computation, and more advanced interactive applications, making it a valuable tool for meteorological services with the potential for widespread application across meteorology.
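The storage model described above (a binary header plus binary entity data for each partitioned field, indexed by dimensions such as element, forecast start time, level, and sample) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the header layout, the key tuple, and the names `pack_field`, `unpack_field`, and `ensemble_mean` are hypothetical and are not PostGrid's actual format or operators, which are implemented as SQL functions inside the database.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical header layout (an assumption, not PostGrid's real format):
# rows and cols as unsigned 32-bit ints, then origin longitude, origin
# latitude, and cell size as 64-bit floats, all little-endian.
HEADER_FMT = "<II3d"
HEADER_SIZE = struct.calcsize(HEADER_FMT)

# Hypothetical dimension key: (element, forecast start time, level, sample).
Key = Tuple[str, str, int, int]


def pack_field(rows: int, cols: int, lon0: float, lat0: float,
               cell: float, values: List[float]) -> bytes:
    """Serialize one 2D field as a binary header followed by binary data."""
    if len(values) != rows * cols:
        raise ValueError("values must contain rows * cols entries")
    header = struct.pack(HEADER_FMT, rows, cols, lon0, lat0, cell)
    body = struct.pack(f"<{rows * cols}f", *values)  # 32-bit floats
    return header + body


def unpack_field(blob: bytes) -> Tuple[dict, List[float]]:
    """Read the header first, then decode exactly rows * cols grid values."""
    rows, cols, lon0, lat0, cell = struct.unpack_from(HEADER_FMT, blob, 0)
    values = list(struct.unpack_from(f"<{rows * cols}f", blob, HEADER_SIZE))
    header = {"rows": rows, "cols": cols, "lon0": lon0,
              "lat0": lat0, "cell": cell}
    return header, values


def ensemble_mean(store: Dict[Key, bytes], element: str,
                  start: str, level: int) -> List[float]:
    """Aggregate over the sample dimension: cell-by-cell mean of all
    fields that share the same element, start time, and level."""
    fields = [unpack_field(blob)[1]
              for (e, s, lv, _sample), blob in store.items()
              if (e, s, lv) == (element, start, level)]
    if not fields:
        raise KeyError("no fields match the requested dimensions")
    return [sum(cells) / len(fields) for cells in zip(*fields)]
```

Because each field is stored under its own dimension key, an aggregation such as the ensemble mean only touches the fields it needs, rather than downloading and parsing a whole file first. In this sketch the store is an in-memory dictionary; a real system would hold the same records in database tables and could run the aggregation in parallel across nodes.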