Optimization of the Data Cache Function in the Beijing Global Information System Center

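  • Summary: The WMO Information System (WIS) is a common information service platform that supports the global exchange and sharing of meteorological data. As one of the core functional centers of WIS, the Beijing Global Information System Center must cache the WMO global exchange data of the most recent 24 h in order to provide efficient data access services. To check the validity of the collected global exchange data, each data item must be checked for a matching metadata record; these metadata are stored in a relational database at the Beijing Global Information System Center. Because the number of WMO global exchange data files received each day is large and their collection times are unevenly distributed, the many frequent database queries degrade processing performance, and considerable delays tend to arise when data are collected intensively, which directly affects the real-time performance of operations. An application based on in-memory object caching is designed and implemented to optimize the existing database-query validation: the metadata are loaded into memory once, and the validation of cached data is carried out in memory, which reduces disk reads and writes and improves processing efficiency. In addition, the functions related to the cached data are implemented with multithreading, giving the scheme good extensibility. Practical application shows that the optimized data cache function meets the performance requirements of real-time operations.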

     

    Abstract: The WMO Information System (WIS) is a coordinated, distributed, global infrastructure for the collection and sharing of information for all WMO and related international programs. As the core centers of WIS, Global Information System Centers (GISCs) are responsible for collecting and distributing global exchange data and for providing data discovery and access services. To fulfil the WIS/GISC functions in Beijing, a scalable and flexible system has been designed and established. The Beijing Global Information System Center (GISC Beijing) must hold at least 24 hours of WMO global exchange data files, which authorized users can access through DAR (data discovery, access and retrieval) services. GISC Beijing has to validate every global exchange data file, and only files that match a corresponding metadata record are admitted to the data cache. The existing validation approach is based on database retrieval. Currently, more than 100,000 metadata records are stored in the relational database of GISC Beijing, while the system receives more than 50,000 global exchange data files each day, unevenly distributed over collection time. The disadvantage of this approach is that frequent database I/O leads to a sharp decline in system performance, especially when a large volume of data arrives at once, so it cannot satisfy the requirements of a real-time data cache service. Although table indexes and a multi-threaded mechanism improve processing efficiency to some extent, frequent database I/O inevitably remains a performance bottleneck, and the validation operation therefore needs to be optimized. One possible way is to make full use of memory to reduce the disk I/O involved in caching data and so improve efficiency significantly.
Given the complexity of mature in-memory databases, a more targeted approach is adopted, one suited to dynamic data with timeliness requirements. An application is designed and implemented based on in-memory object caching. On initialization, the application loads the metadata from the database into memory as a hash table of key/value pairs, keyed by the unique bulletin heading of the global exchange data. In this way, the metadata contents are encapsulated as memory objects, which provides fast in-memory retrieval. In addition, parallel processing is used to extend the functionality, including data cache logging and data subscription services. As a result, the optimized function satisfies the real-time operational requirements, reducing the data processing time to an average of less than 5 ms. It also makes extension easier, since new memory objects can be added and handled through parallel processing.
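
The load-once, validate-in-memory idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application deployed at GISC Beijing: the table name `metadata`, its columns, the file-naming pattern, the helper names `load_metadata`, `heading_of` and `validate`, and the sample identifiers are all hypothetical, and an in-memory SQLite database stands in for the relational database.

```python
import sqlite3


def load_metadata(conn):
    """Read every metadata row once and return a dict keyed by bulletin heading.

    Assumes a hypothetical table metadata(heading, metadata_id).
    """
    rows = conn.execute("SELECT heading, metadata_id FROM metadata")
    return {heading: metadata_id for heading, metadata_id in rows}


def heading_of(filename):
    """Extract the bulletin heading from a file name.

    Assumes a hypothetical naming pattern 'TTAAii_CCCC_<time>.<ext>',
    e.g. 'SMCI01_BABJ_120000.txt' gives 'SMCI01 BABJ'.
    """
    parts = filename.split("_")
    return " ".join(parts[:2])


def validate(filename, cache):
    """Return the matching metadata id, or None if the file has no metadata.

    This is a dictionary lookup in memory; no database query is issued.
    """
    return cache.get(heading_of(filename))


# Stand-in for the relational database (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (heading TEXT PRIMARY KEY, metadata_id TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    [("SMCI01 BABJ", "md-0001"), ("SNCI40 BABJ", "md-0002")],
)

cache = load_metadata(conn)  # done once, at application start-up

accepted = validate("SMCI01_BABJ_120000.txt", cache)  # has metadata
rejected = validate("XXXX99_ZZZZ_120000.txt", cache)  # no metadata
```

Each incoming file then costs one hash lookup instead of one database query, which is the source of the reduction in disk I/O. A real deployment would also need a way to refresh the dictionary when metadata records change, which this sketch leaves out.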

     
