Design and Implementation of the Resource Monitor Module in the Meteorological Computational Grid Platform
-
Abstract: The computing resources aggregated by the meteorological computational Grid consist of high performance computers and storage resources. They are installed at different sites and differ in system architecture, running condition, and workload. To monitor the status of these resources and to give users and administrators reference information, a resource monitor module is designed and implemented as part of the meteorological computational Grid platform software system.
The resource monitor module involves three layers: remote meteorological Grid nodes, resource state information acquisition, and web representation.
Resource state describes the system information of the high performance computers in a meteorological Grid node and comprises three major parts: overall information, node information, and job information. The resource state information acquisition layer is made up of a poller, a collector, a feeder, and related configuration files. Correspondingly, the acquisition process is divided into three steps: polling, collecting, and feeding. The web representation layer sits on top and presents resource state information to users in commonly used web browsers through a geographic resource view and a resource list.
The resource monitor module is developed on the Grid management software UNICORE and its client software ARCON Client, and is implemented with Java and XML technology. The ARCON Client Toolkit is used to implement the node access function, on which resource information polling and collecting are built.
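The three-part resource state described above (overall, node, and job information) could be modeled roughly as follows. This is a minimal sketch, not the module's actual data model: the class names and every field (`totalCpus`, `busyCpus`, `load`, `status`, and so on) are hypothetical, because the abstract does not list the attributes that are collected.

```java
import java.util.List;

/** Illustrative model of the three-part resource state; all names and fields are assumptions. */
class ResourceStateSketch {

    /** Overall information about one high performance computer. */
    static final class OverallInfo {
        final String computerName;
        final int totalCpus;
        final int busyCpus;
        OverallInfo(String computerName, int totalCpus, int busyCpus) {
            this.computerName = computerName;
            this.totalCpus = totalCpus;
            this.busyCpus = busyCpus;
        }
    }

    /** State of one compute node inside the computer. */
    static final class NodeInfo {
        final String nodeName;
        final double load;
        NodeInfo(String nodeName, double load) { this.nodeName = nodeName; this.load = load; }
    }

    /** State of one batch job on the computer. */
    static final class JobInfo {
        final String jobId;
        final String status; // e.g. "RUNNING" or "QUEUED" (hypothetical values)
        JobInfo(String jobId, String status) { this.jobId = jobId; this.status = status; }
    }

    /** One snapshot of a computer: overall, node, and job information together. */
    static final class ResourceState {
        final OverallInfo overall;
        final List<NodeInfo> nodes;
        final List<JobInfo> jobs;
        ResourceState(OverallInfo overall, List<NodeInfo> nodes, List<JobInfo> jobs) {
            this.overall = overall;
            this.nodes = nodes;
            this.jobs = jobs;
        }

        /** Fraction of CPUs in use, or 0 when the total is unknown. */
        double cpuUtilization() {
            return overall.totalCpus == 0 ? 0.0 : (double) overall.busyCpus / overall.totalCpus;
        }

        /** Number of jobs whose status is "RUNNING". */
        long runningJobs() {
            return jobs.stream().filter(j -> "RUNNING".equals(j.status)).count();
        }
    }

    public static void main(String[] args) {
        ResourceState state = new ResourceState(
                new OverallInfo("hpc-a", 64, 16),
                List.of(new NodeInfo("n01", 0.5), new NodeInfo("n02", 0.0)),
                List.of(new JobInfo("j1", "RUNNING"), new JobInfo("j2", "QUEUED")));
        System.out.println(state.overall.computerName + " utilization=" + state.cpuUtilization()
                + " running=" + state.runningJobs());
    }
}
```

Keeping the three parts as separate immutable types mirrors the split named in the abstract and lets a display layer consume a summary (such as utilization) without walking the node and job lists itself.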
The poller automatically and periodically submits status query jobs to the computers in the Grid nodes and records each submitted job in the log queue. The collector reads the queue and retrieves the query results. The feeder parses the results and writes a specially formatted XML file. Querying and retrieving are implemented asynchronously so that the program does not wait on a query; as a result, the monitor program runs stably and robustly. The major packages of the resource state information acquisition layer are base driver, job scheduling, log queue query, resource state parsing, and configuration setting. The web representation layer reads the XML file containing the resource state query results and displays resource state using Flex and J2EE technologies.
At present, 10 high performance computers have been brought under centralized monitoring at the national Grid node in the National Meteorological Information Center, the four regional center Grid nodes in Beijing, Chengdu, Guangzhou, and Shenyang, and the Anhui provincial Grid node. The resource monitor module is one of the key parts of the meteorological Grid platform software system and provides real-time services.
In the future, as construction of the meteorological computational Grid continues, the resource monitor module will be applied more widely and will bring the major computing resources of the meteorological department under supervision.
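The poll, collect, and feed steps described in the abstract can be sketched as three small functions that communicate through a queue. This is an illustration under stated assumptions, not the module's code: `GridNode`, `FakeNode`, the `host;key=value` record format, and the XML element names are all hypothetical, and the in-memory `FakeNode` stands in for the ARCON Client calls that the real module makes against UNICORE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch of the poll / collect / feed pipeline; all names and formats are hypothetical. */
class MonitorPipelineSketch {

    /** Stand-in for remote access to a Grid node (assumed interface, not the ARCON API). */
    interface GridNode {
        String submitQuery(String host);   // returns a job id without waiting for the result
        String fetchResult(String jobId);  // returns the raw output of a finished query
    }

    /** In-memory fake so the sketch runs without a Grid. */
    static final class FakeNode implements GridNode {
        private final Map<String, String> results = new ConcurrentHashMap<>();
        private int counter = 0;

        public synchronized String submitQuery(String host) {
            String jobId = "job-" + (++counter);
            results.put(jobId, host + ";cpus=64;runningJobs=" + counter);
            return jobId;
        }

        public String fetchResult(String jobId) {
            return results.get(jobId);
        }
    }

    /** Poller: submit one query per host and record each job id in the log queue. */
    static void poll(GridNode node, List<String> hosts, BlockingQueue<String> logQueue) {
        for (String host : hosts) {
            logQueue.add(node.submitQuery(host));
        }
    }

    /** Collector: drain the queue without blocking and fetch each job's result. */
    static List<String> collect(GridNode node, BlockingQueue<String> logQueue) {
        List<String> raw = new ArrayList<>();
        String jobId;
        while ((jobId = logQueue.poll()) != null) {
            raw.add(node.fetchResult(jobId));
        }
        return raw;
    }

    /** Feeder: turn "host;key=value;..." records into XML (no escaping; plain tokens assumed). */
    static String feed(List<String> raw) {
        StringBuilder xml = new StringBuilder("<resources>");
        for (String record : raw) {
            String[] parts = record.split(";");
            xml.append("<computer name=\"").append(parts[0]).append("\">");
            for (int i = 1; i < parts.length; i++) {
                String[] kv = parts[i].split("=", 2);
                xml.append('<').append(kv[0]).append('>').append(kv[1])
                   .append("</").append(kv[0]).append('>');
            }
            xml.append("</computer>");
        }
        return xml.append("</resources>").toString();
    }

    public static void main(String[] args) {
        GridNode node = new FakeNode();
        BlockingQueue<String> logQueue = new LinkedBlockingQueue<>();
        poll(node, List.of("hpc-a", "hpc-b"), logQueue); // a real poller would run on a timer
        System.out.println(feed(collect(node, logQueue)));
    }
}
```

Because the poller only enqueues job ids and the collector only drains whatever is already queued, neither step waits on the other, which is one way to obtain the decoupling between querying and retrieving that the abstract credits for stable operation.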
-
Table 1 Major packages of the resource state information acquisition layer