Design and Practice of National Meteorological HPC Management and Application Network Platform
Abstract: This paper first reviews the achievements of the National Meteorological Information Center (NMIC) of CMA in building high performance computing (HPC) capacity, along with some remaining problems, then outlines the main ideas and technical characteristics of computational grids, and finally describes the architecture of the national meteorological HPC management and application network platform together with the main technical approaches adopted. NMIC operates the fastest HPC system in China; its total computing capacity also ranks first in China and holds a leading position among meteorological information centers worldwide. While computing capacity has grown rapidly, the "soft ability", characterized by system and resource management, user support, and quality of service, has lagged behind. Effort must therefore go into strengthening this area so that NMIC's HPC resources are used to full effect. Since its inception, grid technology has grown rapidly and become an influential direction in information technology. A grid brings distributed, heterogeneous computer systems together so that they work cooperatively as a whole and deliver nontrivial quality of service, which enables HPC resources to be managed and shared. Based on the computational grid concept, a national meteorological HPC management and application network platform is proposed. The platform adopts a centralized grid architecture and consists of four levels: user interface, grid management, HPC local management, and HPC resources. It provides solutions for four key aspects: globally consistent, centralized user management; resource management based on "resource accounts"; a meta-scheduler; and comprehensive operation monitoring. Building on existing work and introducing mature grid software and open source software, a preliminary implementation of the platform has been completed. Future research and development will continue on job scheduling policy, quality of service management, and data grid, in order to complete and refine the platform and ultimately put it into operational use.
Key words:
- grid;
- meta-scheduling;
- resource credits accounting;
- operation monitor
Table 1. Main computer systems of the National Meteorological Information Center