The Evolution of the Lustre Parallel File System and Its Application Prospects in Meteorology
-
Summary: Lustre is a parallel file system widely used in Linux cluster environments. This paper introduces the technologies behind Lustre and, taking into account the characteristics of meteorological applications and the current state and development of the National Meteorological Information Center, designs a Lustre-based integration model for a global file system spanning multiple clusters. Analysis shows that this model can provide efficient, flexible, and unified online storage resources for meteorological applications, while adequately meeting the overall computing environment's need for "business continuity".

Abstract: Running modern Numerical Weather Prediction (NWP) models requires very high performance computing (HPC) systems. A growing number of meteorological applications, NWP models in particular, already run or will run on large-scale cluster systems. As HPC development has matured, computing power is no longer the main constraint. Data processing and data services, however, are becoming a conspicuous issue, because a more powerful HPC system both demands and generates far more data. Cluster file system technology is one of the key elements for addressing this issue in an HPC environment.

When data processing becomes the limiting factor, several problems must be solved to improve the overall utilization and operational efficiency of HPC systems for meteorological applications: how to move data into and out of an HPC system efficiently, how to read the input of a large application and write its large volume of output quickly, and how to exchange or share data effectively among multiple HPC systems. A cluster file system is the answer, and Lustre is one of the best cluster file system solutions currently on the market. Its advantages are as follows. First, Lustre is designed to be a very flexible, scalable, and stable file system. In practice, it can be configured with a wide variety of machines and network technologies, and the number of nodes can range from several to tens of thousands. Second, Lustre has been used for over six years in many important HPC environments, including the largest laboratory of the U.S. Department of Energy (DOE). It has been widely developed, tested, and then put into production for some of the most mission-critical applications.
Over the past one to two years, Lustre has gained recognition in the HPC community and has been successfully adopted by a majority of Linux-based clusters. One of the core technologies of Lustre is object storage, usually implemented as Object-Based Storage Devices (OSD), which aims to achieve both high performance and cross-platform support by offering an entirely new abstraction of storage: the storage object. Lustre implements the object storage concept by introducing the Metadata Server (MDS), which comprises both hardware and software components of a Lustre cluster.

The National Meteorological Information Center currently maintains several HPC systems for users of the China Meteorological Administration. A conceptual framework is proposed for establishing a globally unified parallel file system that spans multiple Linux clusters in this environment, thereby optimizing the operational workflow and improving the utilization of storage resources across the clusters. The framework can also greatly improve the overall business continuity of the multiple HPC clusters.
-
Table 1 Cluster file systems adopted by the top 10 supercomputers in the TOP500 list of November 2006
-