Yang Runzhi, Shen Wenhai, Xiao Weiqing, et al. A set of MapReduce tuning experiments based on meteorological operations. J Appl Meteor Sci, 2014, 25(5): 618-628.
Cloud computing addresses the limited computing power of a standalone server by using distributed computing to deliver parallel computing capacity and higher computational efficiency. It is a new application model for decentralized computing that can provide reliable, customized services to the largest number of users with the fewest resources, and it offers an important way to combine theoretical research on cloud computing with practical applications and with other theories and techniques. Cloud computing is now widely applied in many industries and fields, and its flexibility, ease of use and stability are gradually being recognized. In the meteorological department, development of scientific computing on cloud-based platforms is still very limited, although some attempts have been made as cloud computing matures. In meteorological operations, large-scale scientific computing and other general computing models run on high-performance server clusters. Because resources and the number of HPC nodes are limited, scientific computing still relies on the traditional standalone or clustered mode. An internal exploration of running conventional general-purpose computing on a cloud computing platform is therefore meaningful for the meteorological department. The National Meteorological Information Center stores a valuable 60-year sequence of historical data for use in real-time and near-real-time operations and in research. Processing these historical data is time-consuming, so new methods are tried. A cluster is built on the Hadoop cloud computing platform, and a variety of statistical methods are implemented using the MapReduce computation model. The storage format of the source data is changed to SequenceFile, which consists of serialized <Key, Value> pairs: multiple Format-A files are merged into one large SequenceFile, so that many small files become a single larger file, and the resulting change in computational efficiency is tested. The Hadoop cluster configuration is also modified experimentally, and different numbers of task nodes are used to record the corresponding computational efficiency.
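The idea of merging many small files into one key/value container and then running a MapReduce-style statistic over it can be illustrated with a minimal Python sketch. This is not the authors' implementation and it does not use Hadoop or the real SequenceFile on-disk format: the length-prefixed record layout, the file names, the `station,reading` line format and all function names below are assumptions made purely for illustration.

```python
import io
import struct
from collections import defaultdict


def pack_records(records, stream):
    """Write (key, value) byte pairs as length-prefixed records.

    Simplified stand-in for a key/value container; NOT the Hadoop
    SequenceFile format.
    """
    for key, value in records:
        stream.write(struct.pack(">II", len(key), len(value)))
        stream.write(key)
        stream.write(value)


def read_records(stream):
    """Yield (key, value) byte pairs back from a packed stream."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return  # end of container
        key_len, value_len = struct.unpack(">II", header)
        yield stream.read(key_len), stream.read(value_len)


def map_phase(key, value):
    """Emit (station, reading) pairs from one small file's contents."""
    for line in value.decode("utf-8").splitlines():
        station, reading = line.split(",")
        yield station, float(reading)


def reduce_phase(station, readings):
    """Average all readings collected for one station."""
    return station, sum(readings) / len(readings)


def run_job(stream):
    """Group mapper output by key, then reduce each group."""
    grouped = defaultdict(list)
    for key, value in read_records(stream):
        for station, reading in map_phase(key, value):
            grouped[station].append(reading)
    return dict(reduce_phase(s, r) for s, r in grouped.items())


# Hypothetical small files: file name as key, raw contents as value.
small_files = [
    (b"day1.txt", b"S1,10.0\nS2,20.0"),
    (b"day2.txt", b"S1,14.0\nS2,22.0"),
]

container = io.BytesIO()
pack_records(small_files, container)  # many small files -> one container
container.seek(0)
station_means = run_job(container)    # per-station mean over all files
```

In this sketch the job reads a single stream instead of opening each small file separately, which is the kind of per-file overhead that the merging step described above is meant to reduce. The actual efficiency gain on a Hadoop cluster depends on the configuration and the number of task nodes.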