Adaptive Optimization of Small-File Transmission for Massive Meteorological Data

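  • Abstract: To meet the timeliness requirements for transmitting massive numbers of small meteorological data files in real-time meteorological data transmission, the data transfer service is optimized and an adaptive transmission optimization method based on real-time network conditions is proposed. The method combines an optimized network transfer protocol with file compression. It collects real-time parameters from the transmission link and adjusts compression and network transfer parameters on the fly to optimize transfer performance. For adaptive compression, experimental analysis established that a small meteorological data file is one smaller than 50 KB. An algorithm was then designed that adjusts the compression level according to real-time network conditions. For adaptive tuning of transfer parameters, the importance of the TCP buffer size and the number of concurrent TCP connections in the GridFTP protocol was studied. Separate algorithms were designed to adjust each of them according to real-time network conditions, and these algorithms improve transfer performance by 65%. Experimental validation of the proposed adaptive parameter-adjustment algorithms shows that the adaptive tuning method combining compression and network transmission improves the transfer performance for small meteorological data files by nearly 500 times.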
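The 50 KB small-file threshold can be illustrated with a minimal Python sketch. The sketch splits files into small and large sets and then packs the small ones into archives ahead of compression. It makes several assumptions for illustration only:

* It reads 50 KB as 50 × 1024 bytes.
* It uses tar as the container.
* It treats `files_per_package` as a hypothetical knob.

The paper derives the package size from measured transfer times and compresses with lzop. Neither step is reproduced here.

```python
import io
import tarfile

# Threshold from the abstract: files smaller than 50 KB count as "small".
# Reading "KB" as 1024 bytes is an assumption.
SMALL_FILE_THRESHOLD = 50 * 1024  # bytes


def partition_by_size(files, threshold=SMALL_FILE_THRESHOLD):
    """Split (name, payload) pairs into small (< threshold) and large files."""
    small, large = [], []
    for name, data in files:
        (small if len(data) < threshold else large).append((name, data))
    return small, large


def bundle(small_files, files_per_package):
    """Pack small files into uncompressed tar archives holding at most
    files_per_package entries each (files_per_package is a hypothetical knob)."""
    packages = []
    for start in range(0, len(small_files), files_per_package):
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w") as tar:
            for name, data in small_files[start:start + files_per_package]:
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
        packages.append(buf.getvalue())  # each archive would then be compressed
    return packages


if __name__ == "__main__":
    files = [(f"obs_{i}.txt", b"x" * 1000) for i in range(25)]
    files.append(("radar.bin", b"y" * 100_000))
    small, large = partition_by_size(files)
    packages = bundle(small, files_per_package=10)
    print(len(small), len(large), len(packages))  # 25 1 3
```

Bundling before compression means one archive is opened and transferred instead of many tiny files. This is the I/O reduction the abstract describes.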

     

    Abstract: The data transfer and service architecture built by the National Meteorological Information Center is the foundation for most meteorological data transmission. Improving the timeliness of transmission for the various kinds of data is a key issue in enhancing meteorological service capabilities. To meet the performance requirements of transmitting massive numbers of small files, the transmission parameters are optimized. A self-adapting data transmission method based on real-time network status is proposed, with emphasis on the network transmission protocol and file compression. Both compression parameters and network transmission parameters are adjusted in real time. Meteorological data include a large number of heterogeneous small files. Packing small files into one large compressed file before transfer therefore effectively reduces I/O accesses. First, 50 KB is established experimentally as the threshold for small meteorological data files. Then, by analyzing file transfer times, the number of files per compressed package that gives the best transmission efficiency is calculated. Finally, because network conditions vary over time, a self-adapting compression method is designed that adjusts the compression level to real-time network conditions. The entire compression process is controlled by setting the parameters of the lzop command, which is built on the LZO algorithm and library. To adjust the compression level to real-time network conditions, the RTT (round-trip time) is used to judge the current degree of network congestion. Comparing the current RTT with the previous RTT determines whether the compression level is changed. For network transmission optimization, experiments on the Globus platform show that TCP buffers and parallel transmission both consume memory resources.
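The RTT comparison that drives the compression level can be sketched as a small controller. This is a hypothetical rule, not the paper's published algorithm. It assumes that a rising RTT signals congestion, so the level is raised to send fewer bytes. It also assumes that a falling RTT leaves room for a faster, lower level.

The following values are illustrative assumptions:

* the 10% tolerance band;
* the one-step change per sample;
* the starting level 3, which is lzop's default.

The helper only builds an lzop argument list (lzop accepts levels `-1` to `-9`) and does not run it.

```python
from dataclasses import dataclass
from typing import Optional

# lzop compression levels run from -1 (fastest) to -9 (best ratio).
MIN_LEVEL, MAX_LEVEL = 1, 9


@dataclass
class RttCompressionController:
    """Hypothetical rule: compare each RTT sample with the previous one and
    step the compression level up on a rise, down on a fall."""

    level: int = 3                  # lzop's default level
    tolerance: float = 0.10         # assumed relative RTT change ignored as noise
    prev_rtt: Optional[float] = None

    def update(self, rtt_ms: float) -> int:
        if self.prev_rtt:  # skip the comparison on the first sample (or a zero RTT)
            change = (rtt_ms - self.prev_rtt) / self.prev_rtt
            if change > self.tolerance:
                # RTT rose: treat as congestion, spend CPU to send fewer bytes.
                self.level = min(self.level + 1, MAX_LEVEL)
            elif change < -self.tolerance:
                # RTT fell: treat as spare capacity, compress faster.
                self.level = max(self.level - 1, MIN_LEVEL)
        self.prev_rtt = rtt_ms
        return self.level

    def lzop_args(self, path: str) -> list:
        """Build (but do not run) an lzop command line for the current level."""
        return ["lzop", f"-{self.level}", path]


if __name__ == "__main__":
    ctl = RttCompressionController()
    print([ctl.update(rtt) for rtt in (100, 130, 131, 80, 60)])  # [3, 4, 4, 3, 2]
    print(ctl.lzop_args("obs.txt"))  # ['lzop', '-2', 'obs.txt']
```

The tolerance band keeps small RTT jitter from flipping the level on every sample.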
At the same time, more parallel streams and larger TCP buffers can cause network congestion. Self-adapting algorithms are therefore designed that adjust the TCP buffer size and the number of concurrent TCP connections based on real-time network parameters. Finally, the overall transmission framework for massive small files is designed by combining the self-adapting compression method with transmission parameter optimization. Experiments with the integrated self-adapting algorithms show that the proposed optimization methods substantially improve transmission performance, by nearly 500 times for small meteorological data files.
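The abstract does not give the formulas for the TCP buffer and concurrency algorithms. The sketch below uses a common approach as an assumed stand-in. It sizes the buffer to the bandwidth-delay product (bandwidth × RTT), capped by a memory budget. It then adjusts the number of parallel streams by hill climbing on measured throughput: it adds a stream while throughput improves and drops one when throughput degrades. The buffer bounds, the 16-stream cap, and the 5% significance threshold are hypothetical.

```python
def bdp_buffer_bytes(bandwidth_mbps, rtt_ms,
                     min_bytes=64 * 1024, max_bytes=16 * 1024 * 1024):
    """Size a TCP buffer to the bandwidth-delay product, clamped to
    hypothetical lower and upper bounds (the upper bound limits memory use)."""
    bdp = bandwidth_mbps * 1e6 / 8 * rtt_ms / 1e3  # bytes in flight
    return int(min(max(bdp, min_bytes), max_bytes))


class StreamController:
    """Hypothetical hill-climbing rule for the number of parallel TCP streams."""

    def __init__(self, streams=1, max_streams=16, gain=0.05):
        self.streams = streams
        self.max_streams = max_streams
        self.gain = gain    # assumed relative throughput change treated as significant
        self.last = None    # throughput measured in the previous interval

    def update(self, throughput_mbps):
        if self.last is None or throughput_mbps > self.last * (1 + self.gain):
            # First sample, or throughput improved: probe with one more stream.
            self.streams = min(self.streams + 1, self.max_streams)
        elif throughput_mbps < self.last * (1 - self.gain):
            # Throughput dropped: assume congestion or memory pressure, back off.
            self.streams = max(self.streams - 1, 1)
        self.last = throughput_mbps
        return self.streams


if __name__ == "__main__":
    print(bdp_buffer_bytes(100, 40))  # 500000
    ctl = StreamController()
    print([ctl.update(t) for t in (10, 18, 24, 24.5, 20)])  # [2, 3, 4, 4, 3]
```

Capping both the buffer and the stream count reflects the abstract's observation that larger buffers and more streams consume memory and can congest the link.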

     
