ARQCS Starting Strategy and Its Relationship with Computing Resource Cost
Abstract (translated from the Chinese): Using operational monitoring data from 1 April to 30 September 2012 for the national AWS Observation Data Real-time Quality Control System (ARQCS), which runs on an Oracle 11g database platform in an IBM P570 high performance computing environment, this paper examines the ARQCS starting strategy and its relationship with the data decoding and database entry rate, ARQCS CPU time, and service timeliness. The results show that the decoding and entry efficiency of AWS data follows a distribution shaped like the Chinese character "几" (roughly a trapezoid). The variance of the entry rate is large from the 5th to the 20th minute of each observation hour, which makes this the main period limiting ARQCS quality control timeliness. Setting an entry rate of no less than 95% as the condition for the first start not only moves the first start 20.6 s earlier than the traditional fixed start at the 15th minute, but also raises the probability that the entry rate is at least 95% at the first start from 66.38% to 95.83%. After the 20th minute the entry rate increases by only 1.36%, so a forced start point for the first quality control run is set there, which guarantees service timeliness when some data are abnormally delayed. The dynamic starting strategy reduces the number of ARQCS starts per hour from 5 to 2 and saves an average of 391 min of CPU time per day.

Abstract: The AWS Observation Data Real-time Quality Control System (ARQCS) is an operational real-time meteorological data application system running on an Oracle 11g database platform in an IBM P570 high performance computing (HPC) environment. It provides data decoding, database insertion, quality control (QC), storage management and sharing services for more than 30000 automatic weather stations (AWS) across China. When ARQCS was first built in 2009, QC methods including the boundary value check, internal consistency check, time consistency check and spatial consistency check were applied to only one element, hourly precipitation. The starting strategy was static: ARQCS started at the 15th, 25th, 35th, 45th and 55th minute of every hour. In 2010, QC methods for other important meteorological elements, including air temperature, air pressure, humidity, and wind direction and speed, were added to ARQCS. The system's computing logic also became more complex after two updates, in 2011 and 2012. ARQCS is now planned to be extended to a total of 158 elements in 11 classes, which will require more computing resources. To keep the QC capability and service timeliness of ARQCS at a high level under limited computing resources, a series of schemes are designed and investigated. System logs from the IBM P570 HPC Oracle database environment from 1 April to 30 September 2012 are used to analyze ARQCS performance.

The database entry rate (ER) of AWS data is found to follow a trapezoid-shaped distribution, and the variance of ER is large from the 5th to the 20th minute of each hour. The accumulated ER at the 15th minute is therefore unstable, and ARQCS may start with a low accumulated ER if it starts at that time. The analysis also shows that an accumulated ER of at least 95% is reached before the 20th minute in most cases (84.89% of samples), and that accumulated ER increases by only 1.36% on average after the 20th minute. A new dynamic starting strategy is therefore adopted: ARQCS starts for the first time when accumulated ER reaches 95%, or at the 20th minute if that threshold has not been reached by then, and starts for the second time at the 55th minute. With this approach, the probability that accumulated ER is at least 95% at the first QC start rises by about 29 percentage points (from 66.38% to 95.83%). On average, the first QC start occurs 20.6 seconds earlier than the 15th-minute start of the original static strategy. In addition, reducing the number of starts from 5 to 2 per hour lowers the CPU time cost from 26.5 minutes to 10.2 minutes per hour, a saving of 391 minutes of CPU time per day. It is concluded that the dynamic starting strategy lets ARQCS start adaptively and ensures system robustness.
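The savings quoted in the abstract follow from simple arithmetic on the figures it reports. As a quick check, using only values stated above (the layout as an `align*` block is just a presentation choice):

```latex
\begin{align*}
\text{CPU time saved per hour} &= 26.5 - 10.2 = 16.3\ \text{min} \\
\text{CPU time saved per day}  &= 16.3 \times 24 = 391.2 \approx 391\ \text{min} \\
\text{Gain in } P(\mathrm{ER} \ge 95\%) \text{ at first start} &= 95.83 - 66.38 = 29.45\ \text{percentage points}
\end{align*}
```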
Fig. 2 Distribution of entry rate within 1 h
(a) average per-minute entry rate of all samples, with fitted curves for the 1st to 4th, 5th to 10th and 11th to 17th minutes, (b) average accumulated entry rate of all samples and of the lowest 5% and highest 5% of samples, (c) variance of per-minute entry rate
Table 1 Proportion of samples in each bin of accumulated entry rate St at different times (unit: %)

| Time/min | [0, 50%) | [50%, 80%) | [80%, 90%) | [90%, 95%) | [95%, 100%] |
|---|---|---|---|---|---|
| 15 | 1.44 | 4.14 | 6.85 | 21.19 | 66.38 |
| 16 | 1.15 | 3.42 | 4.07 | 16.83 | 74.52 |
| 17 | 1.01 | 2.68 | 2.63 | 13.79 | 79.89 |
| 18 | 0.89 | 2.30 | 1.84 | 12.72 | 82.26 |
| 19 | 0.81 | 1.72 | 1.53 | 12.17 | 83.76 |
| 20 | 0.69 | 1.53 | 1.15 | 11.73 | 84.89 |
| 25 | 0.19 | 0.96 | 0.55 | 10.58 | 87.72 |
| 35 | 0.10 | 0.12 | 0.19 | 9.56 | 90.04 |
| 45 | 0.05 | 0.05 | 0.10 | 8.88 | 90.92 |
| 55 | 0.05 | 0.02 | 0.00 | 8.36 | 91.57 |
| 60 | 0.02 | 0.05 | 0.00 | 8.07 | 91.86 |

Table 2 ARQCS dynamic starting strategy parameters
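
| No. | Parameter | Setting |
|---|---|---|
| 1 | Number of assessed stations | 31814 stations |
| 2 | Entry rate threshold that triggers a start | 95% |
| 3 | First start time | When the accumulated entry rate is no less than the trigger threshold |
| 4 | Forced start time | 20th minute |
| 5 | Hourly automatic clearing start time | 55th minute |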
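
The parameters in Table 2 can be read as a small decision rule. The following is a minimal sketch of that rule, not the operational ARQCS code: the names `StartParams` and `plan_starts` are hypothetical, and checking once per minute is an assumption (the 20.6 s figure in the abstract suggests the real system checks more often).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class StartParams:
    """Values taken from Table 2; the field names are illustrative."""
    assessed_stations: int = 31814  # row 1: number of assessed stations
    trigger_rate: float = 0.95      # row 2: entry rate threshold
    forced_minute: int = 20         # row 4: forced start time
    clearing_minute: int = 55       # row 5: hourly clearing start time


def plan_starts(entered_by_minute: Sequence[int],
                params: StartParams = StartParams()) -> List[Tuple[int, str]]:
    """Return (minute, reason) for each QC start within one hour.

    entered_by_minute[m] is the cumulative number of stations whose data
    have been entered into the database by minute m (assumed input format).
    """
    starts: List[Tuple[int, str]] = []
    first_done = False
    for minute, entered in enumerate(entered_by_minute):
        rate = entered / params.assessed_stations  # accumulated entry rate
        if not first_done:
            if rate >= params.trigger_rate:
                # Row 3: first start once the threshold is reached.
                starts.append((minute, "threshold"))
                first_done = True
            elif minute >= params.forced_minute:
                # Row 4: forced first start if the threshold is not reached.
                starts.append((minute, "forced"))
                first_done = True
        if minute == params.clearing_minute:
            # Row 5: second start, the hourly clearing run.
            starts.append((minute, "clearing"))
    return starts


# Two synthetic hours (made-up arrival curves, not measured data).
fast_hour = [min(31814, 2600 * m) for m in range(60)]
slow_hour = [min(31814, 400 * m) for m in range(60)]
print(plan_starts(fast_hour))
print(plan_starts(slow_hour))
```

In the synthetic fast hour the first start is triggered by the 95% threshold, while in the slow hour it falls back to the forced start at the 20th minute. Both hours end with the clearing run at the 55th minute, giving two starts per hour as described in the abstract.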